Using BRFSS Data to Predict Depression

by Jessica Wang


According to the National Alliance on Mental Illness, approximately 1 in 5 adults experiences a mental illness in a given year. Of those adults, approximately 50% are not receiving treatment. Why is it that people don’t ask for help, even if they are at risk for depression? I trained a neural network on health survey questions to classify whether a person has been diagnosed with depression. This lets me build a tool that asks simple survey questions and, based on previous health data, estimates how likely the respondent is to have been diagnosed with depression.

Data cleaning:

I downloaded the data as a text file from the CDC’s Behavioral Risk Factor Surveillance System (BRFSS), a telephone survey that collects data from U.S. residents about their health-related risks. The dataset consists of 60346 entries, each representing one person’s answers to 2033 questions.

For my output data, I took the answers to the question “Have you ever been told that you have a depressive disorder, including depression, major depression, dysthymia, or minor depression?” Each person was given four possible responses: yes, no, don’t know/not sure, and refused. In the BRFSS data, these responses are coded as the digits 1, 2, 7, and 9, respectively.
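
As a rough illustration, these response codes could be turned into training labels along the following lines. This is only a sketch: the function name and the example answers are made up, and treating codes 7 and 9 as unlabeled (and dropping them) is an assumption, not something the survey or the code book prescribes.

```python
# Response codes for the depressive-disorder question, as listed above.
RESPONSE_CODES = {1: "yes", 2: "no", 7: "don't know/not sure", 9: "refused"}

def encode_label(code):
    """Return 1 for 'yes', 0 for 'no', and None for answers with no usable label."""
    if code == 1:
        return 1
    if code == 2:
        return 0
    # Assumption: 7 (don't know/not sure) and 9 (refused) carry no label.
    return None

answers = [1, 2, 7, 2, 9, 1]  # made-up example responses
labels = [encode_label(a) for a in answers]
usable = [label for label in labels if label is not None]

print(labels)  # [1, 0, None, 0, None, 1]
print(usable)  # [1, 0, 0, 1]
```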

While cleaning the data, I had to eliminate several entries that included non-digit text or periods, neither of which corresponded to any meaningful value in the BRFSS code book found here:

Network selection:

I selected a simple feedforward multilayer perceptron (MLP) neural network, built with TensorFlow, that was originally designed for the Iris dataset provided by scikit-learn.
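
To make the idea of a feedforward MLP concrete, here is a minimal sketch of a forward pass. It is written in plain Python rather than TensorFlow so that it runs without any dependencies, and it is not the network used in this project: the layer sizes, the random untrained weights, and the example answers are all hypothetical.

```python
import math
import random

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """One fully connected layer; weights[j][i] links input i to unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """ReLU on the hidden layers, softmax on the output layer."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(x, weights, biases))

rng = random.Random(0)
n_questions, n_hidden, n_classes = 5, 8, 2  # hypothetical sizes
layers = [init_layer(n_questions, n_hidden, rng),
          init_layer(n_hidden, n_classes, rng)]

answers = [1.0, 2.0, 1.0, 7.0, 2.0]  # made-up survey codes
probs = forward(answers, layers)
print(probs)  # two class scores that sum to 1 (meaningless until trained)
```

In a trained network, the weights would be fitted to the survey data, and the two outputs would correspond to the "diagnosed" and "not diagnosed" classes.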

I also looked into pruning, a technique that removes the parts of a network that contribute little to the classification of data. Pruning would be particularly useful in my program because my dataset has 2033 questions, and having each person fill out a 2033-question survey isn’t practical. Ideally, the survey would be reduced to only the questions whose input neurons carry meaningful weight on their synapses. However, because I was unable to find a pruning neural network that was compatible with my data, I couldn’t implement pruning in my project.
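
One simple way to approximate this idea, shown here only as a sketch and not as something implemented in this project, is to rank each question by the total absolute weight leaving its input neuron in the first layer and keep the highest-ranked questions. The function names and the weight values below are hypothetical, and magnitude-based ranking is just one of several possible pruning criteria.

```python
def input_importance(first_layer_weights):
    """Sum of absolute weights leaving each input.

    first_layer_weights[j][i] links input (question) i to hidden unit j.
    """
    n_inputs = len(first_layer_weights[0])
    return [sum(abs(row[i]) for row in first_layer_weights)
            for i in range(n_inputs)]

def top_k_inputs(first_layer_weights, k):
    """Indices of the k inputs with the largest total absolute weight."""
    scores = input_importance(first_layer_weights)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Made-up weights: 3 hidden units, 4 input questions.
weights = [
    [0.9, 0.01, -0.4, 0.0],
    [-0.7, 0.02, 0.3, 0.0],
    [0.5, -0.01, 0.6, 0.0],
]

kept = top_k_inputs(weights, 2)
print(kept)  # [0, 2]
```

The questions at the kept indices would form the shortened survey, and the network would normally be retrained on just those inputs.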

Application extension:

With the assistance of Charlotte Moremon, Rufus Taylor, and Kiana, I developed an application that asks a set of questions. The answers are run through the neural network, which then classifies them as belonging to someone who is likely to be depressed or not. Ideally, the neural network would produce a percentage that indicates the probability of being depressed. The application would then also provide information about local health care providers.
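
The flow of such an application might look roughly like the sketch below. Everything in it is hypothetical: the question texts, the `run_survey` function, the 0.5 threshold, and especially `stub_predict`, which is a fixed placeholder standing in for the trained network.

```python
# Hypothetical question texts; the real application's questions are not listed here.
QUESTIONS = [
    "Would you say that in general your health is good? (1 = yes, 2 = no)",
    "Have you exercised in the past month? (1 = yes, 2 = no)",
]

def run_survey(questions, ask, predict, threshold=0.5):
    """Collect one answer per question, then report the model's estimate.

    ask:     callable that takes a question string and returns the answer.
    predict: callable that takes a list of floats and returns a probability.
    """
    answers = [float(ask(question)) for question in questions]
    probability = predict(answers)
    likely = probability >= threshold
    return f"Estimated probability: {probability:.0%}", likely

def stub_predict(answers):
    """Placeholder for the trained network; always returns 0.25."""
    return 0.25

# For an interactive session, pass ask=input instead of the lambda.
message, likely = run_survey(QUESTIONS, ask=lambda question: "1",
                             predict=stub_predict)
print(message)  # Estimated probability: 25%
print(likely)   # False
```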