Analysis of social media language using AI models predicts depression severity for white Americans, but not Black Americans
Using standard language-based computer models to analyze Facebook posts, researchers were able to predict depression severity for white people but not for Black people.
Words and phrases associated with depression, such as first-person pronouns and negative emotion words, were around three times more predictive of depression severity for white people than for Black people. The study, published in the Proceedings of the National Academy of Sciences, is co-authored by researchers at the University of Pennsylvania, Philadelphia, and the National Institute on Drug Abuse (NIDA), part of the National Institutes of Health (NIH), which also funded the study.
The study recruited 868 consenting participants who identified themselves as Black or white. Models trained on Facebook language from white participants with self-reported depression showed strong predictive performance when tested on white participants. However, when the same type of model was trained on Facebook language from Black participants, it performed poorly when tested on Black participants and only slightly better when tested on white participants.
In white participants, depression severity was associated with increased use of first-person singular pronouns ("I," "me," "my"), but this correlation was absent in Black participants. Additionally, as depression severity increased, white people used more language describing feelings of not belonging ("weirdo," "creep"), self-criticism ("mess," "wreck"), being an anxious outsider ("terrified," "misunderstood"), self-deprecation ("worthless," "crap"), and despair ("begging," "hollow"). There was no such correlation for Black people. Clinicians have known for decades that people from different demographic groups can express depressive symptoms differently, and this study shows how those differences can play out on social media.
Language-based models hold promise as personalized, scalable, and affordable tools for screening mental health disorders. For example, heavy use of self-referential language, such as first-person pronouns, and of negative emotional language, such as self-deprecating words, is often regarded as a clinical indicator of depression. However, research on assessing mental disorders through language has largely overlooked race and ethnicity, an omission that leads to inaccurate computer models. Although there is evidence that demographic factors influence the language people use, previous studies have not systematically explored how race and ethnicity shape the relationship between depression and language expression.
The researchers designed this study to help close this gap. They analyzed past Facebook posts from Black and white people who reported their depression severity through the Patient Health Questionnaire (PHQ-9), a standard self-report tool that clinicians use to screen for possible depression. The participants consented to share their Facebook status updates. Most participants were female (76%), and their ages ranged from 18 to 72 years. The researchers matched Black and white participants on age and sex so that data from the two groups would be comparable.