
NLP for Psychology Text Data

How psychologists use dictionary methods, transformer encoders, and embedding-based classifiers on language data to study personality, mood, and mental health, and where these pipelines fail.


Why This Matters

Psychology has always been a measurement-bound science. Self-report scales are noisy and reactive; clinician ratings are scarce and expensive. Naturally occurring text (diary entries, social media posts, therapy transcripts, electronic health record notes) gives a high-frequency behavioral signal that does not require a study visit. NLP turns that signal into variables a psychologist can model.

The applications split into two camps. One camp uses text to measure a construct (personality, affect, suicidal ideation) so it can be entered into a downstream regression. The other uses text to predict an outcome (a future depression diagnosis, a treatment response) without claiming the model captures a latent trait. The methods overlap; the validity arguments do not.

Core Methods

The oldest tool is the dictionary count. Linguistic Inquiry and Word Count (LIWC; Pennebaker, Boyd, Jordan, & Blackburn, 2015) maps words to roughly 90 categories — first-person singular pronouns, negative emotion, cognitive process words, social references — and reports each as a percentage of total tokens. Dictionary methods are transparent, fast, and replicable across corpora, but they ignore syntax, negation, and context: "I am not sad" and "I am sad" each count "sad" exactly once, so the negation is invisible to the negative-emotion category. They remain the default for hypothesis-driven psychology because the units are interpretable and the scoring procedure does not change between studies.
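A minimal sketch of the counting logic makes the negation blind spot concrete. The tiny category sets below are illustrative stand-ins, not the actual LIWC lexicon, and the tokenizer is deliberately naive:

```python
# Dictionary-count sketch in the spirit of LIWC scoring.
# These category sets are toy stand-ins, NOT the real LIWC word lists.
NEG_EMOTION = {"sad", "hate", "hurt", "lonely"}
FIRST_PERSON = {"i", "me", "my", "mine"}

def dictionary_scores(text):
    """Return each category as a percentage of total tokens."""
    tokens = text.lower().split()
    n = len(tokens)
    return {
        "negemo_pct": 100 * sum(t in NEG_EMOTION for t in tokens) / n,
        "i_pct": 100 * sum(t in FIRST_PERSON for t in tokens) / n,
    }

# Both sentences match "sad" exactly once; pure counting cannot see the "not".
print(dictionary_scores("I am sad"))
print(dictionary_scores("I am not sad"))
```

The percentages differ only because the denominators differ; the negative-emotion *count* is identical in both sentences, which is the failure mode described above.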

The second wave is supervised classification: first on engineered language features, now typically on top of pretrained encoders. A BERT-style model fine-tuned on labeled posts (PHQ-9 scores, clinician annotations, self-disclosed diagnoses) outperforms LIWC on prediction metrics like AUC and F1, often by 10 to 20 points. Eichstaedt et al. (2018, PNAS) trained a logistic model on Facebook language and predicted depression diagnoses recorded in linked EHR data with an AUC near 0.72, outperforming demographics-only baselines. The features that drove the model — first-person singular pronouns, hostility, loneliness words, references to medical care — were consistent with prior small-sample findings in clinical psychology, which raised the credibility of the result.
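The evaluation metric in these studies, AUC, has a simple interpretation worth seeing in code: the probability that a randomly chosen positive case outscores a randomly chosen negative one. The toy scorer below, with made-up feature lists and made-up weights (not the published Eichstaedt model), shows the whole predict-then-evaluate loop:

```python
# Toy predict-and-evaluate loop: LIWC-style features with hand-set weights,
# scored with AUC. Feature sets, weights, and posts are all illustrative.
FIRST_PERSON = {"i", "me", "my"}
LONELY = {"alone", "lonely", "miss"}

def score(post):
    toks = post.lower().split()
    fp = sum(t in FIRST_PERSON for t in toks) / len(toks)
    lo = sum(t in LONELY for t in toks) / len(toks)
    return 2.0 * fp + 3.0 * lo  # weights made up for illustration

def auc(scores, labels):
    """P(random positive outscores random negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

posts = [
    "i feel so alone tonight",        # label 1
    "great game with friends today",  # label 0
    "i miss my old life",             # label 1
    "the weather is lovely",          # label 0
]
labels = [1, 0, 1, 0]
print(auc([score(p) for p in posts], labels))  # 1.0 on this toy data
```

On four hand-picked posts the separation is perfect (AUC = 1.0); real corpora with noisy EHR labels land far lower, which is why 0.72 was a notable result.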

The third wave uses dense word embeddings and contextual representations as input to downstream models. An EHR clinical note can be encoded with a domain-tuned BERT variant, pooled, and fed into a risk model. Sentiment analysis on therapist-patient dialogue uses the same machinery. The cost is interpretability: a high coefficient on dimension 137 of a 768-dim hidden state does not translate into a clinical narrative.
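The encode-pool-predict pipeline can be sketched end to end. The encoder below is a random stand-in for the transformer forward pass (real hidden states would be 768-dimensional), and the logistic-head weights are arbitrary; only the plumbing is the point:

```python
# Sketch of encode -> pool -> risk score. The "encoder" emits random vectors
# standing in for contextual hidden states; dims shrunk from 768 to 4.
import math
import random

random.seed(0)
DIM = 4  # a real BERT-style encoder would produce 768-dimensional states

def fake_encoder(tokens):
    """Stand-in for a transformer forward pass: one vector per token."""
    return [[random.gauss(0, 1) for _ in range(DIM)] for _ in tokens]

def mean_pool(hidden_states):
    """Average the per-token vectors into a single document vector."""
    n = len(hidden_states)
    return [sum(h[d] for h in hidden_states) / n for d in range(DIM)]

def risk_score(doc_vec, weights, bias=0.0):
    """Logistic head on the pooled vector; weights here are arbitrary."""
    z = sum(w * x for w, x in zip(weights, doc_vec)) + bias
    return 1 / (1 + math.exp(-z))

note = "pt reports low mood and poor sleep".split()
doc_vec = mean_pool(fake_encoder(note))
print(risk_score(doc_vec, [0.5, -0.2, 0.1, 0.3]))
```

The interpretability problem lives in that weights vector: each coefficient attaches to an anonymous pooled dimension, not to a word, symptom, or clinical concept.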

A fourth, unsettled use is generative chat. Tools that present themselves as therapy assistants are not regulated medical devices in most jurisdictions, carry no liability for harm, and have no validated escalation pathway for suicidal users. Mental health professional bodies have flagged the risk; serious empirical evaluation of safety-critical interactions is sparse.

Watch Out

Predicting a diagnosis is not measuring depression

Eichstaedt et al. 2018 predicted whether a person had a depression code in their EHR. That is not the same as measuring whether the person was depressed. EHR codes encode who showed up, who got billed, and who had insurance. A model that uses Facebook language to predict EHR depression codes inherits all of those selection effects. It is an honest predictor of a label, not a thermometer for mood.

Watch Out

LIWC validity is category-specific, not blanket

LIWC's negative-emotion category has been validated in many corpora; its cognitive-process and analytic-thinking categories have weaker, more contested validity. Citing "LIWC has been validated" without naming the category is a common error. Pennebaker's manual lists the per-category validation evidence; authors should cite the specific category they used.

References

LIWC 2015

Pennebaker, Boyd, Jordan, Blackburn (2015). The Development and Psychometric Properties of LIWC2015. University of Texas at Austin technical report. The category list, scoring procedure, and per-category reliability evidence.

Eichstaedt 2018 PNAS

Eichstaedt, Smith, Merchant, Ungar et al. (2018). "Facebook language predicts depression in medical records." PNAS 115(44):11203-11208. The EHR-linked Facebook study; prediction AUC, feature importances, sample construction.

Schwartz 2013

Schwartz, Eichstaedt, Kern et al. (2013). "Personality, gender, and age in the language of social media: the open-vocabulary approach." PLOS ONE 8(9):e73791. Open-vocabulary methods; comparison with closed-vocabulary LIWC.

Coppersmith CLPsych

Coppersmith, Dredze, Harman (2014). "Quantifying mental health signals in Twitter." Proceedings of the Workshop on Computational Linguistics and Clinical Psychology. The CLPsych shared-task series and its dataset construction caveats.

De Choudhury 2013

De Choudhury, Counts, Horvitz (2013). "Predicting postpartum changes in emotion and behavior via social media." CHI 2013. Early longitudinal mood-prediction work using social media language.

Tausczik Pennebaker 2010

Tausczik, Pennebaker (2010). "The psychological meaning of words: LIWC and computerized text analysis methods." Journal of Language and Social Psychology 29(1):24-54. The canonical methods overview for psychological LIWC use.


Last reviewed: April 18, 2026