ML Applications
Clustering and Latent Variable Models in Psychology
Latent profile and class analysis, item response theory, psychometric network models, and the deep generative revival of factor analysis — what each model assumes and when it misleads.
Why This Matters
Most psychological constructs are unobserved. A respondent does not have an "extraversion" gauge that the researcher reads off; instead, the researcher infers extraversion from a battery of items that correlate. Psychometrics is the discipline of building probabilistic models in which the construct is a latent variable and the responses are conditional on it.
The choice of latent variable model is not cosmetic. It encodes what the analyst believes the construct is: a continuous trait, a discrete subtype, a position on a graded ladder of severity, or a node in a causal network of symptoms. Different choices give different policy implications. A single-trait IRT model of depression argues for a continuous severity score; a network model argues that severing specific symptom-symptom links is the correct intervention.
Core Methods
Latent Class Analysis (LCA) and Latent Profile Analysis (LPA) assume a finite number of unobserved subgroups and estimate group-conditional response distributions. LCA handles categorical indicators; LPA handles continuous ones. The fitting routine is expectation-maximization, with the number of classes chosen by BIC or the bootstrap likelihood ratio test (BLRT) and classification quality summarized by entropy. Muthén and Muthén's Mplus implementation is the standard in applied psychology. The risk is well known: LPA almost always finds classes whether or not classes exist. A multivariate Gaussian dataset with correlated indicators and no true clusters can still be fit by a 3-class LPA with acceptable-looking entropy and a "significant" BLRT, because under the usual assumption that indicators are independent within class, extra classes are the model's only way to absorb the correlation. Convergent evidence from external validators is the only honest criterion for class existence.
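For reference, the LPA model and the entropy summary can be written out as follows. This is a sketch under the common local-independence specification (indicators uncorrelated within class); the symbols $\pi_k$, $\mu_{jk}$, $\sigma_{jk}^2$, and $\hat{p}_{ik}$ are notation introduced here, not taken from any particular software manual.

```latex
% Mixture density for respondent i with J continuous indicators and K classes
f(\mathbf{y}_i) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J}
    \mathcal{N}\!\left(y_{ij} \mid \mu_{jk}, \sigma_{jk}^2\right)

% Posterior class membership probability (the E-step quantity)
\hat{p}_{ik} = \frac{\pi_k \prod_{j=1}^{J}
    \mathcal{N}\!\left(y_{ij} \mid \mu_{jk}, \sigma_{jk}^2\right)}{f(\mathbf{y}_i)}

% Relative entropy: 1 means every respondent is assigned with certainty
E_K = 1 - \frac{\sum_{i=1}^{n} \sum_{k=1}^{K}
    \left(-\hat{p}_{ik} \ln \hat{p}_{ik}\right)}{n \ln K}
```

Note that $E_K$ is computed entirely from the fitted posteriors, so it describes how sharply the model partitions respondents, not whether the partition corresponds to anything outside the model.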
Item Response Theory (IRT) models the probability of a correct or endorsed response as a logistic function of a person's latent trait $\theta_j$ and item parameters $(a_i, b_i)$. The two-parameter logistic gives $P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp(-a_i(\theta_j - b_i))}$, where $a_i$ is the item discrimination and $b_i$ is the difficulty. IRT lets test developers compare items across populations, build computerized adaptive tests, and detect differential item functioning across demographic groups. Reise and Revicki's Handbook of Item Response Theory Modeling is the standard reference. IRT assumes unidimensionality and local independence; both fail in practice and have to be tested.
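A minimal sketch of the two-parameter logistic model, to make the roles of discrimination and difficulty concrete. The item names and parameter values are hypothetical, and the item information function shown is the standard 2PL form $a^2 P (1 - P)$, included here as an illustration of why adaptive tests prefer items whose difficulty is near the respondent's trait level.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item under the 2PL model.

    theta: person's latent trait, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information an item contributes at trait level theta (2PL)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical items: (discrimination, difficulty)
items = {"easy_weak": (0.8, -1.0), "hard_sharp": (2.0, 1.0)}

for name, (a, b) in items.items():
    probs = [round(p_2pl(t, a, b), 3) for t in (-2.0, 0.0, 2.0)]
    infos = [round(item_information(t, a, b), 3) for t in (-2.0, 0.0, 2.0)]
    print(name, "P(endorse):", probs, "information:", infos)
```

When the trait equals the difficulty the endorsement probability is exactly 0.5, and that is also where the item is most informative; a higher discrimination makes the information peak taller and narrower.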
Psychometric network models, popularized by Borsboom (2017), drop the latent variable entirely. Symptoms are nodes; partial correlations or Ising couplings are edges. The clinical claim is that mental disorders are not entities with a hidden cause but self-sustaining patterns of symptom-symptom activation. The estimation tools are the graphical lasso for continuous data and Ising estimation for binary symptom data, both with a sparsity penalty whose strength is tuned by cross-validation or by an information criterion such as the extended BIC. The empirical case is contested: replicability of estimated network structure is poor at typical psychology sample sizes, and the interpretation of partial correlations as causal couplings is rarely justified.
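To illustrate what an edge means in such a network, the sketch below computes a first-order partial correlation for three variables. With only three nodes, the partial correlation of two given the third is the edge weight. The symptom names and correlation values are hypothetical, and the chain structure is an assumption chosen so that the arithmetic is easy to follow.

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Partial correlation of X and Y after controlling for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical standardized linear chain: insomnia -> fatigue -> concentration
r_ins_fat = 0.5
r_fat_con = 0.5
# In a chain, the marginal correlation of the endpoints is the product of the links.
r_ins_con = r_ins_fat * r_fat_con

# Edge between insomnia and concentration, controlling for fatigue
edge_ins_con = partial_corr(r_ins_con, r_ins_fat, r_fat_con)
# Edge between insomnia and fatigue, controlling for concentration
edge_ins_fat = partial_corr(r_ins_fat, r_ins_con, r_fat_con)

print("marginal r(insomnia, concentration):", r_ins_con)
print("edge(insomnia, concentration):", edge_ins_con)
print("edge(insomnia, fatigue):", round(edge_ins_fat, 3))
```

The endpoints correlate marginally but share no edge once the middle symptom is controlled for. That is the sense in which the network is sparser than the correlation matrix. It does not by itself say which direction any influence runs.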
Bayesian network construction layers a directed acyclic graph on top, restoring causal language at the cost of strong identifiability assumptions. PC, GES, and FCI search algorithms recover Markov equivalence classes from data; without interventions, the orientation of many edges remains undetermined. See Bayesian estimation and causal-inference foundations for the assumption set.
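The reason edge orientation stays undetermined can be shown with a small check of Markov equivalence. The sketch below uses the standard graphical criterion that two DAGs are Markov equivalent when they share a skeleton and the same set of unshielded colliders. It is not an implementation of PC, GES, or FCI; the function names and the three-symptom graphs are hypothetical.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG given as (parent, child) pairs."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Unshielded colliders a -> c <- b, where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    found = set()
    for child, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(e1, e2):
    """Same skeleton and same v-structures implies the same independencies."""
    return skeleton(e1) == skeleton(e2) and v_structures(e1) == v_structures(e2)


chain = [("stress", "insomnia"), ("insomnia", "fatigue")]
reverse_chain = [("fatigue", "insomnia"), ("insomnia", "stress")]
fork = [("insomnia", "stress"), ("insomnia", "fatigue")]
collider = [("stress", "insomnia"), ("fatigue", "insomnia")]

print("chain vs reverse chain:", markov_equivalent(chain, reverse_chain))
print("chain vs fork:", markov_equivalent(chain, fork))
print("chain vs collider:", markov_equivalent(chain, collider))
```

The chain, its reversal, and the fork tell three different causal stories yet imply the same conditional independencies, so observational data alone cannot choose among them. Only the collider is distinguishable.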
The newest direction is generative latent variable models with deep nonlinear decoders (variational autoencoders and their cousins) applied to high-dimensional psychometric or behavioral data. The latent space is continuous and unconstrained, which gives up the interpretability of factor loadings but allows nonlinear item-trait relationships and can accommodate missing responses by dropping unobserved items from the likelihood. The honest framing is "factor analysis with a flexible likelihood," not "factor analysis solved."
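The relationship to factor analysis can be made explicit by writing both models side by side. This is a sketch in notation introduced here: $f_\phi$ is a hypothetical decoder network, $q_\psi$ a hypothetical encoder, and $m_j$ a 0/1 indicator of whether item $j$ was observed. Masking the likelihood in this way is one common treatment of missing items, not the only one, and it presumes the missingness is ignorable.

```latex
% Linear factor analysis: loadings \Lambda are directly interpretable
\mathbf{y} = \Lambda \mathbf{z} + \boldsymbol{\varepsilon}, \qquad
\mathbf{z} \sim \mathcal{N}(\mathbf{0}, I), \quad
\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \Psi)

% Deep generative version: the linear map is replaced by a network f_\phi
y_j \mid \mathbf{z} \sim p\!\left(y_j \mid f_\phi(\mathbf{z})_j\right), \qquad
\mathbf{z} \sim \mathcal{N}(\mathbf{0}, I)

% Evidence lower bound with unobserved items masked out of the likelihood
\mathcal{L}(\phi, \psi) =
  \mathbb{E}_{q_\psi(\mathbf{z} \mid \mathbf{y}_{\mathrm{obs}})}
  \left[ \sum_{j=1}^{J} m_j \log p_\phi(y_j \mid \mathbf{z}) \right]
  - \mathrm{KL}\!\left( q_\psi(\mathbf{z} \mid \mathbf{y}_{\mathrm{obs}})
  \,\|\, \mathcal{N}(\mathbf{0}, I) \right)
```

Setting $f_\phi(\mathbf{z}) = \Lambda \mathbf{z}$ with a Gaussian likelihood recovers the factor model, which is why the flexible version inherits the same rotational indeterminacy of the latent space and adds to it.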
Finding K classes does not mean K classes exist
LPA fits a finite mixture model. A finite mixture is a flexible density approximator: it will recover some mixture even when the data come from a single skewed unimodal distribution. Bauer and Curran (2003) showed this explicitly. BIC and entropy do not test class existence; they only compare models within the mixture family. Substantive validation against external criteria (treatment response, biomarkers, longitudinal trajectories) is required before claiming the classes are real.
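The over-extraction problem is easy to reproduce. The sketch below draws a sample from a single lognormal distribution (one population, no subgroups) and fits 1-, 2-, and 3-component Gaussian mixtures by EM. It uses only the standard library, handles one indicator rather than a full profile, and is not the procedure from Bauer and Curran (2003); the sample size, seed, variance floor, and function names are illustrative assumptions.

```python
import math
import random


def normal_logpdf(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def fit_mixture(xs, k, iters=200):
    """EM for a k-component 1D Gaussian mixture.

    Returns the log-likelihood evaluated at the final E-step.
    """
    n = len(xs)
    srt = sorted(xs)
    # Initialise each component from an equal-sized block of the sorted data.
    blocks = [srt[i * n // k:(i + 1) * n // k] for i in range(k)]
    mus = [sum(b) / len(b) for b in blocks]
    vars_ = [max(sum((x - m) ** 2 for x in b) / len(b), 1e-3)
             for b, m in zip(blocks, mus)]
    pis = [1.0 / k] * k
    loglik = 0.0
    for _ in range(iters):
        # E-step: posterior class probabilities, computed in log space.
        resp, loglik = [], 0.0
        for x in xs:
            logs = [math.log(p) + normal_logpdf(x, m, v)
                    for p, m, v in zip(pis, mus, vars_)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(l - top) for l in logs))
            loglik += norm
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: weighted updates, with a floor to stop variance collapse.
        for j in range(k):
            w = max(sum(r[j] for r in resp), 1e-12)
            pis[j] = w / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / w
            vars_[j] = max(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / w,
                1e-3,
            )
    return loglik


def bic_value(loglik, k, n):
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return -2.0 * loglik + n_params * math.log(n)


random.seed(0)
# One skewed population: no latent classes exist in this sample.
data = [random.lognormvariate(0.0, 1.0) for _ in range(500)]

bic = {k: bic_value(fit_mixture(data, k), k, len(data)) for k in (1, 2, 3)}
for k, value in bic.items():
    print(f"K={k}  BIC={value:.1f}")
```

BIC prefers more than one component here because a single Gaussian is a poor description of a skewed density, not because the sample contains subgroups. The comparison is among mixture models only; "one non-normal population" was never a candidate.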
Network models and latent variable models are not rival theories of the same data
The literature often presents network analysis as a refutation of the latent variable view. On cross-sectional data the two are statistically near-equivalent: any low-rank latent factor model implies a particular pattern of partial correlations among the observed items, and a network estimator will reproduce that pattern as a dense set of edges. Distinguishing them empirically requires either intervention or sufficiently dense longitudinal data. Borsboom (2017) makes the conceptual case; the empirical discrimination is harder than usually claimed.
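The near-equivalence can be checked directly for the simplest case. The sketch below takes a standardized one-factor model, inverts its implied covariance matrix in closed form (Sherman-Morrison), and reads off the partial correlations a network estimator would be targeting. The loadings are hypothetical and the function name is illustrative.

```python
import math


def one_factor_partial_corrs(loadings):
    """Partial correlations among items implied by a standardized one-factor model."""
    # Unique variances chosen so each item has unit total variance.
    psi = [1.0 - l * l for l in loadings]
    # Sherman-Morrison inverse of Sigma = lambda lambda' + diag(psi):
    # Omega = Psi^-1 - (Psi^-1 lambda)(Psi^-1 lambda)' / (1 + lambda' Psi^-1 lambda)
    u = [l / p for l, p in zip(loadings, psi)]
    s = 1.0 + sum(l * ui for l, ui in zip(loadings, u))
    n = len(loadings)
    omega = [
        [(1.0 / psi[i] if i == j else 0.0) - u[i] * u[j] / s for j in range(n)]
        for i in range(n)
    ]
    # Edge weight: negative precision entry, rescaled by the diagonal.
    return [
        [1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical loadings of four symptoms on a single severity factor
pcor = one_factor_partial_corrs([0.8, 0.7, 0.6, 0.5])
for row in pcor:
    print([round(v, 3) for v in row])
```

Every pair of items gets a positive edge even though the data-generating model contains no direct symptom-to-symptom links at all. A fully connected estimated network is therefore what a single common cause looks like from the network side.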
References
Borsboom 2017
Borsboom (2017). "A network theory of mental disorders." World Psychiatry 16(1):5-13. The conceptual case for the network view and its policy implications.
Reise and Revicki 2014
Reise and Revicki, eds. (2014). Handbook of Item Response Theory Modeling: Applications to Typical Performance Assessment. Routledge. Standard reference for applied IRT in personality and clinical assessment.
Mplus user guide
Muthén and Muthén (current edition). Mplus User's Guide. The applied reference for LCA, LPA, mixture models, and growth mixture models in psychology and education.
Bauer Curran 2003
Bauer and Curran (2003). "Distributional assumptions of growth mixture models: implications for overextraction of latent trajectory classes." Psychological Methods 8(3):338-363. The canonical warning that mixtures over-extract under skew.
Embretson Reise 2000
Embretson and Reise (2000). Item Response Theory for Psychologists. Lawrence Erlbaum. Accessible introduction to 1PL, 2PL, 3PL, GRM, and DIF for an applied audience.
Cole Maxwell 2003
Cole and Maxwell (2003). "Testing mediational models with longitudinal data: questions and tips in the use of structural equation modeling." Journal of Abnormal Psychology 112(4):558-577. The reference cited every time someone fits a cross-sectional mediation model and calls it causal.
Last reviewed: April 18, 2026