Statistical Estimation
REML and Variance Component Estimation
Why restricted maximum likelihood estimates variance components from error contrasts rather than the full data likelihood, and why that usually behaves better than ML when fixed effects are present.
Why This Matters
Variance components appear whenever a model separates fixed structure from random heterogeneity: random-intercept models, random-effects meta-analysis, and small area estimation all fit this pattern. The object of interest is often a variance parameter such as a between-area variance $\sigma_v^2$ or a between-study variance $\tau^2$.
Plain maximum likelihood estimates those variances from the full data likelihood. That sounds natural, but it systematically ignores one fact: some degrees of freedom were already spent estimating the fixed effects. In small samples, ML therefore tends to push variance components downward, sometimes all the way to zero.
Restricted maximum likelihood, usually shortened to REML, corrects that problem at the source. It builds a likelihood only from residual directions that carry no information about the fixed effects. That is why REML is the default variance-component estimator in much of mixed-model practice.
Mental Model
Suppose the data vector $y$ lives in an $n$-dimensional space and the fixed effects span a $p$-dimensional subspace through the design matrix $X$.
- The $p$ directions along the columns of $X$ are used to estimate $\beta$.
- The remaining $n - p$ directions are residual contrasts.
ML uses all directions when estimating the variance parameters. REML uses only the residual directions. That is the whole idea.
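This split of directions can be checked numerically. The sketch below, using a hypothetical full-rank design matrix, builds the hat matrix (projection onto the fixed-effect directions) and its complement, and confirms the rank split $p$ versus $n - p$:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 10, 3                      # data dimension and number of fixed effects
X = rng.normal(size=(n, p))       # hypothetical full-rank design matrix

# The hat matrix H projects onto the column space of X (fixed-effect directions);
# M = I - H projects onto the residual directions REML works with.
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H

print(np.linalg.matrix_rank(H))   # p directions are spent on the fixed effects
print(np.linalg.matrix_rank(M))   # n - p residual directions remain
```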
Formal Setup
Linear Mixed Model
A linear mixed model can be written as

$$ y = X\beta + Zu + e, $$

with

$$ u \sim N(0, G), \qquad e \sim N(0, R), $$

independent of each other. The marginal covariance of $y$ is

$$ V = \operatorname{Var}(y) = Z G Z^\top + R. $$

The parameter vector $\theta$ collects the unknown variance components in $G$ and $R$.
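For concreteness, here is the marginal covariance $V = ZGZ^\top + R$ assembled for a toy random-intercept layout; the group sizes and variance values are made up for illustration:

```python
import numpy as np

# Hypothetical layout: 3 groups of 2 observations, one random intercept per group.
n_groups, per_group = 3, 2
n = n_groups * per_group
Z = np.kron(np.eye(n_groups), np.ones((per_group, 1)))  # maps group effects to rows

sigma_u2, sigma_e2 = 2.0, 1.0      # assumed variance components
G = sigma_u2 * np.eye(n_groups)    # Var(u)
R = sigma_e2 * np.eye(n)           # Var(e)

V = Z @ G @ Z.T + R                # marginal covariance of y
print(V)
```

The result is block diagonal: $\sigma_u^2 + \sigma_e^2$ on the diagonal, $\sigma_u^2$ between observations that share a group, and zero across groups.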
Variance Component
A variance component is any parameter inside $G$ or $R$ that controls random-effect or error variability. Examples include a random-intercept variance, a between-study heterogeneity parameter, or the area-level variance in a Fay-Herriot model.
Error Contrast
An error contrast is a linear transformation $w = A^\top y$ such that $A^\top X = 0$ and hence $E[w] = 0$ for every value of $\beta$. These contrasts remove the fixed-effect contribution and retain only the directions relevant for estimating variance parameters.
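A small sketch of this definition, using `scipy.linalg.null_space` to build an orthonormal error-contrast matrix for a hypothetical design and checking that the mean term vanishes for an arbitrary $\beta$:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(3)
n, p = 6, 2
X = rng.normal(size=(n, p))        # hypothetical design matrix
A = null_space(X.T)                # orthonormal columns with A.T @ X = 0

# For any beta, the contrast w = A'y has mean A'X beta = 0.
beta = rng.normal(size=p)
print(np.allclose(A.T @ X @ beta, 0))  # mean term vanishes
print(A.shape)                          # n - p contrast directions
```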
Main Theorem
REML as Likelihood of Error Contrasts
Statement
Let $A$ be any $n \times (n - p)$ full-rank error-contrast matrix, so that $A^\top X = 0$. Then the restricted likelihood for the variance parameters $\theta$ is the likelihood of the transformed data $w = A^\top y$:

$$ w = A^\top y \sim N\!\big(0,\; A^\top V(\theta)\, A\big), $$

where

$$ V(\theta) = Z G Z^\top + R. $$

Equivalently, up to an additive constant, the restricted log-likelihood is

$$ \ell_R(\theta) = -\tfrac{1}{2}\log\lvert V \rvert \;-\; \tfrac{1}{2}\log\lvert X^\top V^{-1} X \rvert \;-\; \tfrac{1}{2}\,(y - X\hat\beta)^\top V^{-1}(y - X\hat\beta), $$

with $\hat\beta = (X^\top V^{-1} X)^{-1} X^\top V^{-1} y$ the generalized least squares estimator at $\theta$. This likelihood depends on $\theta$ but not on the unknown fixed-effects vector $\beta$.
Intuition
REML estimates variance parameters only from the part of the data that remains after projecting away the fixed-effect directions. That is why the fixed effects do not appear in the restricted likelihood.
Proof Sketch
Choose a matrix $A$ whose $n - p$ columns span the orthogonal complement of the column space of $X$. The transformed vector $w = A^\top y$ removes the mean term, since $E[w] = A^\top X \beta = 0$, and has covariance $A^\top V A$. Its Gaussian likelihood therefore depends only on $\theta$. Algebraic manipulation of that likelihood gives the equivalent determinant-plus-quadratic form involving $\hat\beta$.
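Both forms can be compared numerically. The sketch below, under an assumed compound-symmetry covariance for illustration, evaluates the error-contrast log-likelihood and the determinant-plus-quadratic form at two parameter values and confirms they differ only by a constant that does not depend on $\theta$:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
n, p = 8, 2
X = rng.normal(size=(n, p))        # hypothetical design matrix
y = rng.normal(size=n)             # hypothetical data

A = null_space(X.T)                # columns span the error-contrast space, A.T @ X = 0

def V_of(theta):
    # assumed compound-symmetry covariance: theta[0]*I + theta[1]*J
    return theta[0] * np.eye(n) + theta[1] * np.ones((n, n))

def loglik_contrasts(theta):
    # Gaussian log-likelihood of w = A'y ~ N(0, A'VA), constants dropped
    W = A.T @ V_of(theta) @ A
    w = A.T @ y
    return -0.5 * np.linalg.slogdet(W)[1] - 0.5 * w @ np.linalg.solve(W, w)

def loglik_restricted(theta):
    # determinant-plus-quadratic form of the restricted log-likelihood
    V = V_of(theta)
    Vi = np.linalg.inv(V)
    beta_hat = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)  # GLS estimate at theta
    r = y - X @ beta_hat
    return (-0.5 * np.linalg.slogdet(V)[1]
            - 0.5 * np.linalg.slogdet(X.T @ Vi @ X)[1]
            - 0.5 * r @ Vi @ r)

# The difference between the two forms is the same at every theta.
d1 = loglik_contrasts([1.0, 0.5]) - loglik_restricted([1.0, 0.5])
d2 = loglik_contrasts([2.0, 1.5]) - loglik_restricted([2.0, 1.5])
print(np.isclose(d1, d2))
```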
Why It Matters
This is the clean justification for REML. It is not an ad hoc correction term added to ML after the fact. It is a different likelihood, built from the correct part of the data for variance estimation when fixed effects are present.
Failure Mode
REML does not solve every variance-component problem. Boundary estimates at zero can still occur. Restricted likelihood values are not comparable across models with different fixed-effect design matrices $X$, so REML is the wrong tool for likelihood-ratio comparisons that change the fixed effects.
ML vs REML
| Question | ML | REML |
|---|---|---|
| What likelihood is maximized? | Full data likelihood | Likelihood of error contrasts |
| What happens to fixed-effect degrees of freedom? | Ignored inside variance estimation | Accounted for explicitly |
| Small-sample bias in variance components | More downward bias | Usually less downward bias |
| Can you compare different fixed-effect structures by likelihood ratio? | Yes | No |
| Can the estimate still hit zero? | Yes | Yes |
The main practical point is narrow: REML is usually better for estimating variance components, not for every model-comparison question.
Canonical Example
Random intercept with few groups
Suppose eight schools are modeled with a fixed treatment effect and a random school intercept. The random-intercept variance measures how much schools vary after accounting for treatment. With only eight groups, the full ML likelihood often pushes that variance downward because it treats the fitted treatment effect as if it were known in advance. REML removes the treatment-effect directions before estimating the school variance, so the estimate is typically less biased.
This is the same structural reason REML is common in small area estimation and in random-effects meta-analysis: the variance parameter is supposed to capture leftover heterogeneity, not variation already absorbed by fixed effects.
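The degrees-of-freedom mechanism is easiest to see in the simplest special case, a model with fixed effects only, where both estimators have closed forms: ML divides the residual sum of squares by $n$, while REML divides by $n - p$. A small simulation with made-up numbers shows the resulting bias:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 2                                      # small sample, echoing the eight-group story
X = np.column_stack([np.ones(n), np.repeat([0.0, 1.0], 4)])  # intercept + treatment
beta = np.array([1.0, 0.5])
sigma2 = 4.0                                     # true error variance

n_sims = 20000
ml_est, reml_est = [], []
for _ in range(n_sims):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta_hat) ** 2)
    ml_est.append(rss / n)          # ML: divides by n, ignoring the estimated beta
    reml_est.append(rss / (n - p))  # REML: charges p degrees of freedom

print(np.mean(ml_est))    # below the true 4.0 on average (downward bias)
print(np.mean(reml_est))  # close to 4.0 (unbiased in this special case)
```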
Where This Shows Up
- In small area estimation, REML is a common estimator of the area-level variance in Fay-Herriot and related mixed models.
- In random-effects meta-analysis, REML is often used to estimate the between-study variance $\tau^2$.
- In longitudinal and multilevel models, REML is the default choice in many software packages when the fixed-effect structure is already decided.
Common Confusions
REML is not a Bayesian method
REML is still a likelihood-based frequentist estimator. It can look Bayesian because it often behaves better in small samples, but the mechanism is purely likelihood-based: remove fixed-effect directions, then maximize the remaining likelihood.
REML is not for selecting fixed effects
AIC, likelihood-ratio tests, and nested-model comparisons that change the fixed-effect structure should be based on ML, not REML. Once the fixed effects are chosen, REML is a strong default for estimating the variance components.
REML does not make boundary problems disappear
If the true variance component is near zero or the data are very thin, REML can still land on the boundary. That is one reason specialized methods such as adjusted density maximization appear in the small-area literature.
Summary
- REML estimates variance components from error contrasts, not the full data
- This accounts for fixed-effect degrees of freedom that ML ignores
- REML usually reduces downward bias in variance-component estimates
- REML is a poor choice for comparing models with different fixed effects
- Boundary estimates can still happen, especially when the true variance is small
Exercises
Problem
Why can two models with different fixed-effect design matrices be compared with ML likelihoods but not directly with REML likelihoods?
Problem
In a mixed model with one variance component, the ML estimate is exactly zero while the REML estimate is small but positive. Give a plausible structural reason for this difference without doing any algebra.
References
Canonical:
- Patterson and Thompson, "Recovery of Inter-Block Information When Block Sizes Are Unequal" (1971), Biometrika 58(3), 545-554. Original restricted-likelihood construction.
- Harville, "Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems" (1977), JASA 72(358), 320-338. Classic REML review and derivation.
- Searle, Casella, McCulloch, Variance Components (1992), Chapters 6-8. Standard textbook treatment of ML and REML.
- Jiang, Linear and Generalized Linear Mixed Models and Their Applications (2007), Chapters 1-3. Mixed-model estimation framework with REML as the default variance-component tool.
- Rao and Molina, Small Area Estimation, 2nd ed. (2015), Chapters 5 and 7. REML in the Fay-Herriot and related SAE models.
Current / practice:
- Bates, Maechler, Bolker, Walker, "Fitting Linear Mixed-Effects Models Using lme4" (2015), Journal of Statistical Software 67(1). Practical mixed-model fitting with REML defaults.
- Cochrane Handbook for Systematic Reviews of Interventions, current Chapter 10. REML as a default heterogeneity estimator in random-effects meta-analysis.
Next Topics
- Small area estimation: where REML estimates the area-level variance in Fay-Herriot models
- Prasad-Rao MSE correction: what changes once the variance component is estimated rather than known
- Adjusted density maximization: a boundary-aware alternative when variance estimates near zero are the real problem
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Maximum Likelihood Estimation (Layer 0B)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Differentiation in $\mathbb{R}^n$ (Layer 0A)
- Linear Regression (Layer 1)
- Matrix Operations and Properties (Layer 0A)
- Expectation, Variance, Covariance, and Moments (Layer 0A)