
Statistical Estimation

REML and Variance Component Estimation

Why restricted maximum likelihood estimates variance components from error contrasts rather than the full data likelihood, and why that usually behaves better than ML when fixed effects are present.


Why This Matters

Variance components appear whenever a model separates fixed structure from random heterogeneity: random-intercept models, random-effects meta-analysis, and small area estimation all fit this pattern. The object of interest is often a variance parameter such as a between-area variance $A$ or a between-study variance $\tau^2$.

Plain maximum likelihood estimates those variances from the full data likelihood. That sounds natural, but it systematically ignores one fact: some degrees of freedom were already spent estimating the fixed effects. In small samples, ML therefore tends to push variance components downward, sometimes all the way to zero.

Restricted maximum likelihood, usually shortened to REML, corrects this at the source. It builds a likelihood only from residual directions that carry no information about the fixed effects. That is why REML is the default variance-component estimator in much of mixed-model practice.

Mental Model

Suppose the data vector $y$ lives in an $n$-dimensional space and the fixed effects span a $p$-dimensional subspace through the design matrix $X$.

  • The $p$ directions along the columns of $X$ are used to estimate $\beta$.
  • The remaining $n-p$ directions are residual contrasts.

ML uses all $n$ directions when estimating the variance parameters. REML uses only the $n-p$ residual directions. That is the whole idea.

Formal Setup

Definition

Linear Mixed Model

A linear mixed model can be written as

$$y = X\beta + Zu + \varepsilon,$$

with

$$u \sim N(0, G_\theta), \qquad \varepsilon \sim N(0, R_\theta),$$

independent of each other. The marginal covariance of $y$ is

$$V_\theta = Z G_\theta Z^\top + R_\theta.$$

The parameter vector $\theta$ collects the unknown variance components.
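For concreteness, $V_\theta = Z G_\theta Z^\top + R_\theta$ can be assembled directly for a toy random-intercept model. The group count, group size, and variance values below are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Toy random-intercept model: m groups of size k, so n = m * k.
# All numbers are illustrative assumptions.
m, k = 4, 3
n = m * k
sigma2_u, sigma2_e = 2.0, 1.0             # variance components in theta

# Z maps each observation to its group's random intercept.
Z = np.kron(np.eye(m), np.ones((k, 1)))   # n x m indicator matrix
G = sigma2_u * np.eye(m)                  # Cov(u)
R = sigma2_e * np.eye(n)                  # Cov(eps)

# Marginal covariance V_theta = Z G Z^T + R: block-diagonal, with
# within-group covariance sigma2_u, diagonal sigma2_u + sigma2_e,
# and zero covariance across groups.
V = Z @ G @ Z.T + R
```

The block-diagonal pattern of `V` is exactly the "random heterogeneity" structure the variance components control: observations in the same group share the random-intercept variance, observations in different groups do not.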

Definition

Variance Component

A variance component is any parameter inside $G_\theta$ or $R_\theta$ that controls random-effect or error variability. Examples include a random-intercept variance, a between-study heterogeneity parameter, or the area-level variance $A$ in a Fay-Herriot model.

Definition

Error Contrast

An error contrast is a linear transformation $K^\top y$ such that $K^\top X = 0$ and $\operatorname{rank}(K) = n-p$. These contrasts remove the fixed-effect contribution and retain only the directions relevant for estimating variance parameters.
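One standard way to construct such a matrix (a choice made for this sketch, not prescribed by the text) is a complete QR decomposition of the design matrix: the trailing $n-p$ columns of $Q$ are orthonormal and orthogonal to $\operatorname{col}(X)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 3
X = rng.standard_normal((n, p))   # assumed full-rank design

# Full QR: X = Q[:, :p] @ R1, so the last n - p columns of Q are
# orthogonal to col(X) and form an error-contrast matrix K.
Q, _ = np.linalg.qr(X, mode="complete")
K = Q[:, p:]                      # n x (n - p)

print(np.allclose(K.T @ X, 0.0))  # K^T X = 0, as required
```

Any other full-rank matrix with $K^\top X = 0$ works equally well; the restricted likelihood does not depend on which one is chosen.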

Main Theorem

Theorem

REML as Likelihood of Error Contrasts

Statement

Let $K$ be any full-rank error-contrast matrix. Then the restricted likelihood for the variance parameters $\theta$ is the likelihood of the transformed data $K^\top y$:

$$L_R(\theta) \propto |K^\top V_\theta K|^{-1/2} \exp\!\left(-\frac{1}{2}\, y^\top P_\theta y\right),$$

where

$$P_\theta = V_\theta^{-1} - V_\theta^{-1} X \left(X^\top V_\theta^{-1} X\right)^{-1} X^\top V_\theta^{-1}.$$

Equivalently, up to an additive constant, the restricted log-likelihood is

$$\ell_R(\theta) = -\frac{1}{2}\left[\log |V_\theta| + \log |X^\top V_\theta^{-1} X| + y^\top P_\theta y\right].$$

This likelihood depends on $\theta$ but not on the unknown fixed-effects vector $\beta$.

Intuition

REML estimates variance parameters only from the part of the data that remains after projecting away the fixed-effect directions. That is why the fixed effects do not appear in the restricted likelihood.

Proof Sketch

Choose a matrix $K$ whose columns span the orthogonal complement of the column space of $X$. The transformed vector $K^\top y$ has mean $K^\top X \beta = 0$, so the fixed-effect term drops out, and its covariance is $K^\top V_\theta K$. Its Gaussian likelihood therefore depends only on $\theta$. Algebraic manipulation of that likelihood gives the equivalent determinant-plus-quadratic form involving $P_\theta$.
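This equivalence can be checked numerically. The one-parameter covariance family `V_of` below is an arbitrary illustrative choice; the check is that the two forms of $\ell_R$ differ only by a constant that does not involve $\theta$, so their differences across two $\theta$ values agree:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 12, 2
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = rng.standard_normal(n)

# Fixed PSD matrix B; V_of(theta) = theta*I + B is a toy family.
A = rng.standard_normal((n, n))
B = A @ A.T / n

Q, _ = np.linalg.qr(X, mode="complete")
K = Q[:, p:]                       # orthonormal error contrasts

def V_of(theta):
    return theta * np.eye(n) + B

def loglik_contrasts(theta):
    # log-likelihood of K^T y ~ N(0, K^T V K), constants dropped
    S = K.T @ V_of(theta) @ K
    z = K.T @ y
    return -0.5 * (np.linalg.slogdet(S)[1] + z @ np.linalg.solve(S, z))

def loglik_determinant_form(theta):
    # -(1/2)[log|V| + log|X^T V^-1 X| + y^T P y]
    V = V_of(theta)
    Vi = np.linalg.inv(V)
    XtViX = X.T @ Vi @ X
    P = Vi - Vi @ X @ np.linalg.solve(XtViX, X.T @ Vi)
    return -0.5 * (np.linalg.slogdet(V)[1]
                   + np.linalg.slogdet(XtViX)[1]
                   + y @ P @ y)

# The two forms differ only by an additive constant in theta,
# so likelihood differences between two theta values agree.
d1 = loglik_contrasts(2.0) - loglik_contrasts(0.5)
d2 = loglik_determinant_form(2.0) - loglik_determinant_form(0.5)
```

Comparing differences rather than raw values sidesteps the additive constant, which depends on the particular $K$ and $X$ but not on $\theta$.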

Why It Matters

This is the clean justification for REML. It is not an ad hoc correction term added to ML after the fact. It is a different likelihood, built from the correct part of the data for variance estimation when fixed effects are present.

Failure Mode

REML does not solve every variance-component problem. Boundary estimates at zero can still occur. Restricted likelihood values are not comparable across models with different fixed-effect design matrices $X$, so REML is the wrong tool for likelihood-ratio comparisons that change the fixed effects.

ML vs REML

| Question | ML | REML |
| --- | --- | --- |
| What likelihood is maximized? | Full data likelihood | Likelihood of error contrasts |
| What happens to fixed-effect degrees of freedom? | Ignored inside variance estimation | Accounted for explicitly |
| Small-sample bias in variance components | More downward bias | Usually less downward bias |
| Can you compare different fixed-effect structures by likelihood ratio? | Yes | No |
| Can the estimate still hit zero? | Yes | Yes |

The main practical point is narrow: REML is usually better for estimating variance components, not for every model-comparison question.
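The simplest instance of the bias row is the fixed-effects-only Gaussian linear model, where both estimators of the error variance have closed forms: ML divides the residual sum of squares by $n$, while REML divides by the residual dimension $n - p$. A minimal numerical sketch, with made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 15, 4
X = rng.standard_normal((n, p))
beta = np.array([1.0, -2.0, 0.5, 3.0])      # illustrative true coefficients
y = X @ beta + rng.standard_normal(n)

# OLS fit; RSS is the squared residual norm.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta_hat) ** 2)

sigma2_ml = rss / n          # ML: divides by n, ignoring the p fitted betas
sigma2_reml = rss / (n - p)  # REML: divides by n - p, the residual dimension

print(sigma2_ml < sigma2_reml)   # ML is always the smaller of the two
```

With $n = 15$ and $p = 4$, the ML estimate is shrunk by a factor of $11/15$ relative to REML, which is exactly the downward bias the table describes.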

Canonical Example

Example

Random intercept with few groups

Suppose eight schools are modeled with a fixed treatment effect and a random school intercept. The random-intercept variance measures how much schools vary after accounting for treatment. With only eight groups, the full ML likelihood often pushes that variance downward because it treats the fitted treatment effect as if it were known in advance. REML removes the treatment-effect directions before estimating the school variance, so the estimate is typically less biased.

This is the same structural reason REML is common in small area estimation and in random-effects meta-analysis: the variance parameter is supposed to capture leftover heterogeneity, not variation already absorbed by fixed effects.

Where This Shows Up

  • In small area estimation, REML is a common estimator of the area-level variance $A$ in Fay-Herriot and related mixed models.
  • In random-effects meta-analysis, REML is often used to estimate the between-study variance $\tau^2$.
  • In longitudinal and multilevel models, REML is the default choice in many software packages when the fixed-effect structure is already decided.
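For the meta-analysis case, the restricted log-likelihood from the theorem specializes to a one-dimensional function of $\tau^2$: with $X$ a column of ones, $X^\top V^{-1} X = \sum_i w_i$ where $w_i = 1/(s_i^2 + \tau^2)$. A sketch with hypothetical study estimates and within-study variances, maximized by grid search:

```python
import numpy as np

# Hypothetical study effect estimates y_i and within-study variances
# s_i^2 (illustrative numbers only, not from any real meta-analysis).
y = np.array([0.30, 0.10, 0.45, 0.20, 0.60])
s2 = np.array([0.04, 0.09, 0.05, 0.08, 0.06])

def restricted_loglik(tau2):
    # l_R for y_i ~ N(mu, s_i^2 + tau^2), X a column of ones
    w = 1.0 / (s2 + tau2)
    mu_hat = np.sum(w * y) / np.sum(w)               # GLS mean at this tau^2
    return -0.5 * (np.sum(np.log(s2 + tau2))         # log|V|
                   + np.log(np.sum(w))               # log|X^T V^-1 X|
                   + np.sum(w * (y - mu_hat) ** 2))  # y^T P y

# A grid search is enough for a one-dimensional variance component.
grid = np.linspace(0.0, 1.0, 10001)
tau2_reml = grid[np.argmax([restricted_loglik(t) for t in grid])]
```

In practice a scalar optimizer or Fisher-scoring iteration replaces the grid, but the objective is the same restricted log-likelihood, and a boundary estimate $\hat\tau^2 = 0$ remains possible.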

Common Confusions

Watch Out

REML is not a Bayesian method

REML is still a likelihood-based frequentist estimator. It can look Bayesian because the restricted likelihood can also be derived by integrating the fixed effects out of the full likelihood under a flat prior, but the estimator itself is purely frequentist: remove the fixed-effect directions, then maximize the remaining likelihood.

Watch Out

REML is not for selecting fixed effects

AIC, likelihood-ratio tests, and nested-model comparisons that change the fixed-effect structure should be based on ML, not REML. Once the fixed effects are chosen, REML is a strong default for estimating the variance components.

Watch Out

REML does not make boundary problems disappear

If the true variance component is near zero or the data are very thin, REML can still land on the boundary. That is one reason specialized methods such as adjusted density maximization appear in the small-area literature.

Summary

  • REML estimates variance components from error contrasts, not the full data
  • This accounts for fixed-effect degrees of freedom that ML ignores
  • REML usually reduces downward bias in variance-component estimates
  • REML is a poor choice for comparing models with different fixed effects
  • Boundary estimates can still happen, especially when the true variance is small

Exercises

ExerciseCore

Problem

Why can two models with different fixed-effect design matrices be compared with ML likelihoods but not directly with REML likelihoods?

ExerciseAdvanced

Problem

In a mixed model with one variance component, the ML estimate is exactly zero while the REML estimate is small but positive. Give a plausible structural reason for this difference without doing any algebra.

References

Canonical:

  • Patterson and Thompson, "Recovery of Inter-Block Information When Block Sizes Are Unequal" (1971), Biometrika 58(3), 545-554. Original restricted-likelihood construction.
  • Harville, "Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems" (1977), JASA 72(358), 320-338. Classic REML review and derivation.
  • Searle, Casella, McCulloch, Variance Components (1992), Chapters 6-8. Standard textbook treatment of ML and REML.
  • Jiang, Linear and Generalized Linear Mixed Models and Their Applications (2007), Chapters 1-3. Mixed-model estimation framework with REML as the default variance-component tool.
  • Rao and Molina, Small Area Estimation, 2nd ed. (2015), Chapters 5 and 7. REML in the Fay-Herriot and related SAE models.

Current / practice:

  • Bates, Maechler, Bolker, Walker, "Fitting Linear Mixed-Effects Models Using lme4" (2015), Journal of Statistical Software 67(1). Practical mixed-model fitting with REML defaults.
  • Cochrane Handbook for Systematic Reviews of Interventions, current Chapter 10. REML as a default heterogeneity estimator in random-effects meta-analysis.


Last reviewed: April 18, 2026
