Statistical Estimation
REML and Variance Component Estimation
Why restricted maximum likelihood estimates variance components from error contrasts rather than the full data likelihood, and why that usually behaves better than ML when fixed effects are present.
Why This Matters
Variance components appear whenever a model separates fixed structure from random heterogeneity: random-intercept models, random-effects meta-analysis, and small area estimation all fit this pattern. The object of interest is often a variance parameter such as a between-area variance $\sigma_v^2$ or a between-study variance $\tau^2$.
Plain maximum likelihood estimates those variances from the full data likelihood. That sounds natural, but it systematically ignores one fact: some degrees of freedom were already spent estimating the fixed effects. In small samples, ML therefore tends to push variance components downward, sometimes all the way to zero.
Restricted maximum likelihood, usually shortened to REML, corrects that problem at the source. It builds a likelihood only from residual directions that carry no information about the fixed effects. That is why REML is the default variance-component estimator in much of mixed-model practice.
Mental Model
Suppose the data vector $y$ lives in an $n$-dimensional space and the fixed effects span a $p$-dimensional subspace through the design matrix $X$.
- The $p$ directions along the columns of $X$ are used to estimate $\beta$.
- The remaining $n - p$ directions are residual contrasts.
ML uses all directions when estimating the variance parameters. REML uses only the residual directions. That is the whole idea.
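This split of directions can be checked numerically. The sketch below, using a hypothetical full-rank design matrix, builds the hat matrix (projection onto the fixed-effect directions) and its complement, and confirms the rank split $p$ versus $n - p$:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 10, 3                      # data dimension and number of fixed effects
X = rng.normal(size=(n, p))       # hypothetical full-rank design matrix

# The hat matrix H projects onto the column space of X (fixed-effect directions);
# M = I - H projects onto the residual directions REML works with.
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H

print(np.linalg.matrix_rank(H))   # p directions are spent on the fixed effects
print(np.linalg.matrix_rank(M))   # n - p residual directions remain
```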
Formal Setup
Linear Mixed Model
A linear mixed model can be written as

$$ y = X\beta + Zu + e, $$

with

$$ u \sim N(0, G), \qquad e \sim N(0, R), $$

independent of each other. The marginal covariance of $y$ is

$$ V = \operatorname{Var}(y) = Z G Z^\top + R. $$

The parameter vector $\theta$ collects the unknown variance components in $G$ and $R$.
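For concreteness, here is the marginal covariance $V = ZGZ^\top + R$ assembled for a toy random-intercept layout; the group sizes and variance values are made up for illustration:

```python
import numpy as np

# Hypothetical layout: 3 groups of 2 observations, one random intercept per group.
n_groups, per_group = 3, 2
n = n_groups * per_group
Z = np.kron(np.eye(n_groups), np.ones((per_group, 1)))  # maps group effects to rows

sigma_u2, sigma_e2 = 2.0, 1.0      # assumed variance components
G = sigma_u2 * np.eye(n_groups)    # Var(u)
R = sigma_e2 * np.eye(n)           # Var(e)

V = Z @ G @ Z.T + R                # marginal covariance of y
print(V)
```

The result is block diagonal: $\sigma_u^2 + \sigma_e^2$ on the diagonal, $\sigma_u^2$ between observations that share a group, and zero across groups.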
Variance Component
A variance component is any parameter inside $G$ or $R$ that controls random-effect or error variability. Examples include a random-intercept variance, a between-study heterogeneity parameter, or the area-level variance in a Fay-Herriot model.
Error Contrast
An error contrast is a linear transformation $w = A^\top y$ such that $A^\top X = 0$ and hence $E[w] = 0$ for every value of $\beta$. These contrasts remove the fixed-effect contribution and retain only the directions relevant for estimating variance parameters.
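A small sketch of this definition, using `scipy.linalg.null_space` to build an orthonormal error-contrast matrix for a hypothetical design and checking that the mean term vanishes for an arbitrary $\beta$:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(3)
n, p = 6, 2
X = rng.normal(size=(n, p))        # hypothetical design matrix
A = null_space(X.T)                # orthonormal columns with A.T @ X = 0

# For any beta, the contrast w = A'y has mean A'X beta = 0.
beta = rng.normal(size=p)
print(np.allclose(A.T @ X @ beta, 0))  # mean term vanishes
print(A.shape)                          # n - p contrast directions
```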
Main Theorem
REML as Likelihood of Error Contrasts
Statement
Let $A$ be any $n \times (n - p)$ full-rank error-contrast matrix, so that $A^\top X = 0$. Then the restricted likelihood for the variance parameters $\theta$ is the likelihood of the transformed data $w = A^\top y$:

$$ w = A^\top y \sim N\!\big(0,\; A^\top V(\theta)\, A\big), $$

where

$$ V(\theta) = Z G Z^\top + R. $$

Equivalently, up to an additive constant, the restricted log-likelihood is

$$ \ell_R(\theta) = -\tfrac{1}{2}\log\lvert V \rvert \;-\; \tfrac{1}{2}\log\lvert X^\top V^{-1} X \rvert \;-\; \tfrac{1}{2}\,(y - X\hat\beta)^\top V^{-1}(y - X\hat\beta), $$

with $\hat\beta = (X^\top V^{-1} X)^{-1} X^\top V^{-1} y$ the generalized least squares estimator at $\theta$. This likelihood depends on $\theta$ but not on the unknown fixed-effects vector $\beta$.
Intuition
REML estimates variance parameters only from the part of the data that remains after projecting away the fixed-effect directions. That is why the fixed effects do not appear in the restricted likelihood.
Proof Sketch
Choose a matrix $A$ whose $n - p$ columns span the orthogonal complement of the column space of $X$. The transformed vector $w = A^\top y$ removes the mean term, since $E[w] = A^\top X \beta = 0$, and has covariance $A^\top V A$. Its Gaussian likelihood therefore depends only on $\theta$. Algebraic manipulation of that likelihood gives the equivalent determinant-plus-quadratic form involving $\hat\beta$.
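Both forms can be compared numerically. The sketch below, under an assumed compound-symmetry covariance for illustration, evaluates the error-contrast log-likelihood and the determinant-plus-quadratic form at two parameter values and confirms they differ only by a constant that does not depend on $\theta$:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
n, p = 8, 2
X = rng.normal(size=(n, p))        # hypothetical design matrix
y = rng.normal(size=n)             # hypothetical data

A = null_space(X.T)                # columns span the error-contrast space, A.T @ X = 0

def V_of(theta):
    # assumed compound-symmetry covariance: theta[0]*I + theta[1]*J
    return theta[0] * np.eye(n) + theta[1] * np.ones((n, n))

def loglik_contrasts(theta):
    # Gaussian log-likelihood of w = A'y ~ N(0, A'VA), constants dropped
    W = A.T @ V_of(theta) @ A
    w = A.T @ y
    return -0.5 * np.linalg.slogdet(W)[1] - 0.5 * w @ np.linalg.solve(W, w)

def loglik_restricted(theta):
    # determinant-plus-quadratic form of the restricted log-likelihood
    V = V_of(theta)
    Vi = np.linalg.inv(V)
    beta_hat = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)  # GLS estimate at theta
    r = y - X @ beta_hat
    return (-0.5 * np.linalg.slogdet(V)[1]
            - 0.5 * np.linalg.slogdet(X.T @ Vi @ X)[1]
            - 0.5 * r @ Vi @ r)

# The difference between the two forms is the same at every theta.
d1 = loglik_contrasts([1.0, 0.5]) - loglik_restricted([1.0, 0.5])
d2 = loglik_contrasts([2.0, 1.5]) - loglik_restricted([2.0, 1.5])
print(np.isclose(d1, d2))
```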
Why It Matters
This is the clean justification for REML. It is not an ad hoc correction term added to ML after the fact. It is a different likelihood, built from the correct part of the data for variance estimation when fixed effects are present.
Failure Mode
REML does not solve every variance-component problem. Boundary estimates at zero can still occur. Restricted likelihood values are not comparable across models with different fixed-effect design matrices $X$, so REML is the wrong tool for likelihood-ratio comparisons that change the fixed effects.
ML vs REML
| Question | ML | REML |
|---|---|---|
| What likelihood is maximized? | Full data likelihood | Likelihood of error contrasts |
| What happens to fixed-effect degrees of freedom? | Ignored inside variance estimation | Accounted for explicitly |
| Small-sample bias in variance components | More downward bias | Usually less downward bias |
| Can you compare different fixed-effect structures by likelihood ratio? | Yes | No |
| Can the estimate still hit zero? | Yes | Yes |
The main practical point is narrow: REML is usually better for estimating variance components, not for every model-comparison question.
Canonical Example
Random intercept with few groups
Suppose eight schools are modeled with a fixed treatment effect and a random school intercept. The random-intercept variance measures how much schools vary after accounting for treatment. With only eight groups, the full ML likelihood often pushes that variance downward because it treats the fitted treatment effect as if it were known in advance. REML removes the treatment-effect directions before estimating the school variance, so the estimate is typically less biased.
This is the same structural reason REML is common in small area estimation and in random-effects meta-analysis: the variance parameter is supposed to capture leftover heterogeneity, not variation already absorbed by fixed effects.
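The degrees-of-freedom mechanism is easiest to see in the simplest special case, a model with fixed effects only, where both estimators have closed forms: ML divides the residual sum of squares by $n$, while REML divides by $n - p$. A small simulation with made-up numbers shows the resulting bias:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 2                                      # small sample, echoing the eight-group story
X = np.column_stack([np.ones(n), np.repeat([0.0, 1.0], 4)])  # intercept + treatment
beta = np.array([1.0, 0.5])
sigma2 = 4.0                                     # true error variance

n_sims = 20000
ml_est, reml_est = [], []
for _ in range(n_sims):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta_hat) ** 2)
    ml_est.append(rss / n)          # ML: divides by n, ignoring the estimated beta
    reml_est.append(rss / (n - p))  # REML: charges p degrees of freedom

print(np.mean(ml_est))    # below the true 4.0 on average (downward bias)
print(np.mean(reml_est))  # close to 4.0 (unbiased in this special case)
```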
Where This Shows Up
- In small area estimation, REML is a common estimator of the area-level variance in Fay-Herriot and related mixed models.
- In random-effects meta-analysis, REML is often used to estimate the between-study variance $\tau^2$.
- In longitudinal and multilevel models, REML is the default choice in many software packages when the fixed-effect structure is already decided.
Common Confusions
REML is not a Bayesian method
REML is still a likelihood-based frequentist estimator. It can look Bayesian because it often behaves better in small samples, but the mechanism is purely likelihood-based: remove fixed-effect directions, then maximize the remaining likelihood.
REML is not for selecting fixed effects
AIC, likelihood-ratio tests, and nested-model comparisons that change the fixed-effect structure should be based on ML, not REML. Once the fixed effects are chosen, REML is a strong default for estimating the variance components.
REML does not make boundary problems disappear
If the true variance component is near zero or the data are very thin, REML can still land on the boundary. That is one reason specialized methods such as adjusted density maximization appear in the small-area literature.
Summary
- REML estimates variance components from error contrasts, not the full data
- This accounts for fixed-effect degrees of freedom that ML ignores
- REML usually reduces downward bias in variance-component estimates
- REML is a poor choice for comparing models with different fixed effects
- Boundary estimates can still happen, especially when the true variance is small
Exercises
Problem
Why can two models with different fixed-effect design matrices be compared with ML likelihoods but not directly with REML likelihoods?
Problem
In a mixed model with one variance component, the ML estimate is exactly zero while the REML estimate is small but positive. Give a plausible structural reason for this difference without doing any algebra.
References
Canonical:
- Patterson and Thompson, "Recovery of Inter-Block Information When Block Sizes Are Unequal" (1971), Biometrika 58(3), 545-554. Original restricted-likelihood construction.
- Harville, "Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems" (1977), JASA 72(358), 320-338. Classic REML review and derivation.
- Searle, Casella, McCulloch, Variance Components (1992), Chapters 6-8. Standard textbook treatment of ML and REML.
- Jiang, Linear and Generalized Linear Mixed Models and Their Applications (2007), Chapters 1-3. Mixed-model estimation framework with REML as the default variance-component tool.
- Rao and Molina, Small Area Estimation, 2nd ed. (2015), Chapters 5 and 7. REML in the Fay-Herriot and related SAE models.
Current / practice:
- Bates, Maechler, Bolker, Walker, "Fitting Linear Mixed-Effects Models Using lme4" (2015), Journal of Statistical Software 67(1). Practical mixed-model fitting with REML defaults.
- Cochrane Handbook for Systematic Reviews of Interventions, current Chapter 10. REML as a default heterogeneity estimator in random-effects meta-analysis.
Next Topics
- Small area estimation: where REML estimates the area-level variance in Fay-Herriot models
- Prasad-Rao MSE correction: what changes once the variance component is estimated rather than known
- Adjusted density maximization: a boundary-aware alternative when variance estimates near zero are the real problem
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Maximum Likelihood Estimation (Layer 0B)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Differentiation in $\mathbb{R}^n$ (Layer 0A)
- Linear Regression (Layer 1)
- Matrix Operations and Properties (Layer 0A)
- Expectation, Variance, Covariance, and Moments (Layer 0A)