Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.

Small Area Estimation

Methods for producing reliable estimates in domains where direct survey estimates have too few observations for useful precision, using Fay-Herriot and unit-level models that borrow strength across areas.

Why This Matters

National surveys are designed to produce reliable estimates at the national or regional level. But policymakers need estimates for every county, every congressional district, every demographic subgroup. The sample size in any single small area is often too small for a direct estimate to be useful.

Small area estimation (SAE) solves this by combining the direct survey estimate with a model-based prediction, "borrowing strength" from related areas. The U.S. Census Bureau uses SAE to produce poverty estimates for every county and school district. The Bureau of Labor Statistics uses it for local unemployment rates. Health agencies use it for disease prevalence by county.

For ML practitioners, SAE is a clean example of the bias-variance tradeoff in action: you accept a small amount of model-dependent bias in exchange for a large reduction in variance.

Mental Model

You have two sources of information about a small area's mean $\theta_i$:

  1. Direct estimate $y_i$: unbiased but high variance (small sample size in area $i$)
  2. Model-based prediction $x_i^T \beta$: low variance but potentially biased (relies on the model being correct)

The optimal combination is a weighted average that puts more weight on the direct estimate when it is precise and more weight on the model when the direct estimate is noisy. This is exactly what the composite estimator does.

Formal Setup

The Fay-Herriot Model

Definition

Fay-Herriot Model

The Fay-Herriot model (1979) is an area-level model with two stages:

Sampling model (Level 1): The direct estimator $y_i$ for area $i$ is:

$$y_i = \theta_i + e_i, \quad e_i \sim N(0, \psi_i)$$

where $\theta_i$ is the true area mean and $\psi_i$ is the sampling variance of the direct estimator, treated as known (in practice it is estimated from the survey design).

Linking model (Level 2): The true area means follow a regression:

$$\theta_i = x_i^T \beta + v_i, \quad v_i \sim N(0, A)$$

where $x_i$ are area-level covariates, $\beta$ is a vector of regression coefficients, and $A$ is the model variance (unknown, to be estimated). The random effects $v_i$ are assumed independent of the sampling errors $e_i$.

Combining: $y_i = x_i^T \beta + v_i + e_i$, a mixed model with known heterogeneous error variances $\psi_i$ and unknown random effect variance $A$.
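To make the two stages concrete, the following sketch simulates one data set from the model. The covariate design, the coefficient values, the range of sampling variances, and the function name `simulate_fay_herriot` are illustrative assumptions, not part of the model definition.

```python
import numpy as np

def simulate_fay_herriot(m, beta, A, psi, rng):
    """Draw one data set from the two-stage Fay-Herriot model."""
    # Area-level covariates: intercept plus one standard-normal covariate (illustrative).
    X = np.column_stack([np.ones(m), rng.normal(size=m)])
    v = rng.normal(0.0, np.sqrt(A), size=m)    # linking-model random effects
    theta = X @ beta + v                       # true area means (Level 2)
    e = rng.normal(0.0, np.sqrt(psi), size=m)  # sampling errors, variances treated as known
    y = theta + e                              # direct estimates (Level 1)
    return X, theta, y

rng = np.random.default_rng(0)
m = 500
psi = rng.uniform(1.0, 9.0, size=m)  # hypothetical sampling variances
beta = np.array([10.0, 2.0])         # hypothetical regression coefficients
A = 4.0                              # hypothetical model variance
X, theta, y = simulate_fay_herriot(m, beta, A, psi, rng)

# Marginally, y_i - x_i^T beta has variance A + psi_i, so these two numbers should be close.
resid = y - X @ beta
print(np.mean(resid**2), np.mean(A + psi))
```

The final comparison is a rough check of the combined model: the average squared residual should be near the average of $A + \psi_i$, up to simulation noise.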

Definition

Direct Estimator

The direct estimator $y_i$ uses only the survey data from area $i$. Under the survey design, it is (approximately) unbiased for $\theta_i$. Its variance $\psi_i$ is large when the sample size in area $i$ is small. For areas with zero sample, the direct estimator is undefined.

Definition

Synthetic Estimator

The synthetic estimator $\hat{\theta}_i^{\text{syn}} = x_i^T \hat{\beta}$ uses the regression model to predict the area mean from covariates. It has low variance (it uses data from all areas through $\hat{\beta}$) but is biased if the model is misspecified or if area $i$ deviates from the model.

Main Theorems

Theorem

Fay-Herriot Composite Estimator

Statement

Under the Fay-Herriot model, the best linear unbiased predictor (BLUP) of $\theta_i$ is the composite estimator:

$$\hat{\theta}_i^{\text{FH}} = \gamma_i y_i + (1 - \gamma_i) x_i^T \hat{\beta}$$

where the shrinkage factor is:

$$\gamma_i = \frac{A}{A + \psi_i}$$

and $\hat{\beta}$ is the GLS estimate of $\beta$ from the combined model. When $A$ and $\beta$ are known, the mean squared error of this estimator is:

$$\text{MSE}(\hat{\theta}_i^{\text{FH}}) = \gamma_i \psi_i = \frac{A \psi_i}{A + \psi_i}$$

which is always smaller than both $\psi_i$ (the variance of the direct estimator) and $A$ (the MSE of the synthetic predictor $x_i^T \beta$). Estimating $\beta$ adds a further term that shrinks as the number of areas grows.

Intuition

The composite estimator is a weighted average of the direct estimate and the regression prediction. The weight $\gamma_i$ on the direct estimate depends on the signal-to-noise ratio. When the sampling variance $\psi_i$ is small relative to the model variance $A$, the direct estimate is reliable and gets high weight ($\gamma_i$ close to 1). When $\psi_i$ is large (small sample), the model prediction dominates ($\gamma_i$ close to 0). This is shrinkage toward the regression line.
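The behavior of the weight can be seen numerically. This is a minimal sketch that treats $A$ and the model prediction as known; the helper name `fh_composite` and all input values are made up for illustration.

```python
import numpy as np

def fh_composite(y, pred, psi, A):
    """Composite estimate and known-parameter MSE for each area."""
    y, pred, psi = map(np.asarray, (y, pred, psi))
    gamma = A / (A + psi)                   # shrinkage factor: weight on the direct estimate
    est = gamma * y + (1.0 - gamma) * pred  # weighted average of direct and model prediction
    mse = gamma * psi                       # ignores uncertainty from estimating A and beta
    return gamma, est, mse

# Same direct estimate (20) and model prediction (14), increasingly noisy direct estimates.
psi = np.array([0.5, 3.0, 9.0, 50.0])
gamma, est, mse = fh_composite(y=np.full(4, 20.0), pred=np.full(4, 14.0), psi=psi, A=3.0)
for p, g, t, s in zip(psi, gamma, est, mse):
    print(f"psi={p:5.1f}  gamma={g:.3f}  estimate={t:.2f}  mse={s:.2f}")
```

As the sampling variance grows, the weight on the direct estimate falls and the composite estimate moves from near 20 toward 14, while the MSE stays below both the sampling variance and $A$.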

Proof Sketch

The Fay-Herriot model is a linear mixed model $y_i = x_i^T\beta + v_i + e_i$ with $\text{Var}(v_i + e_i) = A + \psi_i$. For known $\beta$ and $A$, the best predictor of $\theta_i = x_i^T\beta + v_i$ is the conditional expectation $\mathbb{E}[\theta_i \mid y_i] = x_i^T\beta + \frac{A}{A+\psi_i}(y_i - x_i^T\beta) = \gamma_i y_i + (1-\gamma_i)x_i^T\beta$. This follows from the standard result for the conditional mean of a bivariate normal. Substituting the GLS estimate $\hat{\beta}$ for $\beta$ gives the BLUP.
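Spelling out the bivariate normal step, under the assumptions that $\beta$ and $A$ are known and that $v_i$ and $e_i$ are independent, the pair $(\theta_i, y_i)$ is jointly normal with

$$\text{Var}(\theta_i) = A, \qquad \text{Var}(y_i) = A + \psi_i, \qquad \text{Cov}(\theta_i, y_i) = \text{Cov}(v_i, v_i + e_i) = A.$$

The conditional mean and variance then follow from the usual formulas:

$$\begin{aligned}
\mathbb{E}[\theta_i \mid y_i] &= x_i^T\beta + \frac{\text{Cov}(\theta_i, y_i)}{\text{Var}(y_i)}\,(y_i - x_i^T\beta) = x_i^T\beta + \gamma_i\,(y_i - x_i^T\beta), \\
\text{Var}(\theta_i \mid y_i) &= A - \frac{A^2}{A + \psi_i} = \frac{A\,\psi_i}{A + \psi_i} = \gamma_i \psi_i.
\end{aligned}$$

The conditional variance is the known-parameter MSE that appears in the theorem statement.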

Why It Matters

This is the workhorse of small area estimation in practice. The Census Bureau's SAIPE (Small Area Income and Poverty Estimates) program uses Fay-Herriot models. The formula shows exactly how borrowing strength works: areas with less data are pulled more toward the model, while areas with more data retain their direct estimates.

Failure Mode

If the linking model is misspecified (wrong covariates, wrong functional form), the synthetic component $x_i^T\hat{\beta}$ is biased, and this bias is inherited by the composite estimator. The MSE formula assumes $A$ is known; in practice, $A$ is estimated, which adds uncertainty not captured by the simple formula. The Prasad-Rao correction addresses this. If $\psi_i$ is poorly estimated (very small area sample sizes), the model can behave erratically.

Empirical Bayes and Hierarchical Bayes

Empirical Bayes (EB)

In the empirical Bayes approach, $A$ and $\beta$ are estimated from the data by maximum likelihood, REML, or method of moments. The EB estimator plugs these estimates into the BLUP formula. This is computationally simple but underestimates uncertainty because it treats the estimated $A$ as if it were the true value.
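The plug-in procedure can be sketched as follows. This version assumes a simple moment estimator of $A$ based on OLS residuals (one of several method-of-moments choices; ML or REML would replace step 1), and the function name `fit_fay_herriot_eb` and the simulated data are hypothetical.

```python
import numpy as np

def fit_fay_herriot_eb(y, X, psi):
    """EB sketch: moment estimate of A, then GLS for beta, then plug-in shrinkage."""
    m, p = X.shape
    # Step 1: OLS residuals and leverages give an unbiased moment estimate of A,
    # since E[sum(resid^2)] = A * (m - p) + sum(psi_i * (1 - h_ii)).
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ X.T @ y)
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    A_hat = (np.sum(resid**2) - np.sum(psi * (1.0 - leverage))) / (m - p)
    A_hat = max(A_hat, 0.0)  # truncate at zero; the raw estimate can be negative
    # Step 2: GLS for beta with weights 1 / (A_hat + psi_i).
    w = 1.0 / (A_hat + psi)
    beta_gls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    # Step 3: plug the estimates into the composite formula.
    gamma = A_hat / (A_hat + psi)
    theta_eb = gamma * y + (1.0 - gamma) * (X @ beta_gls)
    return A_hat, beta_gls, gamma, theta_eb

# Simulated data with true A = 4 and true beta = (10, 2); all values are illustrative.
rng = np.random.default_rng(1)
m = 2000
X = np.column_stack([np.ones(m), rng.normal(size=m)])
psi = rng.uniform(1.0, 9.0, size=m)
theta = X @ np.array([10.0, 2.0]) + rng.normal(0.0, 2.0, size=m)
y = theta + rng.normal(0.0, np.sqrt(psi))

A_hat, beta_gls, gamma, theta_eb = fit_fay_herriot_eb(y, X, psi)
print("A_hat:", round(A_hat, 2), "beta_gls:", np.round(beta_gls, 2))
print("mean squared error, direct:", np.mean((y - theta) ** 2))
print("mean squared error, EB:    ", np.mean((theta_eb - theta) ** 2))
```

On simulated data the true $\theta_i$ are available, so the last two lines compare the realized error of the direct and EB estimates. The sketch returns point estimates only; it does not compute an MSE estimate that accounts for the uncertainty in $\hat{A}$.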

Hierarchical Bayes (HB)

The hierarchical Bayes approach places priors on $\beta$ and $A$, then computes the full posterior distribution $p(\theta_i \mid y, x)$ via MCMC. This properly accounts for uncertainty in all parameters. The posterior mean is similar to the EB point estimate but the posterior intervals are wider (and more honest) because they incorporate parameter uncertainty.

The HB approach is preferred when proper uncertainty quantification matters, such as when the estimates feed into policy decisions.
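A small sketch can show how parameter uncertainty enters the posterior. Note the substitution: instead of MCMC, this example integrates over a grid of $A$ values, which is feasible because under a flat prior on $\beta$ everything except $A$ can be integrated out analytically. The flat prior on $A$ over a bounded grid, the grid limits, the function name `fh_hb_grid`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def fh_hb_grid(y, X, psi, A_grid):
    """Posterior mean and variance of each theta_i, integrating A over a grid."""
    log_post, cond_mean, cond_var = [], [], []
    for A in A_grid:
        w = 1.0 / (A + psi)                # inverse marginal variances of y_i
        XtWX = X.T @ (w[:, None] * X)
        cov_beta = np.linalg.inv(XtWX)     # posterior covariance of beta given A
        beta = cov_beta @ (X.T @ (w * y))  # GLS estimate given A
        r = y - X @ beta
        # Log marginal posterior of A under flat priors, up to an additive constant.
        log_post.append(0.5 * np.sum(np.log(w))
                        - 0.5 * np.linalg.slogdet(XtWX)[1]
                        - 0.5 * np.sum(w * r**2))
        gamma = A * w
        cond_mean.append(gamma * y + (1.0 - gamma) * (X @ beta))
        # Conditional variance: shrinkage term plus the contribution from unknown beta.
        g2 = (1.0 - gamma) ** 2 * np.einsum("ij,jk,ik->i", X, cov_beta, X)
        cond_var.append(gamma * psi + g2)
    log_post = np.array(log_post)
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()               # normalized grid posterior for A
    cond_mean, cond_var = np.array(cond_mean), np.array(cond_var)
    post_mean = weights @ cond_mean
    # Law of total variance: E[Var(theta | A)] + Var(E[theta | A]).
    post_var = weights @ (cond_var + cond_mean**2) - post_mean**2
    return post_mean, post_var, weights

# Few areas, so uncertainty about A is not negligible; all values are illustrative.
rng = np.random.default_rng(2)
m = 30
X = np.column_stack([np.ones(m), rng.normal(size=m)])
psi = rng.uniform(1.0, 9.0, size=m)
theta = X @ np.array([10.0, 2.0]) + rng.normal(0.0, 2.0, size=m)
y = theta + rng.normal(0.0, np.sqrt(psi))

A_grid = np.linspace(0.0, 40.0, 801)
post_mean, post_var, weights = fh_hb_grid(y, X, psi, A_grid)
print("posterior mean of A:", weights @ A_grid)
print("first three posterior sds:", np.sqrt(post_var[:3]))
```

The second term in the total-variance line is the part a plug-in EB variance leaves out: the spread of the conditional means across plausible values of $A$. A grid works here only because $A$ is a single scalar; richer models generally need MCMC as described above.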

Unit-Level Models

The Fay-Herriot model works with area-level summaries. When unit-level data is available, the Battese-Harter-Fuller (BHF) model (1988) is the standard:

$$y_{ij} = x_{ij}^T \beta + v_i + e_{ij}$$

where $y_{ij}$ is the outcome for unit $j$ in area $i$, $v_i \sim N(0, A)$ is the area random effect, and $e_{ij} \sim N(0, \sigma_e^2)$. The BLUP of the area mean $\bar{\theta}_i$ uses both the sample and non-sample units in area $i$.

Unit-level models are more efficient when individual covariates carry information, but they require access to the microdata and the population-level covariate distribution for each area.
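For comparison with the area-level formula, here is the form the BHF predictor takes under two simplifying assumptions: the variance components $A$ and $\sigma_e^2$ are known, and the sampling fraction in each area is negligible. The symbols $n_i$ (sample size in area $i$), $\bar{y}_i$ and $\bar{x}_i$ (sample means), and $\bar{X}_i$ (population mean of the covariates) are introduced here for illustration.

$$\hat{\bar{\theta}}_i = \bar{X}_i^T \hat{\beta} + \gamma_i \left( \bar{y}_i - \bar{x}_i^T \hat{\beta} \right), \qquad \gamma_i = \frac{A}{A + \sigma_e^2 / n_i}$$

The structure matches the Fay-Herriot shrinkage factor with $\sigma_e^2 / n_i$ playing the role of $\psi_i$: the area effect is estimated by shrinking the average sample residual, and the weight on the data grows with $n_i$.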

Common Confusions

Watch Out

SAE is not just multilevel modeling

While SAE uses hierarchical models, the goals differ. In standard multilevel modeling, you care about the fixed effects $\beta$ or the variance components. In SAE, you care about predicting the area-specific quantities $\theta_i = x_i^T\beta + v_i$, which combine the fixed effects with each area's realized random effect. The MSE of the predictor, not the variance of the estimator of $\beta$, is the primary quantity of interest.

Watch Out

The sampling variances $\psi_i$ are treated as known

This is a modeling convenience, not a literal truth. In practice, $\psi_i$ are estimated from the survey design. For areas with reasonable sample sizes, this is fine. For very small areas, the estimated $\psi_i$ may be unstable. Smoothing the sampling variances (e.g., via a generalized variance function) is common practice.
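As a rough illustration of variance smoothing, the sketch below fits one simple generalized variance function, a log-linear regression of the estimated variance on the area sample size. This functional form, the function name `smooth_variances_gvf`, and the simulated inputs are assumptions; real applications often include other predictors such as the estimate itself or a design effect.

```python
import numpy as np

def smooth_variances_gvf(psi_hat, n):
    """Fit log(psi_hat) = a + b * log(n) by least squares and return fitted variances."""
    slope, intercept = np.polyfit(np.log(n), np.log(psi_hat), deg=1)
    return np.exp(intercept + slope * np.log(n))

# Hypothetical setup: true variance 25 / n_i, estimated with chi-square noise.
rng = np.random.default_rng(3)
n = rng.integers(5, 51, size=200)                    # area sample sizes
psi_true = 25.0 / n
psi_hat = psi_true * rng.chisquare(n - 1) / (n - 1)  # noisy design-based estimates
psi_smooth = smooth_variances_gvf(psi_hat, n)

raw_err = np.mean(np.abs(np.log(psi_hat / psi_true)))
smooth_err = np.mean(np.abs(np.log(psi_smooth / psi_true)))
print("mean abs log error, raw:     ", raw_err)
print("mean abs log error, smoothed:", smooth_err)
```

The smoothed values pool information across areas, so they are far less erratic than the raw estimates for the smallest samples. The gain depends on the assumed functional form being roughly right, which is itself a modeling choice.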

Watch Out

Borrowing strength is not free

The composite estimator reduces MSE on average across areas. But for any specific area $i$, the model-based component can introduce bias if the model is wrong for that area. SAE trades unbiasedness for lower MSE. This is the same tradeoff as in ridge regression or James-Stein estimation.
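The tradeoff can be made explicit by conditioning on a fixed area mean $\theta_i$, assuming $A$ and $\beta$ are known. Since $\mathbb{E}[y_i \mid \theta_i] = \theta_i$, the composite estimator has conditional bias and MSE

$$\begin{aligned}
\mathbb{E}\big[\hat{\theta}_i^{\text{FH}} - \theta_i \mid \theta_i\big] &= (1 - \gamma_i)\,(x_i^T\beta - \theta_i), \\
\text{MSE}\big(\hat{\theta}_i^{\text{FH}} \mid \theta_i\big) &= \gamma_i^2 \psi_i + (1 - \gamma_i)^2 (\theta_i - x_i^T\beta)^2.
\end{aligned}$$

Averaging the second line over $v_i \sim N(0, A)$ recovers the model-based MSE:

$$\gamma_i^2 \psi_i + (1 - \gamma_i)^2 A = \frac{A\,\psi_i}{A + \psi_i}.$$

For a specific area, however, the conditional MSE exceeds the direct estimator's variance $\psi_i$ whenever $(\theta_i - x_i^T\beta)^2 > 2A + \psi_i$, that is, when the area sits far from the regression line relative to the variances involved.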

Summary

  • Direct estimates are approximately unbiased but have high variance in small areas
  • Synthetic (model-based) estimates have low variance but potential bias
  • The Fay-Herriot composite estimator optimally combines both
  • Shrinkage factor $\gamma_i = A/(A+\psi_i)$ depends on relative noise levels
  • Empirical Bayes is simple but underestimates uncertainty
  • Hierarchical Bayes properly accounts for parameter uncertainty
  • Used routinely by Census Bureau, BLS, and statistical agencies worldwide

Exercises

Exercise (Core)

Problem

An area has direct estimate $y_i = 12.5$ with sampling variance $\psi_i = 4.0$. The model prediction is $x_i^T\hat{\beta} = 10.0$. The estimated model variance is $\hat{A} = 6.0$. Compute the Fay-Herriot composite estimate and its MSE.

Exercise (Advanced)

Problem

Consider two areas with the same model prediction $x_i^T\hat{\beta} = 50$ and model variance $A = 10$. Area 1 has direct estimate $y_1 = 60$ with $\psi_1 = 2$ (large sample). Area 2 has direct estimate $y_2 = 60$ with $\psi_2 = 40$ (tiny sample). Compute the composite estimates for both areas. What happens to area 2's estimate?
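For readers who want to check their hand calculations, a short script along these lines evaluates both exercises. It uses the known-parameter formulas and treats the estimated $\hat{A}$ in the first exercise as if it were the true value; the helper name `fh_estimate` is hypothetical.

```python
def fh_estimate(y, pred, psi, A):
    """Return (shrinkage factor, composite estimate, known-parameter MSE)."""
    gamma = A / (A + psi)
    return gamma, gamma * y + (1 - gamma) * pred, gamma * psi

# Core exercise: one area.
g_core, est_core, mse_core = fh_estimate(y=12.5, pred=10.0, psi=4.0, A=6.0)
print(f"core: gamma={g_core:.3f}, estimate={est_core:.2f}, MSE={mse_core:.2f}")

# Advanced exercise: same direct estimate and prediction, different sampling variances.
g1, est1, _ = fh_estimate(y=60.0, pred=50.0, psi=2.0, A=10.0)
g2, est2, _ = fh_estimate(y=60.0, pred=50.0, psi=40.0, A=10.0)
print(f"area 1: gamma={g1:.3f}, estimate={est1:.2f}")
print(f"area 2: gamma={g2:.3f}, estimate={est2:.2f}")
```

Comparing the two shrinkage factors in the advanced exercise shows how strongly the tiny-sample area is pulled toward the model prediction even though its direct estimate is identical to area 1's.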

References

Canonical:

  • Fay & Herriot, "Estimates of Income for Small Places" (1979), JASA 74(366)
  • Rao & Molina, Small Area Estimation (2015), Chapters 1-7

Current:

  • Pfeffermann, "New Important Developments in Small Area Estimation" (2013), Statistical Science
  • National Academies, Small-Area Income and Poverty Estimates (2000)
  • Casella & Berger, Statistical Inference (2002), Chapters 5-10
  • Lehmann & Casella, Theory of Point Estimation (1998), Chapters 1-6

Last reviewed: April 2026
