Statistical Foundations
Small Area Estimation
Methods for producing reliable estimates in domains where direct survey estimates have too few observations for useful precision, using Fay-Herriot and unit-level models that borrow strength across areas.
Why This Matters
National surveys are designed to produce reliable estimates at the national or regional level. But policymakers need estimates for every county, every congressional district, every demographic subgroup. The sample size in any single small area is often too small for a direct estimate to be useful.
Small area estimation (SAE) solves this by combining the direct survey estimate with a model-based prediction, "borrowing strength" from related areas. The U.S. Census Bureau uses SAE to produce poverty estimates for every county and school district. The Bureau of Labor Statistics uses it for local unemployment rates. Health agencies use it for disease prevalence by county.
For ML practitioners, SAE is a clean example of the bias-variance tradeoff in action: you accept a small amount of model-dependent bias in exchange for a large reduction in variance.
Mental Model
You have two sources of information about a small area's mean $\theta_i$:
- Direct estimate $\hat{\theta}_i$: unbiased but high variance (small sample size in area $i$)
- Model-based prediction $x_i^\top \hat{\beta}$: low variance but potentially biased (relies on the model being correct)
The optimal combination is a weighted average that puts more weight on the direct estimate when it is precise and more weight on the model when the direct estimate is noisy. This is exactly what the composite estimator does.
Formal Setup
The Fay-Herriot Model
Fay-Herriot Model
The Fay-Herriot model (1979) is an area-level model with two stages:
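Sampling model (Level 1): The direct estimator for area $i$ is:
$$\hat{\theta}_i = \theta_i + e_i, \quad e_i \sim N(0, \psi_i)$$
where $\theta_i$ is the true area mean and $\psi_i$ is the known sampling variance of the direct estimator (estimated from the survey design).
Linking model (Level 2): The true area means follow a regression:
$$\theta_i = x_i^\top \beta + v_i, \quad v_i \sim N(0, \sigma_v^2)$$
where $x_i$ are area-level covariates, $\beta$ is a vector of regression coefficients, and $\sigma_v^2$ is the model variance (unknown, to be estimated).
Combining: $\hat{\theta}_i = x_i^\top \beta + v_i + e_i$, a mixed model with known heterogeneous error variances $\psi_i$ and unknown random effect variance $\sigma_v^2$.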
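A small simulation can make the two levels concrete. The sketch below assumes the usual notation (direct estimate $\hat{\theta}_i$, true mean $\theta_i$, sampling variance $\psi_i$, model variance $\sigma_v^2$). The function name `simulate_fay_herriot`, the sample-size range, and every numeric value are illustrative assumptions, not taken from any real survey.

```python
import numpy as np

def simulate_fay_herriot(m, beta, sigma2_v, rng):
    """Draw m areas from a two-level Fay-Herriot model (hypothetical settings)."""
    # Area-level design matrix: intercept plus one covariate.
    X = np.column_stack([np.ones(m), rng.normal(size=m)])
    # "Known" sampling variances: an assumed unit variance of 4.0 over an assumed sample size.
    n = rng.integers(5, 200, size=m)
    psi = 4.0 / n
    # Level 2 (linking model): theta_i = x_i' beta + v_i, v_i ~ N(0, sigma2_v).
    theta = X @ beta + rng.normal(scale=np.sqrt(sigma2_v), size=m)
    # Level 1 (sampling model): theta_hat_i = theta_i + e_i, e_i ~ N(0, psi_i).
    theta_hat = theta + rng.normal(scale=np.sqrt(psi), size=m)
    return X, psi, theta, theta_hat

rng = np.random.default_rng(0)
X, psi, theta, theta_hat = simulate_fay_herriot(500, np.array([1.0, 0.5]), 0.25, rng)
```

In a real application only `X`, `psi`, and `theta_hat` are observed; `theta` is available here only because the data are simulated.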
Direct Estimator
The direct estimator $\hat{\theta}_i$ uses only the survey data from area $i$. Under the survey design, it is (approximately) unbiased for $\theta_i$. Its variance $\psi_i$ is large when the sample size $n_i$ in area $i$ is small. For areas with zero sample, the direct estimator is undefined.
Synthetic Estimator
The synthetic estimator $\hat{\theta}_i^{\text{syn}} = x_i^\top \hat{\beta}$ uses the regression model to predict the area mean from covariates. It has low variance (it uses data from all areas through $\hat{\beta}$) but is biased if the model is misspecified or if area $i$ deviates from the model.
Main Theorems
Fay-Herriot Composite Estimator
Statement
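Under the Fay-Herriot model, the best linear unbiased predictor (BLUP) of $\theta_i$ is the composite estimator:
$$\hat{\theta}_i^{\text{FH}} = \gamma_i \hat{\theta}_i + (1 - \gamma_i)\, x_i^\top \hat{\beta}$$
where the shrinkage factor is:
$$\gamma_i = \frac{\sigma_v^2}{\sigma_v^2 + \psi_i}$$
and $\hat{\beta}$ is the GLS estimate of $\beta$ from the combined model. Treating $\beta$ and $\sigma_v^2$ as known, the mean squared error of this estimator is:
$$\text{MSE}(\hat{\theta}_i^{\text{FH}}) = \gamma_i \psi_i = \frac{\sigma_v^2 \psi_i}{\sigma_v^2 + \psi_i}$$
which is always smaller than both $\psi_i$ (variance of the direct estimator) and $\sigma_v^2$ (MSE of the synthetic estimator).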
Intuition
The composite estimator is a weighted average of the direct estimate and the regression prediction. The weight on the direct estimate depends on the signal-to-noise ratio $\sigma_v^2 / \psi_i$. When the sampling variance $\psi_i$ is small relative to the model variance $\sigma_v^2$, the direct estimate is reliable and gets high weight ($\gamma_i$ close to 1). When $\psi_i$ is large (small sample), the model prediction dominates ($\gamma_i$ close to 0). This is shrinkage toward the regression line.
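The shrinkage behavior is easy to check numerically. The sketch below assumes the model variance is known and takes the regression prediction as given. The function name `fh_composite` and the input values are hypothetical, chosen so that two areas share the same direct estimate and model prediction but differ in sampling variance.

```python
import numpy as np

def fh_composite(theta_hat, synthetic, psi, sigma2_v):
    """Composite estimate and leading-term MSE, treating sigma2_v as known."""
    theta_hat, synthetic, psi = map(np.asarray, (theta_hat, synthetic, psi))
    gamma = sigma2_v / (sigma2_v + psi)      # shrinkage factor: weight on the direct estimate
    estimate = gamma * theta_hat + (1.0 - gamma) * synthetic
    mse = gamma * psi                        # ignores uncertainty in beta and sigma2_v
    return gamma, estimate, mse

# Hypothetical inputs: area 1 is precisely measured, area 2 is noisy.
gamma, estimate, mse = fh_composite(
    theta_hat=[10.0, 10.0], synthetic=[14.0, 14.0], psi=[1.0, 16.0], sigma2_v=4.0
)
```

With these inputs the weights are 0.8 and 0.2, so the precise area stays near its direct estimate (10.8) while the noisy area is pulled most of the way to the model prediction (13.2). The MSEs, 0.8 and 3.2, are each below both the area's sampling variance and the model variance of 4.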
Proof Sketch
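The Fay-Herriot model is a linear mixed model with $\hat{\theta}_i \mid \theta_i \sim N(\theta_i, \psi_i)$ and $\theta_i \sim N(x_i^\top \beta, \sigma_v^2)$. For known $\beta$ and $\sigma_v^2$, the BLUP of $\theta_i$ is the conditional expectation $E[\theta_i \mid \hat{\theta}_i] = x_i^\top \beta + \gamma_i (\hat{\theta}_i - x_i^\top \beta)$. This follows from the standard result for the conditional mean of a bivariate normal. Replacing $\beta$ with its GLS estimate gives the composite estimator.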
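Written out, the bivariate normal step looks like this. The derivation treats $\beta$ and $\sigma_v^2$ as known and uses $\psi_i$ for the sampling variance and $\sigma_v^2$ for the model variance (notation assumed here); the covariance term equals $\sigma_v^2$ because the sampling error is independent of the area effect.

```latex
\begin{aligned}
\begin{pmatrix} \theta_i \\ \hat{\theta}_i \end{pmatrix}
&\sim N\!\left(
\begin{pmatrix} x_i^\top \beta \\ x_i^\top \beta \end{pmatrix},
\begin{pmatrix} \sigma_v^2 & \sigma_v^2 \\ \sigma_v^2 & \sigma_v^2 + \psi_i \end{pmatrix}
\right), \\
E[\theta_i \mid \hat{\theta}_i]
&= x_i^\top \beta + \frac{\sigma_v^2}{\sigma_v^2 + \psi_i}\,(\hat{\theta}_i - x_i^\top \beta)
 = \gamma_i \hat{\theta}_i + (1 - \gamma_i)\, x_i^\top \beta, \\
\operatorname{Var}(\theta_i \mid \hat{\theta}_i)
&= \sigma_v^2 - \frac{\sigma_v^4}{\sigma_v^2 + \psi_i}
 = \frac{\sigma_v^2 \psi_i}{\sigma_v^2 + \psi_i}
 = \gamma_i \psi_i .
\end{aligned}
```

The conditional variance in the last line is the MSE of the predictor when the parameters are known.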
Why It Matters
This is the workhorse of small area estimation in practice. The Census Bureau's SAIPE (Small Area Income and Poverty Estimates) program uses Fay-Herriot models. The formula shows exactly how borrowing strength works: areas with less data are pulled more toward the model, while areas with more data retain their direct estimates.
Failure Mode
If the linking model is misspecified (wrong covariates, wrong functional form), the synthetic component is biased, and this bias is inherited by the composite estimator. The MSE formula assumes $\sigma_v^2$ is known; in practice, $\sigma_v^2$ is estimated, which adds uncertainty not captured by the simple formula. The Prasad-Rao correction addresses this. If $\psi_i$ is poorly estimated (very small area sample sizes), the model can behave erratically.
Empirical Bayes and Hierarchical Bayes
Empirical Bayes (EB)
In the empirical Bayes approach, $\beta$ and $\sigma_v^2$ are estimated from the data by maximum likelihood, REML, or method of moments. The EB estimator plugs these estimates into the BLUP formula. This is computationally simple but underestimates uncertainty because it treats the estimated $\hat{\sigma}_v^2$ as if it were the true value.
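As one concrete instance of the method-of-moments route, the sketch below uses the Fay-Herriot moment equation, which picks the model variance so that the weighted residual sum of squares equals $m - p$ (number of areas minus number of regression coefficients). The bisection solver, the function names, and the simulated data are all assumptions made for illustration; production code would normally rely on an established package and add an MSE correction.

```python
import numpy as np

def gls_beta(y, X, psi, sigma2_v):
    """GLS estimate of beta with diagonal covariance sigma2_v + psi_i."""
    w = 1.0 / (sigma2_v + psi)
    XtW = X.T * w                      # X' V^{-1}, using the diagonal structure
    return np.linalg.solve(XtW @ X, XtW @ y)

def fh_moment_sigma2_v(y, X, psi, tol=1e-8, max_iter=200):
    """Solve sum_i (y_i - x_i'beta)^2 / (sigma2_v + psi_i) = m - p by bisection."""
    m, p = X.shape

    def h(a):
        r = y - X @ gls_beta(y, X, psi, a)
        return np.sum(r**2 / (a + psi)) - (m - p)

    if h(0.0) <= 0.0:                  # no evidence of between-area variance
        return 0.0
    lo, hi = 0.0, np.var(y) + psi.max()
    while h(hi) > 0.0:                 # expand until the root is bracketed
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def eb_estimates(y, X, psi):
    """Plug-in (empirical Bayes) composite estimates."""
    s2 = fh_moment_sigma2_v(y, X, psi)
    beta = gls_beta(y, X, psi, s2)
    gamma = s2 / (s2 + psi)
    return s2, beta, gamma * y + (1.0 - gamma) * (X @ beta)

# Simulated data with hypothetical parameter values (true sigma2_v = 1).
rng = np.random.default_rng(1)
m = 400
X = np.column_stack([np.ones(m), rng.normal(size=m)])
psi = rng.uniform(0.2, 2.0, size=m)
theta = X @ np.array([1.0, 0.5]) + rng.normal(scale=1.0, size=m)
y = theta + rng.normal(scale=np.sqrt(psi), size=m)
s2_hat, beta_hat, theta_eb = eb_estimates(y, X, psi)
```

Note that `theta_eb` comes with no uncertainty for `s2_hat` attached, which is exactly the limitation described above.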
Hierarchical Bayes (HB)
The hierarchical Bayes approach places priors on $\beta$ and $\sigma_v^2$, then computes the full posterior distribution via MCMC. This properly accounts for uncertainty in all parameters. The posterior mean is similar to the EB point estimate but the posterior intervals are wider (and more honest) because they incorporate parameter uncertainty.
The HB approach is preferred when proper uncertainty quantification matters, such as when the estimates feed into policy decisions.
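To see where the extra uncertainty comes from, here is a sketch that swaps MCMC for a deterministic grid approximation. This is feasible for the area-level model because, under a flat prior on the regression coefficients, they integrate out analytically and only the model variance remains. The flat prior, the uniform prior over an equally spaced grid, the grid bounds, and the function name `hb_grid` are all assumptions made for illustration.

```python
import numpy as np

def hb_grid(y, X, psi, grid):
    """Posterior mean/variance of each area mean, integrating sigma2_v over a grid.

    Assumes a flat prior on beta and a uniform prior on sigma2_v over `grid`.
    """
    log_post = np.empty(len(grid))
    cond_mean = np.empty((len(grid), len(y)))
    cond_var = np.empty((len(grid), len(y)))
    for k, a in enumerate(grid):
        w = 1.0 / (a + psi)                       # diagonal of V^{-1}
        XtW = X.T * w
        cov_beta = np.linalg.inv(XtW @ X)         # (X' V^{-1} X)^{-1}
        beta = cov_beta @ (XtW @ y)
        r = y - X @ beta
        # Log marginal posterior of sigma2_v with beta integrated out.
        log_post[k] = (0.5 * np.sum(np.log(w))
                       + 0.5 * np.linalg.slogdet(cov_beta)[1]
                       - 0.5 * np.sum(w * r**2))
        gamma = a * w
        cond_mean[k] = gamma * y + (1.0 - gamma) * (X @ beta)
        # Conditional variance: shrinkage term plus uncertainty about beta.
        xcx = np.einsum("ij,jk,ik->i", X, cov_beta, X)
        cond_var[k] = gamma * psi + (1.0 - gamma) ** 2 * xcx
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()
    post_mean = weights @ cond_mean
    # Law of total variance: E[Var | sigma2_v] + Var(E | sigma2_v).
    post_var = weights @ cond_var + weights @ (cond_mean - post_mean) ** 2
    return weights, post_mean, post_var

# Simulated data with few areas, so parameter uncertainty is not negligible.
rng = np.random.default_rng(2)
m = 15
X = np.column_stack([np.ones(m), rng.normal(size=m)])
psi = rng.uniform(0.5, 2.0, size=m)
y = X @ np.array([1.0, 0.5]) + rng.normal(size=m) + rng.normal(scale=np.sqrt(psi), size=m)
grid = np.linspace(0.0, 10.0, 401)
weights, post_mean, post_var = hb_grid(y, X, psi, grid)
```

The second term in `post_var` is the contribution that a plug-in analysis drops: the spread of the conditional means across plausible values of the model variance.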
Unit-Level Models
The Fay-Herriot model works with area-level summaries. When unit-level data is available, the Battese-Harter-Fuller (BHF) model (1988) is the standard:
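$$y_{ij} = x_{ij}^\top \beta + v_i + e_{ij}$$
where $y_{ij}$ is the outcome for unit $j$ in area $i$, $v_i \sim N(0, \sigma_v^2)$ is the area random effect, and $e_{ij} \sim N(0, \sigma_e^2)$. The BLUP of the area mean uses both the sample and non-sample units in area $i$.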
Unit-level models are more efficient when individual covariates carry information, but they require access to the microdata and the population-level covariate distribution for each area.
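The data requirements show up directly in code. The sketch below predicts one area's population mean under a nested-error model of the BHF type, keeping observed outcomes for sampled units and filling in model predictions for the rest. It treats the regression coefficients and both variance components as known, which is an assumption made to keep the example short; the function name `bhf_area_mean` and all numeric values are hypothetical.

```python
import numpy as np

def bhf_area_mean(y_s, X_s, X_pop_mean, N, beta, sigma2_v, sigma2_e):
    """Predict one area's population mean, treating beta and both variances as known."""
    n = len(y_s)
    gamma = sigma2_v / (sigma2_v + sigma2_e / n)
    # Predicted area effect: shrunken mean residual of the sampled units.
    v_hat = gamma * (y_s.mean() - X_s.mean(axis=0) @ beta)
    # Mean covariates of the non-sampled units, from the known population mean.
    X_r_mean = (N * X_pop_mean - X_s.sum(axis=0)) / (N - n)
    # Observed values for sampled units, model predictions for the rest.
    return (y_s.sum() + (N - n) * (X_r_mean @ beta + v_hat)) / N

# Simulated area: 1000 units, of which 8 are sampled.
rng = np.random.default_rng(3)
N, n = 1000, 8
beta, sigma2_v, sigma2_e = np.array([2.0, 1.0]), 0.5, 1.0
X_pop = np.column_stack([np.ones(N), rng.normal(size=N)])
v_i = rng.normal(scale=np.sqrt(sigma2_v))
y_pop = X_pop @ beta + v_i + rng.normal(scale=np.sqrt(sigma2_e), size=N)
y_s, X_s = y_pop[:n], X_pop[:n]
X_pop_mean = X_pop.mean(axis=0)
mu_hat = bhf_area_mean(y_s, X_s, X_pop_mean, N, beta, sigma2_v, sigma2_e)
```

The function needs `X_pop_mean` and `N` for the area, not just the sampled rows, which is the population-level covariate requirement mentioned above. `y_pop` exists here only because the population is simulated.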
Common Confusions
SAE is not just multilevel modeling
While SAE uses hierarchical models, the goals differ. In standard multilevel modeling, you care about the fixed effects or the variance components. In SAE, you care about predicting the area-specific random effects $v_i$. The MSE of the predictor, not the variance of the estimator of $\beta$, is the primary quantity of interest.
The sampling variances $\psi_i$ are treated as known
This is a modeling convenience, not a literal truth. In practice, the $\psi_i$ are estimated from the survey design. For areas with reasonable sample sizes, this is fine. For very small areas, the estimated $\hat{\psi}_i$ may be unstable. Smoothing the sampling variances (e.g., via a generalized variance function) is common practice.
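One simple way to smooth is to regress the log of the estimated sampling variances on the log of the area sample sizes and use the fitted values. The sketch below assumes that particular functional form, which is only one of many possible generalized variance functions; the function name `smooth_variances` and the simulated inputs are hypothetical.

```python
import numpy as np

def smooth_variances(psi_hat, n):
    """Smooth noisy variance estimates via log(psi_hat) ~ a + b * log(n)."""
    A = np.column_stack([np.ones(len(n)), np.log(n)])
    coef, *_ = np.linalg.lstsq(A, np.log(psi_hat), rcond=None)
    # Back-transforming fitted logs is slightly biased; no correction is applied here.
    return np.exp(A @ coef), coef

# Simulated areas: the true variance is 4 / n_i, estimated noisily from n_i units.
rng = np.random.default_rng(4)
m = 300
n = rng.integers(5, 101, size=m)
psi_true = 4.0 / n
psi_hat = psi_true * rng.chisquare(df=n - 1) / (n - 1)
psi_smooth, coef = smooth_variances(psi_hat, n)
```

The fitted slope `coef[1]` should land near -1 in this simulation, reflecting variances that scale inversely with sample size. The smoothed values pool information across areas, so they are much less erratic than the raw estimates for the smallest areas.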
Borrowing strength is not free
The composite estimator reduces MSE on average across areas. But for any specific area $i$, the model-based component can introduce bias if the model is wrong for that area. SAE trades unbiasedness for lower MSE. This is the same tradeoff as in ridge regression or James-Stein estimation.
Summary
- Direct estimates have no bias but high variance in small areas
- Synthetic (model-based) estimates have low variance but potential bias
- The Fay-Herriot composite estimator optimally combines both
- Shrinkage factor $\gamma_i = \sigma_v^2 / (\sigma_v^2 + \psi_i)$ depends on relative noise levels (model variance versus sampling variance)
- Empirical Bayes is simple but underestimates uncertainty
- Hierarchical Bayes properly accounts for parameter uncertainty
- Used routinely by Census Bureau, BLS, and statistical agencies worldwide
Exercises
Problem
An area has direct estimate with sampling variance . The model prediction is . The estimated model variance is . Compute the Fay-Herriot composite estimate and its MSE.
Problem
Consider two areas with the same model prediction and model variance . Area 1 has direct estimate with (large sample). Area 2 has direct estimate with (tiny sample). Compute the composite estimates for both areas. What happens to area 2's estimate?
References
Canonical:
- Fay & Herriot, "Estimates of Income for Small Places" (1979), JASA 74(366)
- Rao & Molina, Small Area Estimation (2015), Chapters 1-7
Current:
- Pfeffermann, "New Important Developments in Small Area Estimation" (2013), Statistical Science
- National Academies, Small-Area Income and Poverty Estimates (2000)
- Casella & Berger, Statistical Inference (2002), Chapters 5-10
- Lehmann & Casella, Theory of Point Estimation (1998), Chapters 1-6
Next Topics
- Longitudinal surveys and panel data: repeated measurement over time
- Official statistics and national surveys: the institutional context for SAE
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Bayesian Estimation (Layer 0B)
- Maximum Likelihood Estimation (Layer 0B)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Differentiation in $\mathbb{R}^n$ (Layer 0A)
- Linear Regression (Layer 1)
- Matrix Operations and Properties (Layer 0A)