
Statistical Estimation

Empirical Bayes vs Hierarchical Bayes

What changes when hyperparameters are estimated and plugged in versus assigned a prior and integrated out, and why the gap is mostly about uncertainty rather than point estimates.

Advanced · Tier 2 · Stable · ~45 min

Why This Matters

Both empirical Bayes and hierarchical Bayes borrow strength across related units. Both produce shrinkage. Both often give similar point estimates. That is why the distinction gets blurred.

The real difference is not the headline estimate. It is what happens to the hyperparameters.

  • Empirical Bayes estimates them once, then plugs them in.
  • Hierarchical Bayes gives them a prior and integrates over their posterior uncertainty.

When the number of groups is large and the hyperparameter posterior is tight, the two answers can be nearly identical. When the number of groups is small or the variance component is near a boundary, the difference in uncertainty can be substantial.

Mental Model

Suppose each area, study, or parameter has its own latent effect $\theta_i$, and those effects share a higher-level distribution controlled by a hyperparameter $\psi$.

  • Empirical Bayes says: estimate $\psi$, pretend it is known, then compute $p(\theta \mid y, \hat{\psi})$.
  • Hierarchical Bayes says: put a prior on $\psi$, compute $p(\theta, \psi \mid y)$, then integrate over $\psi$.

So the question is not whether shrinkage happens. It happens in both. The question is whether the uncertainty in the shrinkage rule is itself carried through to the end.

Formal Setup

Definition

Empirical Bayes

In empirical Bayes, the model is hierarchical but the hyperparameter $\psi$ is estimated from the marginal distribution of the data, typically by the method of moments or maximum likelihood. Posterior summaries of the latent effects use the plug-in distribution

$$p(\theta \mid y, \hat{\psi}).$$

Definition

Hierarchical Bayes

In hierarchical Bayes, the hyperparameter $\psi$ is assigned a prior $p(\psi)$. Inference is based on the joint posterior

$$p(\theta, \psi \mid y) \propto p(y \mid \theta, \psi)\, p(\theta \mid \psi)\, p(\psi),$$

and posterior summaries of $\theta$ integrate over $\psi$.

Definition

Partial Pooling

Both EB and HB usually produce partial pooling: each unit-specific estimate lies between the direct estimate and the shared group-level mean or regression surface. The amount of pooling is governed by $\psi$.
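
In the simplest normal-normal case, the pooling weight is explicit. The sketch below (hypothetical numbers; `partial_pool` is an illustrative helper, not a library function) shows how each unit's conditional posterior mean interpolates between its direct estimate and the shared mean, with noisier units shrunk harder:

```python
# Partial pooling in a normal-normal model (illustrative sketch).
# Each unit i has a direct estimate y[i] with known sampling variance D[i];
# the latent effects share a N(m, A) distribution, so psi = (m, A) here.

def partial_pool(y, D, m, A):
    """Posterior mean of theta_i given (m, A): shrink y_i toward m."""
    return [(A * yi + Di * m) / (A + Di) for yi, Di in zip(y, D)]

y = [12.0, 7.0, 10.5]   # direct estimates
D = [4.0, 1.0, 9.0]     # sampling variances: the noisiest unit shrinks most
m, A = 9.0, 2.0         # hyperparameters: shared mean, between-unit variance

print(partial_pool(y, D, m, A))
```

With $(m, A)$ fixed, this is the conditional posterior mean that EB and HB share; the two approaches differ only in how those hyperparameters are handled upstream.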

Main Theorem

Theorem

Posterior Mean and Variance Under Hyperparameter Integration

Statement

Let $\theta$ denote a latent quantity of interest and $\psi$ a hyperparameter. Under hierarchical Bayes,

$$\mathbb{E}[\theta \mid y] = \mathbb{E}_{\psi \mid y}\left[\mathbb{E}(\theta \mid y, \psi)\right],$$

and

$$\operatorname{Var}(\theta \mid y) = \mathbb{E}_{\psi \mid y}\left[\operatorname{Var}(\theta \mid y, \psi)\right] + \operatorname{Var}_{\psi \mid y}\left(\mathbb{E}(\theta \mid y, \psi)\right).$$

The second term is the contribution from hyperparameter uncertainty. A plug-in empirical Bayes interval based on $p(\theta \mid y, \hat{\psi})$ omits this term.

Intuition

Hierarchical Bayes averages over many plausible shrinkage rules. Empirical Bayes uses one estimated shrinkage rule and acts as though it were fixed.

Proof Sketch

The mean identity is the law of total expectation applied to the posterior. The variance identity is the law of total variance, again under the posterior distribution. The empirical Bayes plug-in approximation replaces the random hyperparameter $\psi$ with a point estimate $\hat{\psi}$, which removes the between-hyperparameter variability term.
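
The decomposition can be checked numerically. A minimal sketch, assuming a toy model (not from the text above): $\theta \sim N(0, A)$, $y \mid \theta \sim N(\theta, 1)$, with a flat grid prior on $A$. The HB variance is computed from the two-term identity; the EB variance plugs in the marginal-ML value of $A$:

```python
# Numerical check of the total-variance identity in a toy normal model.
import math

y = 2.0
grid = [0.1 * k for k in range(1, 101)]   # candidate values of A

def marg_lik(A):
    # p(y | A): marginally, y ~ N(0, A + 1)
    v = A + 1.0
    return math.exp(-y * y / (2 * v)) / math.sqrt(2 * math.pi * v)

w = [marg_lik(A) for A in grid]
Z = sum(w)
w = [wi / Z for wi in w]                  # posterior over A (flat grid prior)

cond_mean = [A * y / (A + 1) for A in grid]   # E(theta | y, A)
cond_var = [A / (A + 1) for A in grid]        # Var(theta | y, A)

hb_mean = sum(wi * m for wi, m in zip(w, cond_mean))
hb_var = (sum(wi * cv for wi, cv in zip(w, cond_var))                      # E[Var]
          + sum(wi * (m - hb_mean) ** 2 for wi, m in zip(w, cond_mean)))   # Var[E]

A_hat = max(grid, key=marg_lik)           # marginal ML plug-in value of A
eb_var = A_hat / (A_hat + 1)              # plug-in EB posterior variance

print(hb_var, eb_var)                     # HB variance exceeds the plug-in here
```

In this example the posterior over $A$ is wide, so the second (between-hyperparameter) term is material and the HB variance is visibly larger than the EB plug-in variance.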

Why It Matters

This theorem isolates the main inferential difference between EB and HB. Their point estimates can be close, but their interval estimates can diverge whenever hyperparameter uncertainty is not negligible.

Failure Mode

Hierarchical Bayes does not automatically dominate. If the prior on $\psi$ is poorly chosen and the data are weak, the full posterior can be sensitive to the prior in ways that a plug-in EB analysis avoids. The right conclusion is not "HB always wins." It is "the omitted uncertainty term is sometimes material."

EB vs HB at a Glance

| Question | Empirical Bayes | Hierarchical Bayes |
| --- | --- | --- |
| What happens to hyperparameters? | Estimate once and plug in | Put a prior and integrate out |
| Point estimates | Often close to HB | Often close to EB |
| Uncertainty intervals | Usually narrower | Usually wider when hyperparameter uncertainty matters |
| Computation | Usually cheaper | Usually heavier |
| Sensitivity source | Estimator choice for hyperparameters | Prior choice for hyperparameters |

The last row matters. EB is not assumption-free. It simply hides its higher-level assumptions inside the hyperparameter estimator rather than inside a hyperprior.

Canonical Example

Example

Fay-Herriot shrinkage with uncertain area variance

In a small area estimation model, the area means satisfy $\theta_i = x_i^\top \beta + v_i$ with $v_i \sim N(0, A)$. The shrinkage toward $x_i^\top \beta$ is controlled by the variance component $A$.

An empirical Bayes analysis estimates $A$ by moments, ML, or REML, then plugs that value into the shrinkage formula. A hierarchical Bayes analysis assigns a prior to $A$ and averages over its posterior.

If the number of areas is large and the posterior for $A$ is concentrated, both methods often yield similar posterior means for $\theta_i$. If the number of areas is small, or the data leave serious uncertainty about whether $A$ is close to zero, the hierarchical Bayes intervals can be materially wider. That extra width is not a bug. It is the second variance term in the theorem above.
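
A minimal sketch of the EB side, assuming an intercept-only model ($x_i^\top \beta = \mu$) and a crude moment estimator of $A$; both simplifications, and the helper name, are illustrative rather than any library's API:

```python
# Simplified Fay-Herriot-style EB estimator (intercept-only sketch).

def eb_fay_herriot(y, D):
    """y: direct area estimates; D: known sampling variances."""
    m = len(y)
    mu_hat = sum(y) / m                         # crude estimate of the common mean
    s2 = sum((yi - mu_hat) ** 2 for yi in y) / (m - 1)
    A_hat = max(0.0, s2 - sum(D) / m)           # moment estimate of A, floored at 0
    # The EB step: plug A_hat into the shrinkage formula as if it were known.
    est = [(A_hat * yi + Di * mu_hat) / (A_hat + Di) for yi, Di in zip(y, D)]
    return est, A_hat

y = [10.0, 12.0, 8.0, 11.0, 9.0]
D = [1.0, 1.0, 1.0, 1.0, 1.0]
est, A_hat = eb_fay_herriot(y, D)
print(A_hat, est)
```

Note the zero floor: when the moment estimate hits it, the plug-in rule pools completely and reports no between-area variability at all, exactly the boundary situation where an HB analysis with a prior on $A$ tends to produce wider intervals.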

Relation to James-Stein and Shrinkage

The James-Stein estimator is a canonical empirical Bayes story. A normal prior variance is estimated from the data, then plugged into a posterior-mean formula. Hierarchical Bayes keeps the same shrinkage logic but treats that prior variance as a random quantity with its own posterior uncertainty.
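
The plug-in step is visible in the positive-part James-Stein formula: for $z_i \sim N(\theta_i, 1)$ with a $N(0, A)$ prior, the factor $(p-2)/\lVert z \rVert^2$ is the data-based estimate of the shrinkage weight $1/(1+A)$. A self-contained sketch (positive-part variant):

```python
# Positive-part James-Stein shrinkage toward zero for z_i ~ N(theta_i, 1).

def james_stein(z):
    p = len(z)
    s = sum(zi * zi for zi in z)            # ||z||^2
    # (p - 2) / s estimates 1 / (1 + A); this is the empirical Bayes plug-in.
    shrink = max(0.0, 1.0 - (p - 2) / s)    # positive-part variant avoids sign flips
    return [shrink * zi for zi in z]

z = [3.0, -1.0, 2.0, 0.5]
print(james_stein(z))                        # every coordinate pulled toward 0
```

A hierarchical Bayes treatment of the same model would instead average the shrinkage weight over the posterior of $A$, rather than fixing it at this single estimated value.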

This is why EB versus HB is best understood as a distinction about the top layer of the hierarchy, not about whether shrinkage occurs.

Common Confusions

Watch Out

Empirical Bayes is not the same thing as no Bayes

Empirical Bayes still uses Bayes formulas conditionally on an estimated hyperparameter. What it does not do is assign a full prior to that hyperparameter and integrate over it.

Watch Out

Similar point estimates do not imply similar intervals

In many applications EB and HB posterior means are nearly identical, especially when there are many groups. That does not mean the interval estimates are interchangeable. The uncertainty decomposition shows exactly why they can differ.

Watch Out

Hierarchical Bayes is not automatically more honest

HB propagates hyperparameter uncertainty, but it also introduces prior sensitivity. When data are thin, two different priors on a variance component can produce meaningfully different intervals. A good HB analysis should show that sensitivity rather than hiding it.

Summary

  • Both EB and HB borrow strength through a shared hierarchical model
  • The main difference is what happens to the hyperparameters
  • EB plugs in $\hat{\psi}$ and omits hyperparameter uncertainty
  • HB integrates over $p(\psi \mid y)$ and keeps that uncertainty in the final posterior
  • Point estimates are often close; intervals can differ materially

Exercises

ExerciseCore

Problem

In one sentence, what extra source of uncertainty does hierarchical Bayes include that a plug-in empirical Bayes interval usually omits?

ExerciseAdvanced

Problem

Why are EB and HB point estimates often closer to each other than their interval estimates are?

References

Canonical:

  • Robbins, "An Empirical Bayes Approach to Statistics" (1956), Proceedings of the Third Berkeley Symposium. Origin of the empirical Bayes program.
  • Efron and Morris, "Empirical Bayes on Vector Observations: An Extension of Stein's Method" (1972), Biometrika 59(2), 335-347. Canonical shrinkage example.
  • Gelman, Carlin, Stern, Dunson, Vehtari, Rubin, Bayesian Data Analysis, 3rd ed. (2013), Chapter 5. Standard hierarchical-model treatment.
  • Ghosh and Rao, "Small Area Estimation: An Appraisal" (1994), Statistical Science 9(1), 55-93. Direct comparison of EB, EBLUP, and HB in SAE.
  • Rao and Molina, Small Area Estimation, 2nd ed. (2015), Chapters 7 and 10. Empirical Bayes, EBLUP, and hierarchical Bayes in the same notation.

Current / practice:

  • Datta, Ghosh, Huang, "Hierarchical and Empirical Bayes Methods for Adjustment of Census Undercount: The 1988 Missouri Dress Rehearsal Data" (1992), Survey Methodology. Public applied comparison of EB and HB.
  • Mukherjee and Lahiri, "On the Design-Consistency Property of Hierarchical Bayes Estimators in Finite Population Sampling" (2008), Sankhya. What HB has to prove in survey settings, not only in model settings.


Last reviewed: April 18, 2026
