Statistical Estimation
Empirical Bayes vs Hierarchical Bayes
What changes when hyperparameters are estimated and plugged in versus assigned a prior and integrated out, and why the gap is mostly about uncertainty rather than point estimates.
Why This Matters
Both empirical Bayes and hierarchical Bayes borrow strength across related units. Both produce shrinkage. Both often give similar point estimates. That is why the distinction gets blurred.
The real difference is not the headline estimate. It is what happens to the hyperparameters.
- Empirical Bayes estimates them once, then plugs them in.
- Hierarchical Bayes gives them a prior and integrates over their posterior uncertainty.
When the number of groups is large and the hyperparameter posterior is tight, the two answers can be nearly identical. When the number of groups is small or the variance component is near a boundary, the difference in uncertainty can be substantial.
Mental Model
Suppose each area, study, or parameter has its own latent effect $\theta_i$, and those effects share a higher-level distribution controlled by a hyperparameter $\phi$.
- Empirical Bayes says: estimate $\hat{\phi}$, pretend it were known, then compute $p(\theta_i \mid y, \hat{\phi})$.
- Hierarchical Bayes says: put a prior on $\phi$, compute $p(\theta_i \mid y, \phi)$, then integrate over $p(\phi \mid y)$.
So the question is not whether shrinkage happens. It happens in both. The question is whether the uncertainty in the shrinkage rule is itself carried through to the end.
Formal Setup
Empirical Bayes
In empirical Bayes, the model is hierarchical but the hyperparameter $\phi$ is estimated from the marginal distribution of the data, typically by moments or maximum likelihood. Posterior summaries of the latent effects use the plug-in distribution
$$p(\theta \mid y, \hat{\phi}).$$
Hierarchical Bayes
In hierarchical Bayes, the hyperparameter is assigned a prior $p(\phi)$. Inference is based on the joint posterior
$$p(\theta, \phi \mid y) \propto p(y \mid \theta)\, p(\theta \mid \phi)\, p(\phi),$$
and posterior summaries of $\theta$ integrate over $p(\phi \mid y)$:
$$p(\theta \mid y) = \int p(\theta \mid y, \phi)\, p(\phi \mid y)\, d\phi.$$
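To make the contrast concrete, here is a minimal numerical sketch under an assumed toy model (not specified in the text): $y_i \sim N(\theta_i, s_i^2)$ with known $s_i$, and $\theta_i \sim N(0, \tau^2)$, so the hyperparameter is $\phi = \tau^2$. The EB branch maximizes the marginal likelihood and plugs in $\hat{\tau}^2$; the HB branch places a flat prior on $\tau^2$ over a grid and integrates it out.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
s = np.full(8, 1.0)                       # known sampling sds (assumed toy values)
theta_true = rng.normal(0.0, 0.5, size=8)
y = rng.normal(theta_true, s)

def marginal_loglik(tau2):
    # Marginally, y_i ~ N(0, tau^2 + s_i^2)
    return norm.logpdf(y, 0.0, np.sqrt(tau2 + s**2)).sum()

# --- Empirical Bayes: estimate tau^2 once, then plug in ---
tau2_hat = minimize_scalar(lambda t: -marginal_loglik(t),
                           bounds=(1e-8, 50.0), method="bounded").x
B = s**2 / (s**2 + tau2_hat)              # shrinkage weights at the plug-in value
eb_mean = (1 - B) * y                     # E[theta_i | y, tau2_hat]
eb_var = (1 - B) * s**2                   # plug-in posterior variance

# --- Hierarchical Bayes: grid posterior over tau^2, then integrate ---
grid = np.linspace(1e-4, 10.0, 2000)      # flat prior on tau^2 over this grid (assumption)
logw = np.array([marginal_loglik(t) for t in grid])
w = np.exp(logw - logw.max()); w /= w.sum()

cond_mean = (grid[:, None] / (grid[:, None] + s**2)) * y    # E[theta | y, tau^2]
cond_var = (grid[:, None] * s**2) / (grid[:, None] + s**2)  # Var[theta | y, tau^2]
hb_mean = (w[:, None] * cond_mean).sum(axis=0)
hb_var = ((w[:, None] * cond_var).sum(axis=0)
          + (w[:, None] * (cond_mean - hb_mean)**2).sum(axis=0))

print("EB vs HB posterior sds:", np.sqrt(eb_var).round(3), np.sqrt(hb_var).round(3))
```

The point estimates `eb_mean` and `hb_mean` typically land close together; the HB standard deviations are systematically a bit larger because `hb_var` carries the extra between-hyperparameter term.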
Partial Pooling
Both EB and HB usually produce partial pooling: each unit-specific estimate lies between the direct estimate and the group-level mean or regression surface. The amount of pooling is governed by $\phi$.
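In the normal case the pooled estimate has a familiar closed form. The notation below ($y_i$ for the direct estimate, $\sigma_i^2$ for its sampling variance, $\tau^2$ for the between-group variance, $\mu$ for the group-level mean) is assumed here rather than fixed by the text:

$$\hat{\theta}_i = (1 - B_i)\, y_i + B_i\, \mu, \qquad B_i = \frac{\sigma_i^2}{\sigma_i^2 + \tau^2}.$$

As $\tau^2 \to 0$ the weight $B_i \to 1$ and pooling is complete; as $\tau^2 \to \infty$, $B_i \to 0$ and each unit keeps its direct estimate.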
Main Theorem
Posterior Mean and Variance Under Hyperparameter Integration
Statement
Let $\theta$ denote a latent quantity of interest and $\phi$ a hyperparameter. Under hierarchical Bayes,
$$E[\theta \mid y] = E_{\phi \mid y}\big[\, E(\theta \mid y, \phi) \,\big]$$
and
$$\mathrm{Var}(\theta \mid y) = E_{\phi \mid y}\big[\, \mathrm{Var}(\theta \mid y, \phi) \,\big] + \mathrm{Var}_{\phi \mid y}\big[\, E(\theta \mid y, \phi) \,\big].$$
The second term is the contribution from hyperparameter uncertainty. A plug-in empirical Bayes interval based on $p(\theta \mid y, \hat{\phi})$ omits this term.
Intuition
Hierarchical Bayes averages over many plausible shrinkage rules. Empirical Bayes uses one estimated shrinkage rule and acts as though it were fixed.
Proof Sketch
The mean identity is the law of total expectation applied to the posterior. The variance identity is the law of total variance, again under the posterior distribution. The empirical Bayes plug-in approximation replaces the random hyperparameter by a point estimate $\hat{\phi}$, which removes the between-hyperparameter variability term.
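A quick Monte Carlo check of the variance identity, using an assumed discrete posterior for the hyperparameter (toy values, not from the text). The exact within-plus-between sum should match the simulated variance up to noise, while the plug-in value does not:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy posterior for the hyperparameter: two plausible shrinkage regimes.
phi_vals = np.array([0.5, 2.0])           # candidate hyperparameter values
phi_probs = np.array([0.4, 0.6])          # posterior weights p(phi | y)

# Conditional posterior theta | y, phi ~ N(m(phi), v(phi)) (assumed forms).
def m(phi): return 3.0 * phi / (phi + 1.0)    # conditional posterior mean
def v(phi): return phi / (phi + 1.0)          # conditional posterior variance

# Law of total variance, computed exactly over the discrete posterior.
Em = (phi_probs * m(phi_vals)).sum()
within = (phi_probs * v(phi_vals)).sum()                # E[Var(theta | y, phi)]
between = (phi_probs * (m(phi_vals) - Em)**2).sum()     # Var[E(theta | y, phi)]

# Monte Carlo: draw phi from its posterior, then theta from its conditional.
phi = rng.choice(phi_vals, p=phi_probs, size=200_000)
theta = rng.normal(m(phi), np.sqrt(v(phi)))

print("within + between:", within + between)
print("MC Var(theta|y): ", theta.var())
print("plug-in at the modal phi:", v(2.0), "(omits the between term)")
```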
Why It Matters
This theorem isolates the main inferential difference between EB and HB. Their point estimates can be close, but their interval estimates can diverge whenever hyperparameter uncertainty is not negligible.
Failure Mode
Hierarchical Bayes does not automatically dominate. If the prior on $\phi$ is poorly chosen and the data are weak, the full posterior can be sensitive to the prior in ways that a plug-in EB analysis avoids. The right conclusion is not "HB always wins." It is "the omitted uncertainty term is sometimes material."
EB vs HB at a Glance
| Question | Empirical Bayes | Hierarchical Bayes |
|---|---|---|
| What happens to hyperparameters? | Estimate once and plug in | Put a prior and integrate out |
| Point estimates | Often close to HB | Often close to EB |
| Uncertainty intervals | Usually narrower | Usually wider when hyperparameter uncertainty matters |
| Computation | Usually cheaper | Usually heavier |
| Sensitivity source | Estimator choice for hyperparameters | Prior choice for hyperparameters |
The last row matters. EB is not assumption-free. It simply hides its higher-level assumptions inside the hyperparameter estimator rather than inside a hyperprior.
Canonical Example
Fay-Herriot shrinkage with uncertain area variance
In a small area estimation model, the area means satisfy $y_i \mid \theta_i \sim N(\theta_i, D_i)$ with $\theta_i \sim N(x_i^\top \beta, A)$. The shrinkage toward $x_i^\top \beta$ is controlled by the variance component $A$.
An empirical Bayes analysis estimates $A$ by moments, ML, or REML, then plugs that value into the shrinkage formula. A hierarchical Bayes analysis assigns a prior to $A$ and averages over its posterior.
If the number of areas is large and the posterior for $A$ is concentrated, both methods often yield similar posterior means for $\theta_i$. If the number of areas is small or the data leave serious uncertainty about whether $A$ is close to zero, the hierarchical Bayes intervals can be materially wider. That extra width is not a bug. It is the second variance term in the theorem above.
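A sketch of that comparison on assumed toy Fay-Herriot data (intercept-only, $x_i^\top \beta = \mu$, known $D_i$, flat priors on $\mu$ and $A$). The helper below uses the standard closed forms for $\theta_i \mid y, A$ with $\mu$ integrated out; the EB branch plugs in a REML-type estimate of $A$, the HB branch integrates $A$ over a grid:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)

# Assumed toy data: few areas, known sampling variances D_i.
m = 6
D = np.array([0.5, 0.8, 1.0, 1.2, 0.6, 0.9])
A_true, mu_true = 0.3, 2.0
theta = rng.normal(mu_true, np.sqrt(A_true), size=m)
y = rng.normal(theta, np.sqrt(D))

def integrated_loglik(A):
    # log p(y | A) with mu integrated out under a flat prior (REML-type likelihood).
    V = A + D
    W = (1.0 / V).sum()
    mu_hat = (y / V).sum() / W
    return (-0.5 * np.log(2 * np.pi * V).sum()
            + 0.5 * np.log(2 * np.pi / W)
            - 0.5 * ((y - mu_hat) ** 2 / V).sum())

def conditional_posterior(A):
    # theta_i | y, A with mu integrated out: standard Fay-Herriot formulas.
    V = A + D
    gamma = A / V
    W = (1.0 / V).sum()
    mu_hat = (y / V).sum() / W
    mean = gamma * y + (1 - gamma) * mu_hat
    var = gamma * D + (1 - gamma) ** 2 / W
    return mean, var

# --- EB: plug in the REML-type estimate of A ---
A_hat = minimize_scalar(lambda a: -integrated_loglik(a),
                        bounds=(1e-8, 20.0), method="bounded").x
eb_mean, eb_var = conditional_posterior(A_hat)

# --- HB: flat prior on A over a grid, integrate A out ---
grid = np.linspace(1e-4, 8.0, 1500)
logw = np.array([integrated_loglik(a) for a in grid])
w = np.exp(logw - logw.max()); w /= w.sum()
means, vars_ = zip(*(conditional_posterior(a) for a in grid))
means, vars_ = np.array(means), np.array(vars_)
hb_mean = (w[:, None] * means).sum(axis=0)
hb_var = ((w[:, None] * vars_).sum(axis=0)
          + (w[:, None] * (means - hb_mean) ** 2).sum(axis=0))

print("A_hat:", round(A_hat, 3))
print("interval half-widths, EB:", (1.96 * np.sqrt(eb_var)).round(2))
print("interval half-widths, HB:", (1.96 * np.sqrt(hb_var)).round(2))
```

With only six areas the grid posterior for $A$ is diffuse and often piles up near zero, so the HB half-widths come out visibly wider than the plug-in EB ones, which is exactly the omitted second variance term at work.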
Relation to James-Stein and Shrinkage
The James-Stein estimator is a canonical empirical Bayes story. A normal prior variance is estimated from the data, then plugged into a posterior-mean formula. Hierarchical Bayes keeps the same shrinkage logic but treats that prior variance as a random quantity with its own posterior uncertainty.
This is why EB versus HB is best understood as a distinction about the top layer of the hierarchy, not about whether shrinkage occurs.
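A minimal sketch of the James-Stein estimator read as empirical Bayes, on assumed simulated data: with $X_i \sim N(\theta_i, 1)$ and a working prior $\theta_i \sim N(0, \tau^2)$, the posterior mean is $(1 - 1/(1+\tau^2))X_i$, and $(p-2)/\lVert X \rVert^2$ is an unbiased estimate of $1/(1+\tau^2)$ that gets plugged in:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed setup: p independent means, unit sampling variance, true effects near zero.
p = 20
theta = rng.normal(0.0, 0.7, size=p)
x = rng.normal(theta, 1.0)

# James-Stein as empirical Bayes: estimate the shrinkage factor from the data,
# then apply the posterior-mean formula as if that factor were known.
shrink = 1.0 - (p - 2) / (x @ x)
js = shrink * x

mse_mle = ((x - theta) ** 2).sum()
mse_js = ((js - theta) ** 2).sum()
print(f"shrinkage factor: {shrink:.3f}")
print(f"total squared error, MLE: {mse_mle:.2f}  JS: {mse_js:.2f}")
```

A hierarchical Bayes version of the same model would instead put a prior on $\tau^2$ and average the shrinkage factor over its posterior, rather than fixing it at the single estimated value.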
Common Confusions
Empirical Bayes is not the same thing as no Bayes
Empirical Bayes still uses Bayes formulas conditionally on an estimated hyperparameter. What it does not do is assign a full prior to that hyperparameter and integrate over it.
Similar point estimates do not imply similar intervals
In many applications EB and HB posterior means are nearly identical, especially when there are many groups. That does not mean the interval estimates are interchangeable. The uncertainty decomposition shows exactly why they can differ.
Hierarchical Bayes is not automatically more honest
HB propagates hyperparameter uncertainty, but it also introduces prior sensitivity. When data are thin, two different priors on a variance component can produce meaningfully different intervals. A good HB analysis should show that sensitivity rather than hiding it.
Summary
- Both EB and HB borrow strength through a shared hierarchical model
- The main difference is what happens to the hyperparameters
- EB plugs in and omits hyperparameter uncertainty
- HB integrates over and keeps that uncertainty in the final posterior
- Point estimates are often close; intervals can differ materially
Exercises
Problem
In one sentence, what extra source of uncertainty does hierarchical Bayes include that a plug-in empirical Bayes interval usually omits?
Problem
Why are EB and HB point estimates often closer to each other than their interval estimates are?
References
Canonical:
- Robbins, "An Empirical Bayes Approach to Statistics" (1956), Proceedings of the Third Berkeley Symposium. Origin of the empirical Bayes program.
- Efron and Morris, "Empirical Bayes on Vector Observations: An Extension of Stein's Method" (1972), Biometrika 59(2), 335-347. Canonical shrinkage example.
- Gelman, Carlin, Stern, Dunson, Vehtari, Rubin, Bayesian Data Analysis, 3rd ed. (2013), Chapter 5. Standard hierarchical-model treatment.
- Ghosh and Rao, "Small Area Estimation: An Appraisal" (1994), Statistical Science 9(1), 55-93. Direct comparison of EB, EBLUP, and HB in SAE.
- Rao and Molina, Small Area Estimation, 2nd ed. (2015), Chapters 7 and 10. Empirical Bayes, EBLUP, and hierarchical Bayes in the same notation.
Current / practice:
- Datta, Ghosh, Huang, "Hierarchical and Empirical Bayes Methods for Adjustment of Census Undercount: The 1988 Missouri Dress Rehearsal Data" (1992), Survey Methodology. Public applied comparison of EB and HB.
- Mukherjee and Lahiri, "On the Design-Consistency Property of Hierarchical Bayes Estimators in Finite Population Sampling" (2008), Sankhya. What HB has to prove in survey settings, not only in model settings.
Next Topics
- Small area estimation: where EB, EBLUP, and HB compete directly
- REML and variance component estimation: one common route to the plug-in hyperparameter
- Prasad-Rao MSE correction: what a frequentist EB-style analysis must do to repair plug-in uncertainty
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Bayesian Estimation (Layer 0B)
- Maximum Likelihood Estimation (Layer 0B)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Differentiation in Rⁿ (Layer 0A)
- Shrinkage Estimation and the James-Stein Estimator (Layer 0B)