
Statistical Estimation

Empirical Bayes vs Hierarchical Bayes

What changes when hyperparameters are estimated and plugged in versus assigned a prior and integrated out, and why the gap is mostly about uncertainty rather than point estimates.

Advanced · Tier 2 · Stable · ~45 min

Why This Matters

Both empirical Bayes and hierarchical Bayes borrow strength across related units. Both produce shrinkage. Both often give similar point estimates. That is why the distinction gets blurred.

The real difference is not the headline estimate. It is what happens to the hyperparameters.

  • Empirical Bayes estimates them once, then plugs them in.
  • Hierarchical Bayes gives them a prior and integrates over their posterior uncertainty.

When the number of groups is large and the hyperparameter posterior is tight, the two answers can be nearly identical. When the number of groups is small or the variance component is near a boundary, the difference in uncertainty can be substantial.

Mental Model

Suppose each area, study, or parameter has its own latent effect $\theta_i$, and those effects share a higher-level distribution controlled by a hyperparameter $\psi$.

  • Empirical Bayes says: estimate $\psi$, pretend it is known, then compute $p(\theta \mid y, \hat{\psi})$.
  • Hierarchical Bayes says: put a prior on $\psi$, compute $p(\theta, \psi \mid y)$, then integrate over $\psi$.

So the question is not whether shrinkage happens. It happens in both. The question is whether the uncertainty in the shrinkage rule is itself carried through to the end.

Formal Setup

Definition

Empirical Bayes

In empirical Bayes, the model is hierarchical but the hyperparameter $\psi$ is estimated from the marginal distribution of the data, typically by the method of moments or maximum likelihood. Posterior summaries of the latent effects use the plug-in distribution

$$p(\theta \mid y, \hat{\psi}).$$

Definition

Hierarchical Bayes

In hierarchical Bayes, the hyperparameter $\psi$ is assigned a prior $p(\psi)$. Inference is based on the joint posterior

$$p(\theta, \psi \mid y) \propto p(y \mid \theta, \psi)\, p(\theta \mid \psi)\, p(\psi),$$

and posterior summaries of $\theta$ integrate over $\psi$.

Definition

Partial Pooling

Both EB and HB usually produce partial pooling: each unit-specific estimate lies between the direct estimate and the shared group-level mean or regression surface. The amount of pooling is governed by $\psi$.
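
In the simplest normal-normal case, the pooling weight is explicit. The sketch below (hypothetical numbers; `partial_pool` is an illustrative helper, not a library function) shows how each unit's conditional posterior mean interpolates between its direct estimate and the shared mean, with noisier units shrunk harder:

```python
# Partial pooling in a normal-normal model (illustrative sketch).
# Each unit i has a direct estimate y[i] with known sampling variance D[i];
# the latent effects share a N(m, A) distribution, so psi = (m, A) here.

def partial_pool(y, D, m, A):
    """Posterior mean of theta_i given (m, A): shrink y_i toward m."""
    return [(A * yi + Di * m) / (A + Di) for yi, Di in zip(y, D)]

y = [12.0, 7.0, 10.5]   # direct estimates
D = [4.0, 1.0, 9.0]     # sampling variances: the noisiest unit shrinks most
m, A = 9.0, 2.0         # hyperparameters: shared mean, between-unit variance

print(partial_pool(y, D, m, A))
```

With $(m, A)$ fixed, this is the conditional posterior mean that EB and HB share; the two approaches differ only in how those hyperparameters are handled upstream.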

Main Theorem

Theorem

Posterior Mean and Variance Under Hyperparameter Integration

Statement

Let $\theta$ denote a latent quantity of interest and $\psi$ a hyperparameter. Under hierarchical Bayes,

$$\mathbb{E}[\theta \mid y] = \mathbb{E}_{\psi \mid y}\left[\mathbb{E}(\theta \mid y, \psi)\right],$$

and

$$\operatorname{Var}(\theta \mid y) = \mathbb{E}_{\psi \mid y}\left[\operatorname{Var}(\theta \mid y, \psi)\right] + \operatorname{Var}_{\psi \mid y}\left(\mathbb{E}(\theta \mid y, \psi)\right).$$

The second term is the contribution from hyperparameter uncertainty. A plug-in empirical Bayes interval based on $p(\theta \mid y, \hat{\psi})$ omits this term.

Intuition

Hierarchical Bayes averages over many plausible shrinkage rules. Empirical Bayes uses one estimated shrinkage rule and acts as though it were fixed.

Proof Sketch

The mean identity is the law of total expectation applied to the posterior. The variance identity is the law of total variance, again under the posterior distribution. The empirical Bayes plug-in approximation replaces the random hyperparameter $\psi$ with a point estimate $\hat{\psi}$, which removes the between-hyperparameter variability term.
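
The decomposition can be checked numerically. A minimal sketch, assuming a toy model (not from the text above): $\theta \sim N(0, A)$, $y \mid \theta \sim N(\theta, 1)$, with a flat grid prior on $A$. The HB variance is computed from the two-term identity; the EB variance plugs in the marginal-ML value of $A$:

```python
# Numerical check of the total-variance identity in a toy normal model.
import math

y = 2.0
grid = [0.1 * k for k in range(1, 101)]   # candidate values of A

def marg_lik(A):
    # p(y | A): marginally, y ~ N(0, A + 1)
    v = A + 1.0
    return math.exp(-y * y / (2 * v)) / math.sqrt(2 * math.pi * v)

w = [marg_lik(A) for A in grid]
Z = sum(w)
w = [wi / Z for wi in w]                  # posterior over A (flat grid prior)

cond_mean = [A * y / (A + 1) for A in grid]   # E(theta | y, A)
cond_var = [A / (A + 1) for A in grid]        # Var(theta | y, A)

hb_mean = sum(wi * m for wi, m in zip(w, cond_mean))
hb_var = (sum(wi * cv for wi, cv in zip(w, cond_var))                      # E[Var]
          + sum(wi * (m - hb_mean) ** 2 for wi, m in zip(w, cond_mean)))   # Var[E]

A_hat = max(grid, key=marg_lik)           # marginal ML plug-in value of A
eb_var = A_hat / (A_hat + 1)              # plug-in EB posterior variance

print(hb_var, eb_var)                     # HB variance exceeds the plug-in here
```

In this example the posterior over $A$ is wide, so the second (between-hyperparameter) term is material and the HB variance is visibly larger than the EB plug-in variance.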

Why It Matters

This theorem isolates the main inferential difference between EB and HB. Their point estimates can be close, but their interval estimates can diverge whenever hyperparameter uncertainty is not negligible.

Failure Mode

Hierarchical Bayes does not automatically dominate. If the prior on $\psi$ is poorly chosen and the data are weak, the full posterior can be sensitive to the prior in ways that a plug-in EB analysis avoids. The right conclusion is not "HB always wins." It is "the omitted uncertainty term is sometimes material."

EB vs HB at a Glance

| Question | Empirical Bayes | Hierarchical Bayes |
| --- | --- | --- |
| What happens to hyperparameters? | Estimate once and plug in | Put a prior and integrate out |
| Point estimates | Often close to HB | Often close to EB |
| Uncertainty intervals | Usually narrower | Usually wider when hyperparameter uncertainty matters |
| Computation | Usually cheaper | Usually heavier |
| Sensitivity source | Estimator choice for hyperparameters | Prior choice for hyperparameters |

The last row matters. EB is not assumption-free. It simply hides its higher-level assumptions inside the hyperparameter estimator rather than inside a hyperprior.

Canonical Example

Example

Fay-Herriot shrinkage with uncertain area variance

In a small area estimation model, the area means satisfy $\theta_i = x_i^\top \beta + v_i$ with $v_i \sim N(0, A)$. The shrinkage toward $x_i^\top \beta$ is controlled by the variance component $A$.

An empirical Bayes analysis estimates $A$ by moments, ML, or REML, then plugs that value into the shrinkage formula. A hierarchical Bayes analysis assigns a prior to $A$ and averages over its posterior.

If the number of areas is large and the posterior for $A$ is concentrated, both methods often yield similar posterior means for $\theta_i$. If the number of areas is small, or the data leave serious uncertainty about whether $A$ is close to zero, the hierarchical Bayes intervals can be materially wider. That extra width is not a bug. It is the second variance term in the theorem above.
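
A minimal sketch of the EB side, assuming an intercept-only model ($x_i^\top \beta = \mu$) and a crude moment estimator of $A$; both simplifications, and the helper name, are illustrative rather than any library's API:

```python
# Simplified Fay-Herriot-style EB estimator (intercept-only sketch).

def eb_fay_herriot(y, D):
    """y: direct area estimates; D: known sampling variances."""
    m = len(y)
    mu_hat = sum(y) / m                         # crude estimate of the common mean
    s2 = sum((yi - mu_hat) ** 2 for yi in y) / (m - 1)
    A_hat = max(0.0, s2 - sum(D) / m)           # moment estimate of A, floored at 0
    # The EB step: plug A_hat into the shrinkage formula as if it were known.
    est = [(A_hat * yi + Di * mu_hat) / (A_hat + Di) for yi, Di in zip(y, D)]
    return est, A_hat

y = [10.0, 12.0, 8.0, 11.0, 9.0]
D = [1.0, 1.0, 1.0, 1.0, 1.0]
est, A_hat = eb_fay_herriot(y, D)
print(A_hat, est)
```

Note the zero floor: when the moment estimate hits it, the plug-in rule pools completely and reports no between-area variability at all, exactly the boundary situation where an HB analysis with a prior on $A$ tends to produce wider intervals.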

Relation to James-Stein and Shrinkage

The James-Stein estimator is a canonical empirical Bayes story. A normal prior variance is estimated from the data, then plugged into a posterior-mean formula. Hierarchical Bayes keeps the same shrinkage logic but treats that prior variance as a random quantity with its own posterior uncertainty.
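
The plug-in step is visible in the positive-part James-Stein formula: for $z_i \sim N(\theta_i, 1)$ with a $N(0, A)$ prior, the factor $(p-2)/\lVert z \rVert^2$ is the data-based estimate of the shrinkage weight $1/(1+A)$. A self-contained sketch (positive-part variant):

```python
# Positive-part James-Stein shrinkage toward zero for z_i ~ N(theta_i, 1).

def james_stein(z):
    p = len(z)
    s = sum(zi * zi for zi in z)            # ||z||^2
    # (p - 2) / s estimates 1 / (1 + A); this is the empirical Bayes plug-in.
    shrink = max(0.0, 1.0 - (p - 2) / s)    # positive-part variant avoids sign flips
    return [shrink * zi for zi in z]

z = [3.0, -1.0, 2.0, 0.5]
print(james_stein(z))                        # every coordinate pulled toward 0
```

A hierarchical Bayes treatment of the same model would instead average the shrinkage weight over the posterior of $A$, rather than fixing it at this single estimated value.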

This is why EB versus HB is best understood as a distinction about the top layer of the hierarchy, not about whether shrinkage occurs.

Common Confusions

Watch Out

Empirical Bayes is not the same thing as no Bayes

Empirical Bayes still uses Bayes formulas conditionally on an estimated hyperparameter. What it does not do is assign a full prior to that hyperparameter and integrate over it.

Watch Out

Similar point estimates do not imply similar intervals

In many applications EB and HB posterior means are nearly identical, especially when there are many groups. That does not mean the interval estimates are interchangeable. The uncertainty decomposition shows exactly why they can differ.

Watch Out

Hierarchical Bayes is not automatically more honest

HB propagates hyperparameter uncertainty, but it also introduces prior sensitivity. When data are thin, two different priors on a variance component can produce meaningfully different intervals. A good HB analysis should show that sensitivity rather than hiding it.

Summary

  • Both EB and HB borrow strength through a shared hierarchical model
  • The main difference is what happens to the hyperparameters
  • EB plugs in $\hat{\psi}$ and omits hyperparameter uncertainty
  • HB integrates over $p(\psi \mid y)$ and keeps that uncertainty in the final posterior
  • Point estimates are often close; intervals can differ materially

Exercises

ExerciseCore

Problem

In one sentence, what extra source of uncertainty does hierarchical Bayes include that a plug-in empirical Bayes interval usually omits?

ExerciseAdvanced

Problem

Why are EB and HB point estimates often closer to each other than their interval estimates are?

References

Canonical:

  • Robbins, "An Empirical Bayes Approach to Statistics" (1956), Proceedings of the Third Berkeley Symposium. Origin of the empirical Bayes program.
  • Efron and Morris, "Empirical Bayes on Vector Observations: An Extension of Stein's Method" (1972), Biometrika 59(2), 335-347. Canonical shrinkage example.
  • Gelman, Carlin, Stern, Dunson, Vehtari, Rubin, Bayesian Data Analysis, 3rd ed. (2013), Chapter 5. Standard hierarchical-model treatment.
  • Ghosh and Rao, "Small Area Estimation: An Appraisal" (1994), Statistical Science 9(1), 55-93. Direct comparison of EB, EBLUP, and HB in SAE.
  • Rao and Molina, Small Area Estimation, 2nd ed. (2015), Chapters 7 and 10. Empirical Bayes, EBLUP, and hierarchical Bayes in the same notation.

Current / practice:

  • Datta, Ghosh, Huang, "Hierarchical and Empirical Bayes Methods for Adjustment of Census Undercount: The 1988 Missouri Dress Rehearsal Data" (1992), Survey Methodology. Public applied comparison of EB and HB.
  • Mukherjee and Lahiri, "On the Design-Consistency Property of Hierarchical Bayes Estimators in Finite Population Sampling" (2008), Sankhya. What HB has to prove in survey settings, not only in model settings.


Last reviewed: April 18, 2026
