Centered vs Non-Centered Hierarchical Models

Quick Decision

If your model looks like...	Prefer	Why
Many observations per group, strong likelihood, well-identified local effects	Centered	The data pin down each group effect, so the original coordinates are usually well behaved.
Few observations per group, weak data, strong global pooling	Non-centered	The centered posterior develops funnel geometry and strong scale-effect coupling.
Conjugate Gibbs updates are the main reason you chose the model	Often centered	The centered form frequently preserves simple full conditionals.
NUTS shows divergences clustered near a variance or scale parameter	Non-centered	The sampler is telling you the current coordinates are too sharp near the funnel neck.
You are not sure	Try both on a small pilot run	This is a geometry choice, not a moral choice. The data regime decides.

Two Coordinate Systems for the Same Model

Consider a simple normal hierarchical model:

$\theta_j \sim \mathcal{N}(\mu, \tau^2), \qquad y_j \mid \theta_j \sim p(y_j \mid \theta_j).$

This is the centered parameterization. The group-level effect $\theta_j$ is represented directly.

The non-centered form introduces a standard normal latent variable:

$z_j \sim \mathcal{N}(0,1), \qquad \theta_j = \mu + \tau z_j, \qquad y_j \mid z_j,\mu,\tau \sim p(y_j \mid \mu + \tau z_j).$

These are mathematically equivalent descriptions of the same generative model; they differ only by a change of coordinates. The posterior geometry, however, can change dramatically under that transformation, and MCMC samplers are sensitive to geometry.
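The equivalence can be checked by simulation. The sketch below uses only the Python standard library. It draws a group effect both ways for one arbitrarily chosen pair of hyperparameters ($\mu = 2$, $\tau = 0.5$) and compares the sample mean and standard deviation. It checks only the prior $p(\theta_j \mid \mu, \tau)$, and the function names are hypothetical.

```python
import random
import statistics


def draw_centered(mu, tau, rng):
    # Centered: draw the group effect theta_j directly from N(mu, tau^2).
    return rng.gauss(mu, tau)


def draw_noncentered(mu, tau, rng):
    # Non-centered: draw z_j ~ N(0, 1), then shift and scale deterministically.
    z = rng.gauss(0.0, 1.0)
    return mu + tau * z


def summarize(draw, mu, tau, n=50_000, seed=0):
    # Sample mean and standard deviation of n draws of theta_j.
    rng = random.Random(seed)
    thetas = [draw(mu, tau, rng) for _ in range(n)]
    return statistics.fmean(thetas), statistics.stdev(thetas)


mu, tau = 2.0, 0.5
# Different seeds: the point is matching distributions, not matching draws.
print(summarize(draw_centered, mu, tau, seed=1))
print(summarize(draw_noncentered, mu, tau, seed=2))
```

Both lines should print values close to (2.0, 0.5), up to Monte Carlo error.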

Why the Geometry Changes

In the centered form, small values of $\tau$ force every $\theta_j$ to lie very near $\mu$. Large values of $\tau$ allow the $\theta_j$ to spread out. That coupling creates the classic funnel shape:

near small $\tau$, the posterior neck is narrow and sharply curved;
near large $\tau$, the posterior mouth is wide and comparatively flat.

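This geometry causes trouble for Hamiltonian Monte Carlo (HMC) and NUTS because a single step size must serve both regions. It must be small enough to follow the curvature in the narrow neck, yet large enough to cross the wide mouth efficiently. A step size adapted to the bulk of the posterior is typically too large for the neck, and the sampler reports divergent transitions there.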
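A minimal sketch of the step-size problem, assuming a unit mass matrix and holding $\tau$ fixed with $\mu = 0$. At fixed $\tau$, the centered log density in the $\theta_j$ direction is Gaussian with curvature $1/\tau^2$. Leapfrog integration of a Gaussian is stable only when the step size satisfies $\epsilon < 2\tau$. The function name and its defaults are illustrative, not taken from any library.

```python
def leapfrog_energy_error(tau, eps, n_steps=20, p0=1.0):
    """Largest absolute energy error over n_steps leapfrog steps for
    U(theta) = theta^2 / (2 tau^2), i.e. the centered conditional density
    N(0, tau^2) of one theta_j at a fixed tau, with a unit mass matrix."""

    def grad_u(theta):
        return theta / tau**2

    def energy(theta, p):
        return theta**2 / (2 * tau**2) + 0.5 * p**2

    theta, p = tau, p0  # start one conditional standard deviation from mu = 0
    h0 = energy(theta, p)
    worst = 0.0
    for _ in range(n_steps):
        p -= 0.5 * eps * grad_u(theta)  # half step in momentum
        theta += eps * p                # full step in position
        p -= 0.5 * eps * grad_u(theta)  # half step in momentum
        worst = max(worst, abs(energy(theta, p) - h0))
    return worst


for tau in (1.0, 0.1):
    print(f"tau={tau}: max energy error {leapfrog_energy_error(tau, eps=0.5):.3g}")
```

With $\epsilon = 0.5$, the energy error stays small at $\tau = 1$ but blows up at $\tau = 0.1$, because $0.5 > 2 \times 0.1$. Shrinking the step to 0.15 restores stability at $\tau = 0.1$, at the price of many more steps to cross the mouth.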

In the non-centered form, the latent variables $z_j$ stay on a standard normal scale. A priori, $z_j$ is independent of $\tau$, so the hard coupling between local effects and the global scale is absent from the prior and, when the data are weak, greatly reduced in the posterior. The geometry becomes much closer to isotropic, which is why the non-centered form is the default rescue move for weak-data hierarchies.
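One way to see the decoupling is to draw from the prior and correlate $\log \tau$ with the magnitude of each coordinate. Purely for illustration, the sketch assumes $\mu = 0$ and a prior $\log \tau \sim \mathcal{N}(0, 1)$; any prior on $\tau$ should show the same qualitative pattern. It says nothing about the posterior under strong data.

```python
import math
import random


def corr(xs, ys):
    # Pearson correlation, written out to avoid depending on newer stdlib helpers.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def prior_coupling(n=20_000, seed=1):
    rng = random.Random(seed)
    log_tau, log_abs_theta, log_abs_z = [], [], []
    for _ in range(n):
        lt = rng.gauss(0.0, 1.0)    # assumed prior for illustration: log tau ~ N(0, 1)
        z = rng.gauss(0.0, 1.0)     # non-centered coordinate, independent of tau
        theta = math.exp(lt) * z    # centered coordinate with mu = 0
        log_tau.append(lt)
        log_abs_theta.append(math.log(abs(theta)))
        log_abs_z.append(math.log(abs(z)))
    # Correlation of log tau with the log magnitude of each coordinate.
    return corr(log_tau, log_abs_theta), corr(log_tau, log_abs_z)


c_theta, c_z = prior_coupling()
print(f"corr(log tau, log|theta|) = {c_theta:.2f}")
print(f"corr(log tau, log|z|)     = {c_z:.2f}")
```

Since $\log|\theta_j| = \log\tau + \log|z_j|$ and $\operatorname{Var}(\log|z_j|) = \pi^2/8$, the first correlation should land near $1/\sqrt{1 + \pi^2/8} \approx 0.67$, while the second should be near zero.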

When Centered Wins

The non-centered form is not a universal upgrade. Centered coordinates are often better when the data are strong enough that each group's likelihood already identifies $\theta_j$ sharply.

Then the posterior for $\theta_j$ is dominated by the data rather than by the prior scale $\tau$, and the centered form can be easier because:

it lines up directly with the scientific interpretation of the local effects;
it often preserves conditional conjugacy for Gibbs sampling;
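it avoids an inverted funnel: when the data fix $\theta_j$ tightly, the non-centered $z_j = (\theta_j - \mu)/\tau$ must move in lockstep with $\mu$ and $\tau$ to keep $\mu + \tau z_j$ near that value, which reintroduces strong coupling in the new coordinates.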
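To make the conjugacy point concrete, here is a sketch of the centered Gibbs update for the local effects. It assumes a normal likelihood $y_j \sim \mathcal{N}(\theta_j, \sigma_j^2)$ with known $\sigma_j$; the text above leaves $p(y_j \mid \theta_j)$ general, so this is an assumption. Given $\mu$ and $\tau$, each $\theta_j$ has a normal full conditional that can be drawn exactly. Function names and input values are illustrative.

```python
import math
import random


def theta_full_conditional(y_j, sigma_j, mu, tau):
    """Mean and sd of theta_j | y_j, mu, tau when y_j ~ N(theta_j, sigma_j^2)
    and theta_j ~ N(mu, tau^2): a precision-weighted average of data and prior."""
    prec = 1.0 / sigma_j**2 + 1.0 / tau**2
    mean = (y_j / sigma_j**2 + mu / tau**2) / prec
    return mean, math.sqrt(1.0 / prec)


def gibbs_update_thetas(y, sigma, mu, tau, rng):
    # One centered Gibbs sweep over the local effects: each theta_j is drawn
    # exactly from its normal full conditional, with no step size to tune.
    draws = []
    for y_j, sigma_j in zip(y, sigma):
        mean, sd = theta_full_conditional(y_j, sigma_j, mu, tau)
        draws.append(rng.gauss(mean, sd))
    return draws


rng = random.Random(0)
# Illustrative inputs only.
print(gibbs_update_thetas([28.0, 8.0, -3.0], [15.0, 10.0, 16.0], mu=5.0, tau=3.0, rng=rng))
```

In non-centered coordinates, $\tau$ instead enters the likelihood through the mean $\mu + \tau z_j$. As a result, the usual inverse-gamma conjugate update for $\tau^2$ given the $\theta_j$ no longer applies.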

This is the main reason the advice is conditional: "weak data favors non-centered" is a sound rule of thumb; "non-centered is always better" is false.
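The conditional advice can be quantified in the same normal-normal setting, assuming one observation per group with $y_j \sim \mathcal{N}(\theta_j, \sigma^2)$. Given $\mu$ and $\tau$, the conditional standard deviation of $\theta_j$ is $(1/\tau^2 + 1/\sigma^2)^{-1/2}$. For $z_j$ it is the same quantity divided by $\tau$. The sketch takes the ratio of the largest to smallest of these across a grid of $\tau$ values. This ratio is a rough, hypothetical proxy for how many length scales a single step size must cover, and it ignores the coupling with $\mu$.

```python
import math


def conditional_sds(tau, sigma):
    # Conditional sd of one local effect given mu and tau, assuming a single
    # observation y_j ~ N(theta_j, sigma^2) per group.
    sd_theta = 1.0 / math.sqrt(1.0 / tau**2 + 1.0 / sigma**2)
    sd_z = sd_theta / tau  # z_j = (theta_j - mu) / tau rescales by 1 / tau
    return sd_theta, sd_z


def scale_spread(sigma, taus):
    # Largest / smallest conditional sd over a grid of tau values, per coordinate system.
    sds = [conditional_sds(t, sigma) for t in taus]
    th = [s[0] for s in sds]
    zs = [s[1] for s in sds]
    return max(th) / min(th), max(zs) / min(zs)


taus = [0.1, 0.3, 1.0, 3.0, 10.0]
for label, sigma in [("weak data", 10.0), ("strong data", 0.1)]:
    spread_theta, spread_z = scale_spread(sigma, taus)
    print(f"{label}: centered spread {spread_theta:.1f}, non-centered spread {spread_z:.1f}")
# weak data: centered spread 70.7, non-centered spread 1.4
# strong data: centered spread 1.4, non-centered spread 70.7
```

The two regimes mirror each other. The funnel that the non-centered form removes under weak data reappears in the $z_j$ coordinates under strong data.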

When Non-Centered Wins

The non-centered form is strongest when the hierarchy is weakly identified:

few or noisy observations per group, pooled through a shared variance component,
latent-variable hierarchies where the global scale does most of the work,
funnels or funnel-like pair plots,
HMC/NUTS divergences that cluster at the neck.
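A rough check for the last item needs the $\tau$ draws and a per-draw divergence flag from your sampler. How to export them depends on the library, so the sketch takes plain lists. The helper `divergences_at_neck` is hypothetical: it reports the fraction of divergent transitions that fall in the lowest quantile of $\tau$. A value well above the quantile itself suggests the divergences concentrate at the neck. A pair plot of $\log \tau$ against one $\theta_j$ with divergences highlighted is the usual visual version of the same check.

```python
def divergences_at_neck(tau_draws, divergent, quantile=0.25):
    """Hypothetical helper: fraction of divergent transitions whose tau lies in
    the lowest `quantile` of all tau draws. Under no clustering, expect roughly
    `quantile`; values near 1 point at the funnel neck."""
    ordered = sorted(tau_draws)
    cutoff = ordered[int(quantile * (len(ordered) - 1))]  # crude empirical quantile
    div_taus = [t for t, d in zip(tau_draws, divergent) if d]
    if not div_taus:
        return 0.0  # no divergences to locate
    return sum(t <= cutoff for t in div_taus) / len(div_taus)


# Toy inputs standing in for real sampler output.
tau_draws = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
divergent = [True, True, False, False, False, False, False, False, False]
print(divergences_at_neck(tau_draws, divergent))  # 1.0
```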

This is the canonical setting for the eight-schools model, random-effects meta-analysis, and many weakly informed multilevel regressions. In those models the local effects mostly inherit their scale from the higher-level prior, so the centered coordinates produce a funnel-shaped posterior that is numerically awkward to explore.
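The sketch below shows how strongly the local effects inherit the higher-level scale. It uses the eight-schools standard errors $\sigma_j$ as tabulated in Gelman et al., Bayesian Data Analysis, and assumes $y_j \sim \mathcal{N}(\theta_j, \sigma_j^2)$. Given $\mu$ and $\tau$, the conditional mean of $\theta_j$ puts weight $\sigma_j^2 / (\sigma_j^2 + \tau^2)$ on $\mu$ and the rest on $y_j$. The $\tau$ values are illustrative, not posterior estimates.

```python
# Standard errors sigma_j of the eight schools' estimated treatment effects
# (schools A-H), as tabulated in Gelman et al., Bayesian Data Analysis.
SIGMA = [15, 10, 16, 11, 9, 11, 10, 18]


def pooling_weight(sigma_j, tau):
    # Weight on mu in E[theta_j | y_j, mu, tau] under y_j ~ N(theta_j, sigma_j^2)
    # and theta_j ~ N(mu, tau^2); the remaining 1 - weight goes to y_j.
    return sigma_j**2 / (sigma_j**2 + tau**2)


for tau in (1.0, 5.0, 10.0):  # illustrative values, not posterior estimates
    weights = [pooling_weight(s, tau) for s in SIGMA]
    print(f"tau={tau:>4}: weight on mu ranges from {min(weights):.2f} to {max(weights):.2f}")
```

At $\tau = 5$, for instance, every school's conditional mean puts more than three quarters of its weight on $\mu$.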

Side-by-Side Comparison

Property	Centered	Non-Centered
Coordinates	Model the local effect $\theta_j$ directly	Model $z_j \sim \mathcal{N}(0,1)$ and recover $\theta_j = \mu + \tau z_j$
Best regime	Strong data, well-identified groups	Weak data, strong pooling, funnel risk
Typical HMC / NUTS behavior	Can diverge badly in funnel geometries	Usually much easier geometry
Typical Gibbs behavior	Often keeps clean full conditionals	May break simple conjugate updates
Scientific interpretability	Direct	Indirect but equivalent
Main failure mode	Strong scale-effect coupling	Can become less efficient when the likelihood already identifies the local effects

Relation to Empirical Bayes and Hierarchical Bayes

Do not confuse this comparison with Empirical Bayes vs. Hierarchical Bayes.

Centered vs. non-centered asks: which coordinates should represent the same hierarchical posterior?
Empirical Bayes vs. hierarchical Bayes asks: what happens to the hyperparameters? Do you plug in a point estimate, or integrate over their posterior?

They often appear in the same models, but they are different choices.

Canonical Example: Eight Schools

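In the eight-schools model, each school $j$ contributes a single estimated treatment effect $y_j$ with a known standard error $\sigma_j$. Each school-specific effect $\theta_j$ is therefore only weakly identified, and all schools share a common population scale $\tau$. This is the textbook case where the centered form produces a funnel.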

The practical pattern is:

centered coordinates show divergences and sticky exploration for HMC/NUTS;
non-centered coordinates reduce the neck and usually clear the divergences;
if the data become much stronger per group, the balance can move back toward centered coordinates.

This is why good Bayesian workflow treats parameterization as part of the modeling process, not as an afterthought.

Common Confusions

Watch Out

The two parameterizations define the same statistical model

Changing from centered to non-centered does not change the prior or the likelihood. It changes the coordinates used to represent the latent effects. If inference changes dramatically, that is evidence that the sampler handles one geometry far better than the other (or that one of the runs has not converged), not that you changed the science.
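A quick numerical check of this point. For illustration, it assumes the normal likelihood $y_j \sim \mathcal{N}(\theta_j, \sigma_j^2)$ and omits the hyperpriors on $\mu$ and $\tau$, which are identical in both forms. Evaluated at the same point, the two log joint densities differ only by the Jacobian term $J \log \tau$ of the map $\theta_j = \mu + \tau z_j$, where $J$ is the number of groups. The data and parameter values are made up.

```python
import math


def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def log_joint_centered(theta, mu, tau, y, sigma):
    # Hyperpriors on mu and tau omitted: they are identical in both forms.
    prior = sum(normal_logpdf(t, mu, tau) for t in theta)
    lik = sum(normal_logpdf(yj, t, s) for yj, t, s in zip(y, theta, sigma))
    return prior + lik


def log_joint_noncentered(z, mu, tau, y, sigma):
    prior = sum(normal_logpdf(zj, 0.0, 1.0) for zj in z)
    lik = sum(normal_logpdf(yj, mu + tau * zj, s) for yj, zj, s in zip(y, z, sigma))
    return prior + lik


# The same point expressed in both coordinate systems (made-up values).
mu, tau = 1.0, 2.5
z = [0.3, -1.2, 0.8]
theta = [mu + tau * zj for zj in z]
y, sigma = [2.0, -1.0, 3.0], [1.0, 2.0, 1.5]

gap = log_joint_noncentered(z, mu, tau, y, sigma) - log_joint_centered(theta, mu, tau, y, sigma)
print(gap, len(z) * math.log(tau))  # equal up to rounding
```

The two densities are related exactly by this change-of-variables factor, so they describe the same posterior. Only the way that posterior is laid out in the latent coordinates changes.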

Watch Out

Non-centered is not always better

When the likelihood strongly identifies each group-level effect, the centered form can mix better and be easier to interpret. The correct rule is about data strength relative to prior pooling, not about modernity.

Watch Out

The best parameterization depends on the sampler

HMC and NUTS often prefer the non-centered form in weak-data hierarchies. A conditionally conjugate Gibbs sampler can prefer the centered form because it keeps the full conditional updates simple. There is no one parameterization that dominates every inference algorithm.

References

Papaspiliopoulos, O., Roberts, G. O., and Sköld, M. (2007). "A General Framework for the Parametrization of Hierarchical Models." Statistical Science 22(1), 59-73.
Betancourt, M. and Girolami, M. (2013). "Hamiltonian Monte Carlo for Hierarchical Models." Canonical source on funnel pathologies and reparameterization.
Gelman, A. et al. Bayesian Data Analysis, 3rd ed. (2013), Chapters 5 and 11. Standard hierarchical-model workflow treatment.
Stan Development Team. "Sampling Difficulties with Problematic Priors." Practical discussion of funnel-style pathologies.