## Quick Decision
| If your model looks like... | Prefer | Why |
|---|---|---|
| Many observations per group, strong likelihood, well-identified local effects | Centered | The data pin down each group effect, so the original coordinates are usually well behaved. |
| Few observations per group, weak data, strong global pooling | Non-centered | The centered posterior develops funnel geometry and strong scale-effect coupling. |
| Conjugate Gibbs updates are the main reason you chose the model | Often centered | The centered form frequently preserves simple full conditionals. |
| NUTS shows divergences clustered near a variance or scale parameter | Non-centered | The sampler is telling you the current coordinates are too sharp near the funnel neck. |
| You are not sure | Try both on a small pilot run | This is a geometry choice, not a moral choice. The data regime decides. |
## Two Coordinate Systems for the Same Model

Consider a simple normal hierarchical model:

$$\theta_j \sim \mathcal{N}(\mu, \tau^2), \qquad y_j \sim \mathcal{N}(\theta_j, \sigma_j^2).$$

This is the centered parameterization. The group-level effect $\theta_j$ is represented directly.

The non-centered form introduces a standard normal latent variable:

$$\tilde{\theta}_j \sim \mathcal{N}(0, 1), \qquad \theta_j = \mu + \tau\,\tilde{\theta}_j.$$
These are mathematically equivalent descriptions of the same generative model. They differ only by a change of coordinates. The posterior geometry, however, can change dramatically under that change of coordinates, and samplers care about geometry.
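The equivalence is easy to check by simulation. A minimal sketch, with illustrative hyperparameter values (not from any dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, tau = 1.5, 0.7   # illustrative fixed hyperparameters
n = 200_000

# Centered: draw the local effect directly from N(mu, tau^2).
theta_centered = rng.normal(mu, tau, size=n)

# Non-centered: draw a standard normal latent, then shift and scale.
theta_tilde = rng.normal(0.0, 1.0, size=n)
theta_noncentered = mu + tau * theta_tilde

# Both samples come from the same N(mu, tau^2) distribution.
print(theta_centered.mean(), theta_noncentered.mean())   # both ≈ 1.5
print(theta_centered.std(), theta_noncentered.std())     # both ≈ 0.7
```

The coordinates differ, but the implied distribution over $\theta_j$ is identical; only the sampler's view of the problem changes.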
## Why the Geometry Changes

In the centered form, small values of $\tau$ force every $\theta_j$ to lie very near $\mu$. Large values of $\tau$ allow the $\theta_j$ to spread out. That coupling creates the classic funnel shape:
- near small $\tau$, the posterior neck is narrow and sharply curved;
- near large $\tau$, the posterior mouth is wide and comparatively flat.
That geometry is exactly what causes trouble for Hamiltonian Monte Carlo and NUTS: one step size must work both in the narrow neck and the wide mouth.
In the non-centered form, the latent variables remain on a standard normal scale, and the hard coupling between local effects and the global scale is greatly reduced. The geometry becomes much closer to isotropic, which is why the non-centered form is the default rescue move for weak-data hierarchies.
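The coupling can be made concrete with conditional curvature. In centered coordinates, $\theta_j \mid \mu, \tau$ is $\mathcal{N}(\mu, \tau^2)$, so the curvature of its negative log-density is $1/\tau^2$ and depends strongly on $\tau$; in non-centered coordinates the latent is $\mathcal{N}(0, 1)$ whatever $\tau$ is. A small sketch with illustrative $\tau$ values:

```python
# Curvature of -log p for the local coordinate, as a function of tau.
# Centered: theta_j | mu, tau ~ N(mu, tau^2) -> curvature = 1/tau^2.
# Non-centered: theta_tilde_j ~ N(0, 1)      -> curvature = 1 (constant).
centered_curvature = {tau: 1.0 / tau**2 for tau in (0.01, 0.1, 1.0, 10.0)}

for tau, c in centered_curvature.items():
    # The centered curvature spans six orders of magnitude over this range,
    # while the non-centered curvature is 1.0 everywhere.
    print(f"tau={tau:6.2f}  centered={c:12.2f}  non-centered=1.0")
```

A single HMC step size cannot be well tuned for curvatures that differ by six orders of magnitude, which is the funnel problem in miniature.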
## When Centered Wins

The non-centered form is not a universal upgrade. Centered coordinates are often better when the data are strong enough that each group's likelihood already identifies $\theta_j$ sharply.
Then the posterior for $\theta_j$ is dominated by the data rather than by the prior scale $\tau$, and the centered form can be easier because:
- it lines up directly with the scientific interpretation of the local effects;
- it often preserves conditional conjugacy for Gibbs sampling;
- it avoids the deterministic transform $\theta_j = \mu + \tau\,\tilde{\theta}_j$ becoming unnecessarily indirect when the data already fix $\theta_j$ tightly.
This is the main reason the advice is conditional: "weak data favors non-centered" is strong; "non-centered is always better" is false.
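The "data dominate the prior scale" condition can be quantified with standard normal-normal precision weighting. A sketch (the helper `data_weight` and the numeric values are illustrative):

```python
# With likelihood y_j ~ N(theta_j, sigma_j^2) and prior theta_j ~ N(mu, tau^2),
# the conditional posterior mean of theta_j is a precision-weighted average.
# When sigma_j << tau, the data weight approaches 1 and tau barely matters:
# that is the regime where centered coordinates behave well.
def data_weight(sigma_j, tau):
    """Fraction of the posterior mean contributed by the group's own data."""
    return (1 / sigma_j**2) / (1 / sigma_j**2 + 1 / tau**2)

print(data_weight(sigma_j=0.1, tau=1.0))   # strong data: ≈ 0.99
print(data_weight(sigma_j=10.0, tau=1.0))  # weak data:   ≈ 0.01
```

When the weight is near 1, the scale-effect coupling that creates the funnel is weak, and the centered coordinates are close to the posterior's natural axes.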
## When Non-Centered Wins
The non-centered form is strongest when the hierarchy is weakly identified:
- few observations per group,
- small group sample sizes with a shared variance component,
- latent-variable hierarchies where the global scale does most of the work,
- funnels or funnel-like pair plots,
- HMC/NUTS divergences that cluster at the neck.
This is the canonical setting of eight schools, random-effects meta-analysis, and many weakly informed multilevel regressions. In those models the local effects mostly inherit their scale from the higher-level prior, so the centered coordinates create a posterior ridge that is numerically awkward.
## Side-by-Side Comparison
| Property | Centered | Non-Centered |
|---|---|---|
| Coordinates | Model the local effect $\theta_j$ directly | Model $\tilde{\theta}_j \sim \mathcal{N}(0, 1)$ and recover $\theta_j = \mu + \tau\,\tilde{\theta}_j$ |
| Best regime | Strong data, well-identified groups | Weak data, strong pooling, funnel risk |
| Typical HMC / NUTS behavior | Can diverge badly in funnel geometries | Usually much easier geometry |
| Typical Gibbs behavior | Often keeps clean full conditionals | May break simple conjugate updates |
| Scientific interpretability | Direct | Indirect but equivalent |
| Main failure mode | Strong scale-effect coupling | Can become less efficient when the likelihood already identifies the local effects |
## Relation to Empirical Bayes and Hierarchical Bayes
Do not confuse this comparison with Empirical Bayes vs. Hierarchical Bayes.
- Centered vs. non-centered asks: which coordinates should represent the same hierarchical posterior?
- Empirical Bayes vs. hierarchical Bayes asks: what should be done with the hyperparameters, plugging in point estimates or integrating over their posterior?
They often appear in the same models, but they are different choices.
## Canonical Example: Eight Schools

In the eight-schools model, each school-specific treatment effect $\theta_j$ has its own weak estimate and shares a common population scale $\tau$. This is the textbook case where the centered form produces a funnel.
The practical pattern is:
- centered coordinates show divergences and sticky exploration for HMC/NUTS;
- non-centered coordinates reduce the neck and usually clear the divergences;
- if the data become much stronger per group, the balance can move back toward centered coordinates.
This is why good Bayesian workflow treats parameterization as part of the modeling process, not as an afterthought.
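The non-centered construction for this model can be sketched in a few lines. The `y`/`sigma` arrays are the classic eight-schools estimates and standard errors; the hyperparameter draw is an illustrative value, not a posterior sample:

```python
import numpy as np

# Eight-schools data: estimated treatment effects and their standard errors.
y = np.array([28., 8., -3., 7., -1., 1., 18., 12.])
sigma = np.array([15., 10., 16., 11., 9., 11., 10., 18.])

rng = np.random.default_rng(1)
mu, tau = 8.0, 6.0                           # hypothetical hyperparameter draw
theta_tilde = rng.normal(0.0, 1.0, size=8)   # non-centered latents, N(0, 1)
theta = mu + tau * theta_tilde               # deterministic transform to effects

# Gaussian log-likelihood of the observed estimates given the effects.
log_lik = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                 - 0.5 * (y - theta)**2 / sigma**2)
print(theta.round(2))
print(round(log_lik, 2))
```

The sampler explores $(\mu, \tau, \tilde{\theta}_1, \dots, \tilde{\theta}_8)$, where the latents stay on a unit scale, and $\theta_j$ is recovered deterministically afterward.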
## Common Confusions

### The two parameterizations define the same statistical model
Changing from centered to non-centered does not change the prior or the likelihood. It changes the coordinates used to represent the latent effects. If inference changes dramatically, that is evidence that one sampler handles the geometry far better than the other, not that you changed the science.
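One way to see the equivalence concretely is the change-of-variables identity: the centered density of $\theta_j$ equals the standard normal density of $\tilde{\theta}_j = (\theta_j - \mu)/\tau$ times the Jacobian factor $1/\tau$. A sketch with illustrative values:

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

mu, tau, theta = 1.5, 0.7, 2.3          # illustrative values
theta_tilde = (theta - mu) / tau        # the non-centered coordinate

centered = normal_pdf(theta, mu, tau)
# Change of variables: d(theta_tilde)/d(theta) = 1/tau.
noncentered = normal_pdf(theta_tilde, 0.0, 1.0) / tau

print(abs(centered - noncentered) < 1e-12)  # True: same probability density
```

The two coordinate systems carry the same probability; only the shape of the surface the sampler must traverse changes.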
### Non-centered is not always better
When the likelihood strongly identifies each group-level effect, the centered form can mix better and be easier to interpret. The correct rule is about data strength relative to prior pooling, not about modernity.
### The best parameterization depends on the sampler
HMC and NUTS often prefer the non-centered form in weak-data hierarchies. A conditionally conjugate Gibbs sampler can prefer the centered form because it keeps the full conditional updates simple. There is no one parameterization that dominates every inference algorithm.
## References

- Papaspiliopoulos, O., Roberts, G. O., and Sköld, M. (2007). "A General Framework for the Parametrization of Hierarchical Models." Statistical Science 22(1), 59-73.
- Betancourt, M. and Girolami, M. (2013). "Hamiltonian Monte Carlo for Hierarchical Models." Canonical source on funnel pathologies and reparameterization.
- Gelman, A. et al. Bayesian Data Analysis, 3rd ed. (2013), Chapters 5 and 11. Standard hierarchical-model workflow treatment.
- Stan Development Team. "Sampling Difficulties with Problematic Priors." Practical discussion of funnel-style pathologies.