Comparison

Centered vs. Non-Centered Hierarchical Models

Centered and non-centered parameterizations define the same hierarchical model in different coordinates. The real choice is geometric: strong data often favors centered coordinates, while weakly identified hierarchies usually favor the non-centered form for HMC and NUTS.

Quick Decision

| If your model looks like... | Prefer | Why |
| --- | --- | --- |
| Many observations per group, strong likelihood, well-identified local effects | Centered | The data pin down each group effect, so the original coordinates are usually well behaved. |
| Few observations per group, weak data, strong global pooling | Non-centered | The centered posterior develops funnel geometry and strong scale-effect coupling. |
| Conjugate Gibbs updates are the main reason you chose the model | Often centered | The centered form frequently preserves simple full conditionals. |
| NUTS shows divergences clustered near a variance or scale parameter | Non-centered | The sampler is telling you the current coordinates are too sharp near the funnel neck. |
| You are not sure | Try both on a small pilot run | This is a geometry choice, not a moral choice. The data regime decides. |

Two Coordinate Systems for the Same Model

Consider a simple normal hierarchical model:

$$\theta_j \sim \mathcal{N}(\mu, \tau^2), \qquad y_j \mid \theta_j \sim p(y_j \mid \theta_j).$$

This is the centered parameterization. The group-level effect $\theta_j$ is represented directly.

The non-centered form introduces a standard normal latent variable:

$$z_j \sim \mathcal{N}(0,1), \qquad \theta_j = \mu + \tau z_j, \qquad y_j \mid z_j, \mu, \tau \sim p(y_j \mid \mu + \tau z_j).$$

These are mathematically equivalent descriptions of the same generative model. They differ only by a change of coordinates. The posterior geometry, however, can change dramatically under that change of coordinates, and samplers care about geometry.
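The equivalence of the two constructions can be checked numerically. The sketch below uses only the Python standard library; the function names and the values of `mu`, `tau`, and `n` are illustrative assumptions, not part of the model above. It draws group effects both ways for fixed hyperparameters and compares their sample mean and standard deviation.

```python
import random
import statistics


def draw_centered(mu, tau, n, rng):
    # Centered: sample theta_j directly from N(mu, tau^2).
    return [rng.gauss(mu, tau) for _ in range(n)]


def draw_non_centered(mu, tau, n, rng):
    # Non-centered: sample z_j ~ N(0, 1), then shift and scale.
    return [mu + tau * rng.gauss(0.0, 1.0) for _ in range(n)]


# Illustrative values (assumptions for this sketch).
mu, tau, n = 2.0, 0.5, 100_000
rng = random.Random(0)

theta_c = draw_centered(mu, tau, n, rng)
theta_nc = draw_non_centered(mu, tau, n, rng)

# Both pairs should be close to (mu, tau) up to Monte Carlo error.
print(statistics.fmean(theta_c), statistics.stdev(theta_c))
print(statistics.fmean(theta_nc), statistics.stdev(theta_nc))
```

The draws of $\theta_j$ have the same distribution either way; what differs is which variables a sampler moves through when the hyperparameters are unknown.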

Why the Geometry Changes

In the centered form, small values of $\tau$ force every $\theta_j$ to lie very near $\mu$. Large values of $\tau$ allow the $\theta_j$ to spread out. That coupling creates the classic funnel shape:

  • near small $\tau$, the posterior neck is narrow and sharply curved;
  • near large $\tau$, the posterior mouth is wide and comparatively flat.

That geometry is exactly what causes trouble for Hamiltonian Monte Carlo and NUTS: one step size must work both in the narrow neck and the wide mouth.
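The step-size conflict can be made concrete by writing out the hierarchical term of the centered log density. The following is a sketch for the normal hierarchy only; it ignores the likelihood term, which is a simplifying assumption corresponding to the weak-data limit.

$$
\log \mathcal{N}(\theta_j \mid \mu, \tau^2) = -\log \tau - \frac{(\theta_j - \mu)^2}{2\tau^2} + \text{const},
\qquad
-\frac{\partial^2}{\partial \theta_j^2} \log \mathcal{N}(\theta_j \mid \mu, \tau^2) = \frac{1}{\tau^2}.
$$

The curvature in each $\theta_j$ direction therefore scales like $1/\tau^2$: at $\tau = 0.1$ it is 100 times larger than at $\tau = 1$. By comparison, the term $\log \mathcal{N}(z_j \mid 0, 1) = -z_j^2/2 + \text{const}$ has curvature 1 for every value of $\tau$.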

In the non-centered form, the latent variables $z_j$ remain on a standard normal scale, and the hard coupling between local effects and the global scale is greatly reduced. The geometry becomes much closer to isotropic, which is why the non-centered form is the default rescue move for weak-data hierarchies.
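A small simulation illustrates the difference in coupling. This is a sketch under stated assumptions: it uses draws from the prior as a stand-in for a posterior with very weak data, and it assumes a log-normal prior on $\tau$ purely for illustration. The cutoffs 0.5 and 2.0 that define the "neck" and "mouth" are likewise arbitrary.

```python
import math
import random
import statistics

rng = random.Random(1)
mu = 0.0  # fixed for simplicity (assumption)

draws = []
for _ in range(50_000):
    tau = math.exp(rng.gauss(0.0, 1.0))  # assumed log-normal prior on tau
    z = rng.gauss(0.0, 1.0)              # non-centered coordinate
    theta = mu + tau * z                 # centered coordinate
    draws.append((tau, theta, z))

# Split draws into the funnel neck (small tau) and mouth (large tau).
neck = [d for d in draws if d[0] < 0.5]
mouth = [d for d in draws if d[0] > 2.0]


def spread(rows, index):
    # Standard deviation of one coordinate within a subset of draws.
    return statistics.pstdev([row[index] for row in rows])


sd_theta_neck, sd_theta_mouth = spread(neck, 1), spread(mouth, 1)
sd_z_neck, sd_z_mouth = spread(neck, 2), spread(mouth, 2)

print("theta spread: neck", sd_theta_neck, "mouth", sd_theta_mouth)
print("z spread:     neck", sd_z_neck, "mouth", sd_z_mouth)
```

The spread of $\theta_j$ differs by a large factor between the two regions, while the spread of $z_j$ stays near 1 in both. A single step size suits the second situation far better than the first.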

When Centered Wins

The non-centered form is not a universal upgrade. Centered coordinates are often better when the data are strong enough that each group's likelihood already identifies $\theta_j$ sharply.

Then the posterior for $\theta_j$ is dominated by the data rather than by the prior scale $\tau$, and the centered form can be easier because:

  • it lines up directly with the scientific interpretation of the local effects;
  • it often preserves conditional conjugacy for Gibbs sampling;
  • it avoids routing a tightly identified $\theta_j$ through the deterministic transform $\theta_j = \mu + \tau z_j$, which would force $z_j$ to track $\mu$ and $\tau$ for no benefit.

This is the main reason the advice is conditional: "weak data favors non-centered" is a reliable rule of thumb; "non-centered is always better" is false.
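How strongly the data dominate can be quantified in the simplest case. The sketch below assumes a normal likelihood with known within-group standard deviation `sigma` and `n` observations per group, which is more specific than the general $p(y_j \mid \theta_j)$ used above. The function name `conditional_theta` and all numeric values are hypothetical. It computes the conditional posterior of $\theta_j$ given $\mu$ and $\tau$, together with the weight placed on the data.

```python
import math


def conditional_theta(y_bar, sigma, n, mu, tau):
    """Conditional posterior of theta_j given mu and tau.

    Assumes y_ij ~ N(theta_j, sigma^2) with known sigma, so the group
    mean y_bar carries precision n / sigma^2.
    """
    data_prec = n / sigma**2
    prior_prec = 1.0 / tau**2
    post_var = 1.0 / (data_prec + prior_prec)
    post_mean = post_var * (data_prec * y_bar + prior_prec * mu)
    weight_on_data = data_prec * post_var  # between 0 and 1
    return post_mean, math.sqrt(post_var), weight_on_data


# Illustrative values: the same group mean with increasing group size.
results = {
    n: conditional_theta(y_bar=3.0, sigma=5.0, n=n, mu=0.0, tau=1.0)
    for n in (2, 20, 2000)
}
for n, (mean, sd, weight) in results.items():
    print(n, round(mean, 3), round(sd, 3), round(weight, 3))
```

When the weight on the data is near 1, $\theta_j$ sits close to the group mean almost regardless of $\mu$ and $\tau$, so the centered coordinates are nearly decoupled. In the non-centered coordinates the same situation forces $z_j \approx (\bar{y}_j - \mu)/\tau$, which ties $z_j$ to both hyperparameters.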

When Non-Centered Wins

The non-centered form is strongest when the hierarchy is weakly identified:

  • few observations per group,
  • small group sample sizes with a shared variance component,
  • latent-variable hierarchies where the global scale does most of the work,
  • funnels or funnel-like pair plots,
  • HMC/NUTS divergences that cluster at the neck.

This is the canonical setting of eight schools, random-effects meta-analysis, and many weakly informed multilevel regressions. In those models the local effects mostly inherit their scale from the higher-level prior, so the centered coordinates create a posterior ridge that is numerically awkward.

Side-by-Side Comparison

| Property | Centered | Non-Centered |
| --- | --- | --- |
| Coordinates | Model the local effect $\theta_j$ directly | Model $z_j \sim \mathcal{N}(0,1)$ and recover $\theta_j = \mu + \tau z_j$ |
| Best regime | Strong data, well-identified groups | Weak data, strong pooling, funnel risk |
| Typical HMC / NUTS behavior | Can diverge badly in funnel geometries | Usually much easier geometry |
| Typical Gibbs behavior | Often keeps clean full conditionals | May break simple conjugate updates |
| Scientific interpretability | Direct | Indirect but equivalent |
| Main failure mode | Strong scale-effect coupling | Can become less efficient when the likelihood already identifies the local effects |

Relation to Empirical Bayes and Hierarchical Bayes

Do not confuse this comparison with Empirical Bayes vs. Hierarchical Bayes.

  • Centered vs. non-centered asks: which coordinates should represent the same hierarchical posterior?
  • Empirical Bayes vs. hierarchical Bayes asks: what happens to the hyperparameters? Do you plug in point estimates, or integrate over their posterior?

They often appear in the same models, but they are different choices.

Canonical Example: Eight Schools

In the eight-schools model, each school-specific treatment effect $\theta_j$ has its own weak estimate and shares a common population scale $\tau$. This is the textbook case where the centered form produces a funnel.

The practical pattern is:

  • centered coordinates show divergences and sticky exploration for HMC/NUTS;
  • non-centered coordinates reduce the neck and usually clear the divergences;
  • if the data become much stronger per group, the balance can move back toward centered coordinates.

This is why good Bayesian workflow treats parameterization as part of the modeling process, not as an afterthought.
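For concreteness, the two eight-schools log densities can be written side by side. The sketch below uses the standard eight-schools estimates and standard errors; the hyperpriors ($\mu \sim \mathcal{N}(0, 5^2)$, $\tau \sim \text{half-Cauchy}(0, 5)$), the function names, and the test point are assumptions chosen for illustration. It evaluates both unnormalized log posteriors at matching points and checks that they differ only by the log Jacobian of the map $z \mapsto \theta$, which is $J \log \tau$ for $J$ schools.

```python
import math

# Eight-schools estimated effects and their standard errors.
Y = [28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0]
SIGMA = [15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0]


def normal_logpdf(x, loc, scale):
    return (-0.5 * math.log(2.0 * math.pi) - math.log(scale)
            - 0.5 * ((x - loc) / scale) ** 2)


def log_hyperprior(mu, tau):
    # Assumed for illustration: mu ~ N(0, 5^2), tau ~ half-Cauchy(0, 5).
    half_cauchy = math.log(2.0 / (math.pi * 5.0)) - math.log1p((tau / 5.0) ** 2)
    return normal_logpdf(mu, 0.0, 5.0) + half_cauchy


def log_post_centered(theta, mu, tau):
    # Density over (theta, mu, tau): theta_j is sampled directly.
    lp = log_hyperprior(mu, tau)
    for th, y, s in zip(theta, Y, SIGMA):
        lp += normal_logpdf(th, mu, tau) + normal_logpdf(y, th, s)
    return lp


def log_post_non_centered(z, mu, tau):
    # Density over (z, mu, tau): theta_j = mu + tau * z_j is derived.
    lp = log_hyperprior(mu, tau)
    for zj, y, s in zip(z, Y, SIGMA):
        lp += normal_logpdf(zj, 0.0, 1.0) + normal_logpdf(y, mu + tau * zj, s)
    return lp


# Arbitrary test point near the funnel neck (small tau).
mu, tau = 4.0, 0.3
z = [0.5, -1.2, 0.1, 0.8, -0.4, 1.5, -0.7, 0.2]
theta = [mu + tau * zj for zj in z]

gap = log_post_non_centered(z, mu, tau) - log_post_centered(theta, mu, tau)
print(gap, len(Y) * math.log(tau))  # the two numbers should agree
```

The agreement confirms that the two functions describe the same posterior in different coordinates. The $-J \log \tau$ term and the $1/\tau^2$ scaling live in the centered density, which is where the funnel comes from.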

Common Confusions

Watch Out

The two parameterizations define the same statistical model

Changing from centered to non-centered does not change the prior or the likelihood. It changes the coordinates used to represent the latent effects. If inference changes dramatically, that is evidence that the sampler handles one geometry far better than the other, not that you changed the science.

Watch Out

Non-centered is not always better

When the likelihood strongly identifies each group-level effect, the centered form can mix better and be easier to interpret. The correct rule is about data strength relative to prior pooling, not about modernity.

Watch Out

The best parameterization depends on the sampler

HMC and NUTS often prefer the non-centered form in weak-data hierarchies. A conditionally conjugate Gibbs sampler can prefer the centered form because it keeps the full conditional updates simple. There is no one parameterization that dominates every inference algorithm.

References

  1. Papaspiliopoulos, O., Roberts, G. O., and Sköld, M. (2007). "A General Framework for the Parametrization of Hierarchical Models." Statistical Science 22(1), 59-73.
  2. Betancourt, M. and Girolami, M. (2013). "Hamiltonian Monte Carlo for Hierarchical Models." Canonical source on funnel pathologies and reparameterization.
  3. Gelman, A. et al. Bayesian Data Analysis, 3rd ed. (2013), Chapters 5 and 11. Standard hierarchical-model workflow treatment.
  4. Stan Development Team. "Sampling Difficulties with Problematic Priors." Practical discussion of funnel-style pathologies.