No-U-Turn Sampler and Neal's Funnel
NUTS removes HMC's hand-tuned trajectory length, but Neal's funnel shows why geometry still dominates. Centered hierarchical models create narrow necks, divergences, and misleading convergence unless you reparameterize.
Why This Matters
Hamiltonian Monte Carlo already solves one huge problem in Bayesian computation: random-walk proposals in high dimensions. But standard HMC still asks the user to choose a trajectory length. Too short and it random-walks; too long and it wastes computation by curling back over ground it already explored.
The No-U-Turn Sampler (NUTS) fixes that tuning problem. It grows the trajectory until the simulated path starts turning back on itself, then stops automatically. That is why Stan made NUTS the default sampler.
But NUTS does not make geometry disappear. Neal's funnel is the classic counterexample. A centered hierarchical model can have a narrow neck where latent variables are forced into a tiny region when the scale parameter is small. NUTS will still struggle there, and the struggle will often surface as divergences.
NUTS fixes path-length tuning, but it does not repeal bad geometry
Neal's funnel is the canonical reminder that hierarchical geometry matters more than sampler branding. NUTS automatically stops long trajectories before they double back, but the centered funnel still creates a narrow neck where divergences cluster.
Canonical funnel
A standard form is $v \sim \mathcal{N}(0, 3^2)$ and $x_i \mid v \sim \mathcal{N}(0, e^{v/2})$ (second argument a standard deviation), so the conditional variance is $e^{v}$.
What NUTS actually adapts
NUTS removes the need to hand-pick the leapfrog count by growing a binary tree until the trajectory begins to turn back on itself. The geometry can still be bad even when the path-length tuning is good.
Why reparameterization wins
Write $x_i = \tilde{x}_i \, e^{v/2}$ with $\tilde{x}_i \sim \mathcal{N}(0, 1)$, and the awkward funnel collapses into a much more isotropic base geometry.
Mental Model
Think of NUTS as "HMC with automatic path length." It is still the same basic Hamiltonian system underneath: leapfrog integration, a mass matrix, and a Metropolis accept/reject correction. The automation only decides how long to follow the trajectory before stopping.
Now think of Neal's funnel as a geometry trap. In the wide mouth of the funnel, the latent variables can spread out. In the narrow neck, they are forced close to zero. The local curvature changes violently across the space, so one global step size is always wrong somewhere. That is why divergences cluster near the neck.
Formal Setup
No-U-Turn Sampler (NUTS)
NUTS is an extension of HMC that adaptively determines the effective number of leapfrog steps. Starting from the current state, it recursively doubles the trajectory in forward and backward time to build a binary tree of candidate states. The tree expansion stops once the trajectory begins to double back on itself, meaning further simulation would mostly retrace the same region.
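To make the stopping behaviour concrete, here is a deliberately simplified sketch in plain NumPy (the function names and the toy 2-D Gaussian target are our own, not Stan's or any library's implementation). It only integrates forward in time and only checks the two outermost endpoints; real NUTS doubles in a randomly chosen time direction, applies the check to every subtree, and samples the next state from the whole trajectory so the transition stays valid.

```python
import numpy as np

def grad_neg_logp(q):
    """Toy target: standard 2-D Gaussian, U(q) = 0.5 * q @ q, so grad U = q."""
    return q

def leapfrog(q, p, eps):
    """One leapfrog step with an identity mass matrix."""
    p = p - 0.5 * eps * grad_neg_logp(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_neg_logp(q)
    return q, p

def doubling_until_uturn(q0, p0, eps=0.1, max_depth=10):
    """Caricature of NUTS trajectory growth: keep doubling the number of
    (forward-only) leapfrog steps until the endpoints satisfy the U-turn
    condition. Real NUTS doubles in a random time direction, applies the
    check to every subtree, and draws the next state from the whole
    trajectory so that detailed balance is preserved."""
    q_plus, p_plus = q0, p0
    n_steps = 0
    for depth in range(max_depth):
        for _ in range(2 ** depth):          # this doubling adds 2**depth steps
            q_plus, p_plus = leapfrog(q_plus, p_plus, eps)
            n_steps += 1
        span = q_plus - q0
        if span @ p0 < 0 or span @ p_plus < 0:
            break                            # trajectory has started to turn back
    return depth, n_steps

rng = np.random.default_rng(0)
d, n = doubling_until_uturn(rng.standard_normal(2), rng.standard_normal(2))
print("stopped at depth", d, "after", n, "leapfrog steps")
```

Even this caricature shows the key behaviour: the trajectory length is discovered from the geometry of the target rather than fixed in advance by the user.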
Neal's Funnel
One standard $(D+1)$-dimensional form of Neal's funnel is

$$v \sim \mathcal{N}(0, 3^2), \qquad x_i \mid v \sim \mathcal{N}\!\left(0,\, e^{v/2}\right), \quad i = 1, \dots, D.$$

The second argument of $\mathcal{N}$ here is the standard deviation, so the conditional variance of each $x_i$ is $e^{v}$. When $v$ is large and negative, the $x_i$ are squeezed into a tight neck. When $v$ is large and positive, the funnel opens dramatically.
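As a concrete reference point, here is a minimal NumPy sketch of this density (the function and variable names are ours, not from any particular library). It evaluates the joint log density and draws exact samples from the funnel, which is possible here because the funnel is simply a hierarchical normal.

```python
import numpy as np

def funnel_logp(v, x):
    """Joint log density of Neal's funnel:
    v ~ N(0, 3^2), x_i | v ~ N(0, exp(v/2))  (second argument = std dev)."""
    x = np.asarray(x)
    logp_v = -0.5 * (v / 3.0) ** 2 - np.log(3.0) - 0.5 * np.log(2 * np.pi)
    # Conditional std dev is exp(v/2), so conditional variance is exp(v).
    logp_x = -0.5 * x ** 2 * np.exp(-v) - 0.5 * v - 0.5 * np.log(2 * np.pi)
    return logp_v + logp_x.sum()

def sample_funnel(n, dim=9, seed=0):
    """Exact draws from the funnel (no MCMC needed for the funnel itself)."""
    rng = np.random.default_rng(seed)
    v = 3.0 * rng.standard_normal(n)
    x = rng.standard_normal((n, dim)) * np.exp(v / 2.0)[:, None]
    return v, x

v, x = sample_funnel(5)
print(funnel_logp(v[0], x[0]))
# The conditional scale exp(v/2) spans orders of magnitude over typical v values:
print(np.exp(np.array([-9.0, 0.0, 9.0]) / 2.0))  # roughly [0.011, 1.0, 90.0]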
Divergent Transition
A divergent transition is a leapfrog trajectory whose numerical integration error becomes so large that the simulated Hamiltonian flow can no longer be trusted. In practice, divergences are not random warnings. They are localized evidence that the target geometry is forcing the integrator into a regime it cannot resolve with the current parameterization and step size.
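The following sketch (our own toy code, with an illustrative step size and no accept/reject step) shows the mechanism directly: the same leapfrog step size that integrates the funnel accurately in the wide mouth produces an enormous Hamiltonian error in the neck, which is exactly the kind of trajectory a sampler would flag as divergent.

```python
import numpy as np

def neg_logp_and_grad(v, x):
    """Negative log density of Neal's funnel (up to a constant) and its gradient."""
    U = 0.5 * (v / 3.0) ** 2 + np.sum(0.5 * x ** 2 * np.exp(-v) + 0.5 * v)
    dU_dv = v / 9.0 - 0.5 * np.sum(x ** 2) * np.exp(-v) + 0.5 * x.size
    dU_dx = x * np.exp(-v)
    return U, np.concatenate(([dU_dv], dU_dx))

def leapfrog_energy_error(q0, p0, step_size, n_steps):
    """Fixed-length leapfrog trajectory with identity mass matrix; returns |ΔH|."""
    q, p = q0.copy(), p0.copy()
    U, grad = neg_logp_and_grad(q[0], q[1:])
    H0 = U + 0.5 * p @ p
    for _ in range(n_steps):
        p = p - 0.5 * step_size * grad       # half-step momentum
        q = q + step_size * p                # full-step position
        U, grad = neg_logp_and_grad(q[0], q[1:])
        p = p - 0.5 * step_size * grad       # half-step momentum
    H = U + 0.5 * p @ p
    return abs(H - H0)

rng = np.random.default_rng(1)
dim = 9
p0 = rng.standard_normal(dim + 1)
mouth = np.concatenate(([3.0], np.full(dim, 1.0)))    # wide mouth of the funnel
neck = np.concatenate(([-6.0], np.full(dim, 0.01)))   # narrow neck of the funnel
for name, q0 in [("mouth", mouth), ("neck", neck)]:
    err = leapfrog_energy_error(q0, p0, step_size=0.2, n_steps=10)
    print(name, "energy error:", err)
# The same step size that behaves well in the mouth produces a huge energy error
# in the neck, which is exactly where divergences are reported in practice.
```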
Main Propositions
The NUTS stopping rule adapts path length, not target geometry
Statement
NUTS stops tree expansion once the simulated trajectory begins to reverse direction. A standard stopping check is whether either endpoint momentum points back toward the other endpoint:

$$(\theta^{+} - \theta^{-}) \cdot r^{-} < 0 \quad\text{or}\quad (\theta^{+} - \theta^{-}) \cdot r^{+} < 0,$$

where $\theta^{-}, \theta^{+}$ are the backward-most and forward-most positions on the trajectory and $r^{-}, r^{+}$ are the momenta at those points. This removes manual tuning of the leapfrog count $L$, but it does not change the geometry of the target density itself.
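In code, the check itself is just a pair of dot products. This small sketch (our own naming) applies it to the two trajectory endpoints; real NUTS also applies it to every subtree created by the doubling.

```python
import numpy as np

def uturn(theta_minus, theta_plus, r_minus, r_plus):
    """Stop expanding once either endpoint momentum points back across the
    trajectory's extent: (theta+ - theta-) . r < 0 at either end."""
    span = theta_plus - theta_minus
    return (span @ r_minus < 0.0) or (span @ r_plus < 0.0)

theta_m, theta_p = np.array([0.0, 0.0]), np.array([1.0, 0.2])
# Both endpoint momenta still project positively onto the span: keep expanding.
print(uturn(theta_m, theta_p, np.array([0.9, 0.1]), np.array([0.7, -0.2])))  # False
# The forward endpoint has turned back toward the rear endpoint: stop.
print(uturn(theta_m, theta_p, np.array([0.9, 0.1]), np.array([-0.5, 0.3])))  # True
```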
Intuition
If the trajectory has started pointing back toward where it came from, taking more leapfrog steps mostly wastes compute. NUTS detects that automatically. But the same state space, the same curvature, and the same problematic necks are still there.
Why It Matters
This is the clean way to think about NUTS. It is a major usability and efficiency improvement over fixed-length HMC, but it is not a cure for bad hierarchical parameterizations.
Non-centered coordinates undo the funnel neck
Statement
Define new latent variables $\tilde{x}_i \sim \mathcal{N}(0, 1)$ and map them to the original coordinates by

$$x_i = \tilde{x}_i \, e^{v/2}.$$
In the non-centered coordinates $(v, \tilde{x}_1, \dots, \tilde{x}_D)$, the prior factorizes into a fixed-scale normal for $v$ and independent standard normals for the $\tilde{x}_i$. The severe neck geometry disappears from the base coordinates, making HMC and NUTS much easier to tune.
Intuition
The centered model stores scale and local variation in the same coordinates. When the scale collapses, the local variables must collapse with it. The non-centered parameterization separates those roles: the $\tilde{x}_i$ stay in a standard normal cloud, and the exponential scaling is applied only when you map back to the original model coordinates.
Proof Sketch
Substitute $x_i = \tilde{x}_i \, e^{v/2}$ into the centered prior. The Gaussian term for $x_i$ reduces to a standard normal term in $\tilde{x}_i$, and the Jacobian factor $e^{v/2}$ is exactly canceled by the $e^{-v/2}$ scale term in the Gaussian density. The resulting base prior is a product of a normal density in $v$ and standard normal densities in the $\tilde{x}_i$.
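The cancellation is easy to confirm numerically. This sketch (using SciPy's normal log-density; the variable names are ours) checks that the centered conditional log density of $x$ given $v$ equals the standard normal log density of $\tilde{x}$ minus the log-Jacobian $v/2$ of the map per coordinate.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
v = rng.normal(0.0, 3.0)
x_tilde = rng.standard_normal(9)
x = x_tilde * np.exp(v / 2.0)          # non-centered -> centered map

# Centered conditional log density of x given v (std dev exp(v/2)).
centered = norm.logpdf(x, loc=0.0, scale=np.exp(v / 2.0)).sum()

# Change of variables: p_X(x | v) = N(x_tilde; 0, 1) * exp(-v/2) per coordinate,
# because log|dx/dx_tilde| = v/2 cancels the scale term in the Gaussian density.
non_centered = (norm.logpdf(x_tilde) - v / 2.0).sum()

print(np.isclose(centered, non_centered))  # True: the two expressions agree exactly
```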
Why It Matters
This is the canonical example of a modeling fix beating a sampler fix. Reparameterization often improves posterior geometry more than any amount of sampler retuning.
Failure Mode
Non-centering is not universally better. When the data strongly identify the local latent variables, the centered parameterization can mix better. The choice depends on how much the posterior is dominated by the prior hierarchy versus the likelihood.
What Divergences Really Mean
A common mistake is to read divergences as mere implementation noise. They are usually telling you something structural:
- the posterior has regions of extremely high curvature,
- the leapfrog step size cannot resolve those regions everywhere,
- and the chain is failing exactly where the geometry is hardest.
So the right response is rarely "just run longer." It is usually:
- inspect where the divergences land,
- reparameterize,
- strengthen weak priors,
- and only then revisit sampler tuning.
Centered vs Non-Centered Hierarchies
The funnel appears whenever a global scale parameter governs many local variables. This is not an artificial pathology. Hierarchical normals, multilevel regressions, latent-variable models, and many Bayesian neural network priors all create the same tension.
Centered parameterization stores the local variables directly, with global location $\mu$ and global scale $\tau$:

$$\theta_i \mid \mu, \tau \sim \mathcal{N}(\mu, \tau).$$
Non-centered parameterization stores a standard normal base variable and then rescales it:

$$\tilde{\theta}_i \sim \mathcal{N}(0, 1), \qquad \theta_i = \mu + \tau \, \tilde{\theta}_i.$$
The posterior decides which version is better. Weak-data hierarchies usually prefer the non-centered form; strong-data hierarchies can prefer the centered form.
For the direct side-by-side decision rule, see Centered vs. Non-Centered Hierarchical Models.
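To see what a sampler is actually asked to explore in each case, here is a side-by-side sketch of the two log posteriors for a small hierarchical normal model. The priors, prior scales, and function names are illustrative choices of ours; the group estimates and standard errors are the familiar eight-schools numbers, used only as a convenient example.

```python
import numpy as np
from scipy.stats import norm

# Observed group effects and their known standard errors (eight-schools data).
y = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])

def logpost_centered(mu, log_tau, theta):
    """Centered: the sampler works directly in theta ~ N(mu, tau)."""
    tau = np.exp(log_tau)
    lp = norm.logpdf(mu, 0.0, 10.0) + norm.logpdf(log_tau, 0.0, 1.0)  # illustrative priors
    lp += norm.logpdf(theta, mu, tau).sum()        # hierarchical prior
    lp += norm.logpdf(y, theta, sigma).sum()       # likelihood
    return lp

def logpost_noncentered(mu, log_tau, theta_tilde):
    """Non-centered: the sampler works in theta_tilde ~ N(0, 1),
    and theta = mu + tau * theta_tilde is computed deterministically."""
    tau = np.exp(log_tau)
    theta = mu + tau * theta_tilde
    lp = norm.logpdf(mu, 0.0, 10.0) + norm.logpdf(log_tau, 0.0, 1.0)  # same priors
    lp += norm.logpdf(theta_tilde).sum()           # standard normal base variables
    lp += norm.logpdf(y, theta, sigma).sum()       # same likelihood
    return lp

# Same model, different coordinates: when tau is small, theta is pinned near mu
# in the centered version, while theta_tilde stays on a unit scale.
print(logpost_centered(5.0, np.log(0.1), np.full(8, 5.0)))
print(logpost_noncentered(5.0, np.log(0.1), np.zeros(8)))
```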
Common Confusions
NUTS is not just HMC with fewer knobs
It is true that NUTS removes the leapfrog-count knob. But the mass matrix, step size adaptation, and parameterization still matter a great deal. Treating NUTS as a universal turnkey fix hides the geometry problem.
A good R-hat does not erase divergences
Chains can agree with each other while still missing important regions of the target geometry. Divergences are a local geometric warning; $\hat{R}$ is a global mixing diagnostic. They answer different questions.
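To see why the two diagnostics answer different questions, here is a minimal split-$\hat{R}$ computation (basic form only; production implementations add refinements such as rank normalization). It compares between-chain and within-chain variance, so it can look fine even when every chain avoids the same hard region.

```python
import numpy as np

def split_rhat(chains):
    """Minimal split R-hat for one scalar parameter.
    `chains` has shape (n_chains, n_draws). Each chain is split in half, then
    R-hat compares between-half variance to within-half variance."""
    n_chains, n_draws = chains.shape
    half = n_draws // 2
    splits = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    n = splits.shape[1]
    chain_means = splits.mean(axis=1)
    within = splits.var(axis=1, ddof=1).mean()     # W: average within-chain variance
    between = n * chain_means.var(ddof=1)          # B: variance of chain means
    var_plus = (n - 1) / n * within + between / n
    return np.sqrt(var_plus / within)

rng = np.random.default_rng(0)
good = rng.standard_normal((4, 1000))                  # four well-mixed chains
bad = good + np.array([[0.0], [0.0], [0.0], [3.0]])    # one chain stuck elsewhere
print(split_rhat(good), split_rhat(bad))               # ~1.00 vs clearly > 1
```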
Neal's funnel is not a toy problem with no real analogue
It is a distilled version of a real hierarchical pathology. Whenever a global scale controls many local latent variables, the same neck-versus-mouth geometry can appear in practice.
Exercises
Problem
Why does a centered hierarchical normal model become hard for HMC when the global scale gets very small?
Problem
Why can NUTS still show many divergences even after warmup chooses a reasonable step size and mass matrix?
Problem
Suppose a hierarchical model shows persistent divergences under NUTS. How would you decide whether the right move is non-centering, stronger priors, or a more fundamental model rewrite?
References
- Matthew D. Hoffman and Andrew Gelman, The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo, JMLR 2014. The original NUTS paper.
- Michael Betancourt, A Conceptual Introduction to Hamiltonian Monte Carlo, arXiv, revised 2018. Best conceptual systems-and-geometry account of HMC.
- Michael Betancourt and Mark Girolami, Hamiltonian Monte Carlo for Hierarchical Models, arXiv 2013. Canonical source on hierarchical pathologies and reparameterization.
- Radford M. Neal, Slice Sampling, Annals of Statistics 2003. Contains the original funnel distribution test problem.
- Michael Betancourt, Diagnosing Biased Inference with Divergences, case study. Best concrete explanation of why divergences are geometric evidence.
- Stan Development Team, Sampling Difficulties with Problematic Priors, Stan User's Guide. Practical reference for funnel-style pathologies and reparameterization.
Next Topics
This page is the geometry-first sequel to HMC. The natural continuations are:
- Hamiltonian Monte Carlo for the sampler mechanics,
- Bayesian Neural Networks for a more applied hierarchical setting,
- and Bayesian State Estimation for sequential latent-variable models where geometry and inference meet.
Last reviewed: April 25, 2026
Prerequisites
Foundations this topic depends on.
- Hamiltonian Monte Carlo (Layer 3)
- Metropolis-Hastings Algorithm (Layer 2)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Bayesian Estimation (Layer 0B)
- Maximum Likelihood Estimation: Theory, Information Identity, and Asymptotic Efficiency (Layer 0B)
- Differentiation in Rⁿ (Layer 0A)
- Vectors, Matrices, and Linear Maps (Layer 0A)
- Continuity in Rⁿ (Layer 0A)
- Metric Spaces, Convergence, and Completeness (Layer 0A)
- Central Limit Theorem (Layer 0B)
- Law of Large Numbers (Layer 0B)
- Random Variables (Layer 0A)
- Kolmogorov Probability Axioms (Layer 0A)
- Expectation, Variance, Covariance, and Moments (Layer 0A)
- KL Divergence (Layer 1)
- Information Theory Foundations (Layer 0B)