
Applied ML

Predictive Coding and Autoencoders in the Brain

Hierarchical predictive coding (Rao-Ballard) and the free-energy principle as biological analogs of amortized variational inference and approximate backprop.

Advanced · Tier 3 · Stable · ~15 min

Why This Matters

Cortex is metabolically expensive and bandwidth-limited. Predictive coding proposes that, rather than transmitting raw activations upward, each cortical area sends only the residual its higher-area model failed to predict. Top-down projections carry predictions, bottom-up projections carry prediction errors, and learning reduces error. This single architectural commitment yields a sharp algorithmic story: perception is approximate inference, and learning is gradient descent on a free-energy bound.

For ML readers the link is direct. The Rao-Ballard hierarchy is structurally a stacked autoencoder with feedback connections. The free-energy principle is the variational ELBO with biological branding. Whittington and Bogacz showed that, under specific assumptions, predictive-coding updates approximate backprop arbitrarily well using only local Hebbian-like rules. Whether the cortex actually implements any of this is contested, but the math is shared with the generative models we already train.

Core Ideas

Rao and Ballard (1999, Nat. Neurosci. 2(1)). Each layer $l$ maintains a state $r_l$ and predicts the layer below via a generative weight matrix $W_l$: $\hat r_{l-1} = f(W_l\, r_l)$. The prediction error is $\varepsilon_{l-1} = r_{l-1} - \hat r_{l-1}$. State updates and weights minimize a sum of squared errors weighted by precision (inverse variance) at each level:

$$F = \sum_l \frac{1}{2\sigma_l^2}\, \lVert \varepsilon_l \rVert^2.$$
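Descending this objective gives both perception (state updates) and learning (weight updates). A minimal two-level linear sketch in NumPy; the dimensions, step sizes, and single-patch setup are illustrative choices, not Rao and Ballard's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-level linear toy: latent state r1 predicts the clamped input r0 via W1,
# and both inference and learning descend F = ||eps0||^2 / (2 sigma0^2).
d0, d1 = 8, 4
W1 = rng.normal(size=(d0, d1)) * 0.1
r0 = rng.normal(size=d0)      # clamped input (e.g. an image patch)
r1 = np.zeros(d1)             # latent state, inferred afresh per stimulus
sigma0_sq = 1.0               # variance; 1/sigma0_sq is the precision weight

# Perception: iterate state updates that follow -dF/dr1 = W1^T eps0 / sigma0^2.
lr_state = 0.1
for _ in range(200):
    eps0 = r0 - W1 @ r1                        # prediction error at level 0
    r1 += lr_state * (W1.T @ eps0) / sigma0_sq

# Learning: a Hebbian-looking update, postsynaptic error times presynaptic state.
lr_w = 0.01
eps0 = r0 - W1 @ r1
W1 += lr_w * np.outer(eps0, r1) / sigma0_sq
```

After inference the residual error is strictly smaller than the error of predicting nothing at all, which is exactly the bandwidth argument: only the shrunken residual needs to travel up.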

Their model trained on natural-image patches reproduced extra-classical receptive field effects (end-stopping, contextual modulation) without putting them in by hand.

Free-energy principle (Friston, 2010, Nat. Rev. Neurosci. 11(2)). Reframes predictive coding as variational inference on a generative model. The brain maintains a recognition density $q(z \mid x)$ and minimizes the variational free energy $F[q, x] = \mathbb{E}_q[-\log p(x, z)] - \mathcal{H}[q]$, which is an upper bound on surprise $-\log p(x)$ and equivalent to the negative ELBO. Action also minimizes free energy, giving a unified account of perception, learning, and behavior. The unification is conceptually elegant; many specific predictions are difficult to falsify.
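The bound is easy to verify numerically in a model small enough that the evidence is exact. A sketch with a conjugate Gaussian model; the model ($p(z)=\mathcal{N}(0,1)$, $p(x \mid z)=\mathcal{N}(z,1)$) and the Monte Carlo estimator are my illustrative choices:

```python
import numpy as np

# Toy model: p(z) = N(0, 1), p(x|z) = N(z, 1), so the evidence is p(x) = N(0, 2)
# exactly and the true posterior is p(z|x) = N(x/2, 1/2).
def log_evidence(x):
    return -0.5 * np.log(2 * np.pi * 2.0) - x**2 / 4.0

def free_energy(x, mu, s2, n=200_000, seed=0):
    """Monte Carlo estimate of F[q, x] = E_q[-log p(x, z)] - H[q], q = N(mu, s2)."""
    rng = np.random.default_rng(seed)
    z = mu + np.sqrt(s2) * rng.standard_normal(n)
    log_joint = (-0.5 * np.log(2 * np.pi) - z**2 / 2          # log p(z)
                 - 0.5 * np.log(2 * np.pi) - (x - z)**2 / 2)  # log p(x|z)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)
    return -log_joint.mean() - entropy

x = 1.3
surprise = -log_evidence(x)                # -log p(x)
F_bad = free_energy(x, mu=0.0, s2=1.0)     # mismatched recognition density
F_good = free_energy(x, mu=x / 2, s2=0.5)  # q equal to the true posterior
```

`F_bad` exceeds the surprise by exactly the KL divergence from $q$ to the true posterior; `F_good` matches the surprise up to Monte Carlo error, illustrating that the bound is tight when inference is exact.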

Connection to amortized variational inference. A VAE encoder $q_\phi(z \mid x)$ amortizes the cost of inferring $z$ across data points. The Rao-Ballard hierarchy plays the same role with a fixed iterative inference procedure (a few steps of state updates per stimulus) rather than a learned encoder. Both schemes optimize the same free-energy objective; they differ in how inference is implemented.
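A toy contrast on a tractable model, $p(z)=\mathcal{N}(0,1)$, $p(x \mid z)=\mathcal{N}(z,1)$, whose exact posterior mean is $x/2$; the hard-coded encoder weight and step counts are illustrative:

```python
# Both inference styles target the same free-energy objective; they differ only
# in how the variational mean for a given x is produced.
x = 1.3

# Amortized: a learned encoder maps x to the variational mean in one pass.
# For this model the optimal linear encoder weight happens to be 0.5.
enc_w = 0.5
mu_amortized = enc_w * x

# Iterative (Rao-Ballard style): gradient steps on log p(x, z) per stimulus,
# starting from scratch for every new input.
mu_iter, lr = 0.0, 0.1
for _ in range(100):
    grad = -mu_iter + (x - mu_iter)  # d/dz [log p(z) + log p(x|z)] at z = mu_iter
    mu_iter += lr * grad
```

Both arrive at the posterior mean, but the amortized pass is a single multiply per stimulus while the iterative scheme pays the inference cost at every presentation.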

Whittington and Bogacz (2017, Neural Comput. 29(5)). Construct a predictive-coding network in which top-down predictions and bottom-up errors evolve to a fixed point, then synaptic updates use only the locally available error and activity at each synapse. Under linear or near-linear regimes, the learned weight changes converge to backprop weight changes. This makes predictive coding the most concrete biologically plausible approximation to backprop currently on the table.
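The flavor of the result can be checked numerically in the linear case. This is a sketch of my own construction, not the paper's simulation: the output is clamped to the target, the hidden state is solved at the fixed point of inference, and the second-layer weights are kept small so that clamping barely perturbs the hidden state:

```python
import numpy as np

rng = np.random.default_rng(1)

d_in, d_h, d_out = 5, 4, 3
W1 = rng.normal(size=(d_h, d_in))
W2 = rng.normal(size=(d_out, d_h)) * 0.05          # small top-layer weights
x = rng.normal(size=d_in)
t = W2 @ (W1 @ x) + 1e-3 * rng.normal(size=d_out)  # target near current output

# Predictive coding: with the output clamped to t, the hidden state settles at
# the minimum of F = 0.5||r1 - W1 x||^2 + 0.5||t - W2 r1||^2, solvable here.
r1 = np.linalg.solve(np.eye(d_h) + W2.T @ W2, W1 @ x + W2.T @ t)
eps1 = r1 - W1 @ x        # hidden-layer prediction error (locally available)
eps2 = t - W2 @ r1        # output-layer prediction error (locally available)
dW1_pc = np.outer(eps1, x)   # error times presynaptic activity at each synapse
dW2_pc = np.outer(eps2, r1)

# Backprop on L = 0.5||t - W2 W1 x||^2 for the same input and target.
h = W1 @ x
delta = t - W2 @ h
dW2_bp = np.outer(delta, h)
dW1_bp = np.outer(W2.T @ delta, x)
```

The two sets of updates agree closely in this regime, and the gap closes further as the clamped output's perturbation of the hidden state shrinks; note that the predictive-coding updates use only quantities available at each synapse.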

Common Confusions

Watch Out

Free energy in the brain is the same quantity as physical free energy. It is not. Friston's free energy is the variational free energy from Bayesian statistics, which has the same algebraic form as the Helmholtz free energy from statistical mechanics but tracks belief states, not particle ensembles. Treat the name as a useful analogy, not a thermodynamic identity.

Watch Out

Predictive coding has been proven to be how cortex works. It has not. Some predictions match neurophysiology (precision-weighted error responses, expectation-modulated activity), but several core claims (separate error and prediction populations, hierarchical organization of generative models) lack clean experimental confirmation. The framework is a candidate, not a settled account.


Last reviewed: April 18, 2026
