Hebbian Learning
Local correlation-based plasticity rules from Hebb (1949) through STDP, Oja, and BCM. Modern reinterpretations link Hebbian dynamics to predictive coding and contrastive learning.
Why This Matters
Backpropagation requires a global error signal that propagates exact gradients backward through symmetric weights. Real cortex has neither: synapses update from locally available signals, and forward and backward pathways use distinct anatomical projections. Hebbian rules are the family of learning algorithms that respect those locality constraints. They are the default model of cortical plasticity, the algorithm STDP experiments measure, and the starting point for almost every "biologically plausible alternative to backprop" proposal.
Hebbian learning also matters because it keeps reappearing in machine learning under new names: PCA via Oja's rule, normalization via BCM, contrastive learning as approximate energy-based Hebbian descent. Understanding the original gives a sharper lens on the modern variants and on the recurring question of why cortex does not seem to need exact gradients.
Core Ideas
Hebb's postulate (1949). "When an axon of cell A is near enough to excite cell B and repeatedly takes part in firing it, some growth process or metabolic change takes place." The textbook simplification: $\Delta w_{ij} = \eta \, y_i x_j$. Pre and post activity correlate, weight grows. The rule is local in space (only the two endpoints) and time (only current activity).
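A minimal sketch of the plain Hebbian update for a single linear neuron (the learning rate, input dimension, and iteration count are illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.01                            # learning rate (illustrative)
w = rng.normal(scale=0.1, size=5)     # weights from 5 presynaptic inputs to one neuron

for _ in range(1000):
    x = rng.normal(size=5)            # presynaptic activity
    y = w @ x                         # linear postsynaptic activity
    w += eta * y * x                  # plain Hebb: weight grows with pre-post correlation

print(np.linalg.norm(w))              # norm keeps growing: the instability Oja's rule fixes
```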
Stability fixes: Oja and BCM. Pure Hebb is unstable: weights blow up because correlated activity always grows them. Oja (1982) added a normalizing decay term, $\Delta w = \eta \, y (x - y w)$, and proved that the rule converges to the principal eigenvector of the input covariance: it is online PCA. Bienenstock-Cooper-Munro (BCM, 1982) introduced a sliding modification threshold that scales with average post-synaptic activity, producing both LTP (long-term potentiation) and LTD (long-term depression) regimes and accounting for orientation selectivity in visual cortex.
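A sketch of Oja's rule on a correlated two-dimensional input stream, checking that the learned weight vector lines up with the top principal component (the covariance matrix and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D inputs: variance is largest along the (1, 1) direction.
cov = np.array([[3.0, 2.0], [2.0, 3.0]])
L = np.linalg.cholesky(cov)

eta = 0.01
w = rng.normal(size=2)

for _ in range(20000):
    x = L @ rng.normal(size=2)
    y = w @ x
    w += eta * y * (x - y * w)        # Hebbian term plus Oja's normalizing decay

eigvals, eigvecs = np.linalg.eigh(cov)
v = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
print(w / np.linalg.norm(w), v)       # same direction up to sign
```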
Spike-timing-dependent plasticity. Bi and Poo (1998, J. Neurosci. 18(24)) measured plasticity in cultured hippocampal neurons as a function of the precise relative timing $\Delta t = t_{\text{post}} - t_{\text{pre}}$. Pre-before-post within roughly 20 ms causes potentiation; post-before-pre causes depression. STDP refines Hebb to a causal rule: A wires to B only if A's spike actually contributed to B's spike. STDP is the dominant phenomenological model of cortical synaptic plasticity.
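A sketch of the double-exponential STDP window commonly fit to data of this kind; the amplitudes and time constants below are illustrative placeholders, not the measured Bi and Poo values:

```python
import numpy as np

def stdp_dw(delta_t_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change as a function of spike timing.

    delta_t_ms = t_post - t_pre. Positive (pre before post) -> potentiation,
    negative (post before pre) -> depression.
    """
    delta_t_ms = np.asarray(delta_t_ms, dtype=float)
    return np.where(
        delta_t_ms >= 0,
        a_plus * np.exp(-delta_t_ms / tau_plus),
        -a_minus * np.exp(delta_t_ms / tau_minus),
    )

print(stdp_dw([-40, -10, 10, 40]))   # depression for post-before-pre, potentiation otherwise
```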
Biological plausibility critiques of backprop. Backprop raises the weight-transport problem: the backward pass uses $W^\top$, the transpose of the forward weights. Lillicrap, Cownden, Tweed, and Akerman (2016, Nat. Commun. 7) showed that fixed random feedback weights work nearly as well, eliminating the symmetry requirement. This feedback alignment result, together with target propagation and predictive coding, suggests that approximate gradient signals carried by local Hebbian-like updates can train deep networks. Whether cortex actually implements anything in this family is unsettled.
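A sketch of feedback alignment on a toy regression task: the error is routed back through a fixed random matrix B instead of $W_2^\top$. The architecture, task, and hyperparameters are illustrative, not those of the original paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 10, 32, 1
W1 = rng.normal(scale=0.1, size=(n_hid, n_in))
W2 = rng.normal(scale=0.1, size=(n_out, n_hid))
B = rng.normal(scale=0.1, size=(n_hid, n_out))   # fixed random feedback, never W2.T

X = rng.normal(size=(200, n_in))
Y = np.tanh(X @ rng.normal(size=(n_in, n_out)))  # toy target function

eta = 0.05
for _ in range(2000):
    h = np.tanh(X @ W1.T)            # forward pass
    y_hat = h @ W2.T
    e = y_hat - Y                    # output error
    dW2 = e.T @ h / len(X)           # exact delta rule for the output layer
    dh = (e @ B.T) * (1 - h**2)      # error routed through B, not W2.T
    dW1 = dh.T @ X / len(X)
    W1 -= eta * dW1
    W2 -= eta * dW2

print(float(np.mean((np.tanh(X @ W1.T) @ W2.T - Y) ** 2)))  # loss shrinks despite random feedback
```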
Common Confusions
"Neurons that fire together wire together" is the rule. That phrase (Lowel and Singer, popularized by Carla Shatz) is a slogan, not Hebb's actual statement. Hebb required A to participate in firing B, which is a directional and causal claim. STDP makes this precise; symmetric correlation rules do not.
Hebbian learning is unsupervised, so it can replace SGD. Pure Hebbian rules find principal components and do unsupervised feature learning. They do not minimize task loss. Modern "biologically plausible" schemes succeed by smuggling in a global signal (a target, a contrastive phase, a predictive error), which is no longer purely Hebbian.
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Perceptron (Layer 1)
- Feedforward Networks and Backpropagation (Layer 2)
- Differentiation in Rn (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Matrix Calculus (Layer 1)
- The Jacobian Matrix (Layer 0A)
- The Hessian Matrix (Layer 0A)
- Matrix Operations and Properties (Layer 0A)
- Eigenvalues and Eigenvectors (Layer 0A)
- Activation Functions (Layer 1)
- Convex Optimization Basics (Layer 1)