Hebbian Learning
Local correlation-based plasticity rules from Hebb (1949) through STDP, Oja, and BCM. Modern reinterpretations link Hebbian dynamics to predictive coding and contrastive learning.
Why This Matters
Backpropagation requires a global error signal that propagates exact gradients backward through symmetric weights. Real cortex has neither: synapses update from locally available signals, and forward and backward pathways use distinct anatomical projections. Hebbian rules are the family of learning algorithms that respect those locality constraints. They are the default model of cortical plasticity, the algorithm STDP experiments measure, and the starting point for almost every "biologically plausible alternative to backprop" proposal.
Hebbian learning also matters because it keeps reappearing in machine learning under new names: PCA via Oja's rule, normalization via BCM, contrastive learning as approximate energy-based Hebbian descent. Understanding the original gives a sharper lens on the modern variants and on the recurring question of why cortex does not seem to need exact gradients.
Core Ideas
Hebb's postulate (1949). "When an axon of cell A is near enough to excite cell B and repeatedly takes part in firing it, some growth process or metabolic change takes place." The textbook simplification: $\Delta w_{ij} = \eta \, y_i x_j$. Pre and post activity correlate, weight grows. The rule is local in space (only the two endpoints) and time (only current activity).
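A minimal sketch of the plain Hebbian update for a single linear neuron (the learning rate, input dimension, and iteration count are illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.01                            # learning rate (illustrative)
w = rng.normal(scale=0.1, size=5)     # weights from 5 presynaptic inputs to one neuron

for _ in range(1000):
    x = rng.normal(size=5)            # presynaptic activity
    y = w @ x                         # linear postsynaptic activity
    w += eta * y * x                  # plain Hebb: weight grows with pre-post correlation

print(np.linalg.norm(w))              # norm keeps growing: the instability Oja's rule fixes
```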
Stability fixes: Oja and BCM. Pure Hebb is unstable: weights blow up because correlated activity always grows them. Oja (1982) added a normalizing decay term, $\Delta w = \eta \, y (x - y w)$, and proved that the rule converges to the principal eigenvector of the input covariance: it is online PCA. Bienenstock-Cooper-Munro (BCM, 1982) introduced a sliding modification threshold that scales with average post-synaptic activity, producing both LTP (long-term potentiation) and LTD (long-term depression) regimes and accounting for orientation selectivity in visual cortex.
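A sketch of Oja's rule on a correlated two-dimensional input stream, checking that the learned weight vector lines up with the top principal component (the covariance matrix and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D inputs: variance is largest along the (1, 1) direction.
cov = np.array([[3.0, 2.0], [2.0, 3.0]])
L = np.linalg.cholesky(cov)

eta = 0.01
w = rng.normal(size=2)

for _ in range(20000):
    x = L @ rng.normal(size=2)
    y = w @ x
    w += eta * y * (x - y * w)        # Hebbian term plus Oja's normalizing decay

eigvals, eigvecs = np.linalg.eigh(cov)
v = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
print(w / np.linalg.norm(w), v)       # same direction up to sign
```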
Spike-timing-dependent plasticity. Bi and Poo (1998, J. Neurosci. 18(24)) measured plasticity in cultured hippocampal neurons as a function of the precise relative timing $\Delta t = t_{\text{post}} - t_{\text{pre}}$. Pre-before-post within roughly 20 ms causes potentiation; post-before-pre causes depression. STDP refines Hebb to a causal rule: A wires to B only if A's spike actually contributed to B's spike. STDP is the dominant phenomenological model of cortical synaptic plasticity.
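A sketch of the double-exponential STDP window commonly fit to data of this kind; the amplitudes and time constants below are illustrative placeholders, not the measured Bi and Poo values:

```python
import numpy as np

def stdp_dw(delta_t_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change as a function of spike timing.

    delta_t_ms = t_post - t_pre. Positive (pre before post) -> potentiation,
    negative (post before pre) -> depression.
    """
    delta_t_ms = np.asarray(delta_t_ms, dtype=float)
    return np.where(
        delta_t_ms >= 0,
        a_plus * np.exp(-delta_t_ms / tau_plus),
        -a_minus * np.exp(delta_t_ms / tau_minus),
    )

print(stdp_dw([-40, -10, 10, 40]))   # depression for post-before-pre, potentiation otherwise
```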
Biological plausibility critiques of backprop. Backprop raises the weight-transport problem: the backward pass uses $W^\top$, the transpose of the forward weights. Lillicrap, Cownden, Tweed, and Akerman (2016, Nat. Commun. 7) showed that fixed random feedback weights work nearly as well, eliminating the symmetry requirement. This feedback alignment result, together with target propagation and predictive coding, suggests that approximate gradient signals carried by local Hebbian-like updates can train deep networks. Whether cortex actually implements anything in this family is unsettled.
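A sketch of feedback alignment on a toy regression task: the error is routed back through a fixed random matrix B instead of $W_2^\top$. The architecture, task, and hyperparameters are illustrative, not those of the original paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 10, 32, 1
W1 = rng.normal(scale=0.1, size=(n_hid, n_in))
W2 = rng.normal(scale=0.1, size=(n_out, n_hid))
B = rng.normal(scale=0.1, size=(n_hid, n_out))   # fixed random feedback, never W2.T

X = rng.normal(size=(200, n_in))
Y = np.tanh(X @ rng.normal(size=(n_in, n_out)))  # toy target function

eta = 0.05
for _ in range(2000):
    h = np.tanh(X @ W1.T)            # forward pass
    y_hat = h @ W2.T
    e = y_hat - Y                    # output error
    dW2 = e.T @ h / len(X)           # exact delta rule for the output layer
    dh = (e @ B.T) * (1 - h**2)      # error routed through B, not W2.T
    dW1 = dh.T @ X / len(X)
    W1 -= eta * dW1
    W2 -= eta * dW2

print(float(np.mean((np.tanh(X @ W1.T) @ W2.T - Y) ** 2)))  # loss shrinks despite random feedback
```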
Common Confusions
"Neurons that fire together wire together" is the rule. That phrase (Lowel and Singer, popularized by Carla Shatz) is a slogan, not Hebb's actual statement. Hebb required A to participate in firing B, which is a directional and causal claim. STDP makes this precise; symmetric correlation rules do not.
Hebbian learning is unsupervised, so it can replace SGD. Pure Hebbian rules find principal components and do unsupervised feature learning. They do not minimize task loss. Modern "biologically plausible" schemes succeed by smuggling in a global signal (a target, a contrastive phase, a predictive error), which is no longer purely Hebbian.
Last reviewed: April 18, 2026
Prerequisites
Foundations this topic depends on.
- Perceptron (Layer 1)
- Feedforward Networks and Backpropagation (Layer 2)
- Differentiation in Rn (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Matrix Calculus (Layer 1)
- The Jacobian Matrix (Layer 0A)
- The Hessian Matrix (Layer 0A)
- Matrix Operations and Properties (Layer 0A)
- Eigenvalues and Eigenvectors (Layer 0A)
- Activation Functions (Layer 1)
- Convex Optimization Basics (Layer 1)