
LLM Construction

Induction Heads

Induction heads are attention head circuits that implement pattern completion: given a sequence like [A][B]...[A], they predict [B]. They are a leading candidate mechanism for in-context learning, with strong causal evidence in small attention-only models and correlational evidence in large transformers. They emerge through a phase transition during training.


Why This Matters

[Interactive diagram] Sequence: the tokens A, B, C, D appear at positions 1-4, then A appears again at position 5. What token should follow at position 6?
Layer 0 (previous-token head): each position i attends to position i-1, copying the predecessor's identity into position i's residual stream.
Layer 1 (induction head): at the repeated A (position 5), the query matches any position whose Layer-0 prev-token record equals A. That is position 2 (the first B, where Layer 0 wrote "prev was A").
The induction head then copies the token at the matched position (B) into the current residual stream, and the model predicts B as the next token.

When a language model sees "The capital of France is Paris ... The capital of Germany is" and predicts "Berlin", it is performing in-context learning: using patterns in the prompt to make predictions without updating its weights. Induction heads are the best-characterized circuit implicated in this ability.

The diagram above shows the two-layer mechanism on a toy sequence. Step through the toggles: Layer 0 records each token's predecessor into its own residual stream. Layer 1 then, at the repeated token, queries for positions whose Layer-0 record matches the current token. The matched position's value is copied as the prediction. This is the entire circuit.

Understanding induction heads matters for three reasons. First, they are one of the clearest mechanistic interpretability successes to date: a real, interpretable circuit found inside trained transformers. Second, Olsson et al. (2022) argue they plausibly account for a substantial fraction of in-context learning, with strong causal (ablation) evidence in small attention-only models and correlational co-occurrence evidence in larger models. Third, they emerge through a sudden phase transition during training, connecting to grokking and training dynamics.

The Mechanism

Proposition

Induction Head Circuit

Statement

An induction head is a two-head circuit spanning two attention layers that implements the following pattern completion:

Given a sequence containing ... [A] [B] ... [A], the circuit predicts [B] as the next token after the second [A].

The circuit works through composition of two attention heads:

  1. Previous-token head (Layer L): An attention head in an earlier layer whose attention pattern shifts information one position back. For each position i, it writes the identity of token i-1 into the residual stream at position i. After this head, position i carries information about "what came before me."

  2. Induction head (Layer L' > L): An attention head in a later layer that matches the current token with earlier positions that had the same predecessor. At the second [A], this head attends to the position after the first [A] (which is [B]), because the previous-token head has placed [A]'s identity at [B]'s position.

The composition is: the induction head's query-key matching uses the output of the previous-token head, creating a Q-composition or K-composition circuit.

Intuition

Think of it as a two-step lookup. Step 1: at every position, write a note saying "the token before me was X." Step 2: when you encounter token [A], search for other places where the note says "the token before me was [A]." Attend to those places, and copy what you find there. What you find is [B], because [B] follows the first [A] and its note says "[A] was before me."

This is a bigram lookup table computed dynamically from the context, without being stored in the weights.
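The two-step lookup can be sketched in a few lines of Python. This is an illustrative simulation of the algorithm, not a model's actual computation; the function name and token representation are invented for the example.

```python
def induction_predict(tokens):
    """Sketch of the induction circuit as a two-step lookup.

    Step 1 (previous-token head): at each position, note which token
    came immediately before it. Step 2 (induction head): at the final
    token, find an earlier position whose note matches it and copy
    the token found there.
    """
    # Step 1: "the token before me was X", for every position > 0
    notes = {i: tokens[i - 1] for i in range(1, len(tokens))}

    current = tokens[-1]
    # Step 2: scan earlier positions whose predecessor equals the
    # current token (most recent match first) and copy what sits there
    for i in range(len(tokens) - 2, 0, -1):
        if notes[i] == current:
            return tokens[i]
    return None  # no earlier occurrence: the circuit cannot help

print(induction_predict(["A", "B", "C", "D", "A"]))  # -> B
```

Note that the circuit returns nothing useful when the current token has no earlier occurrence, which is exactly why induction heads only help later in a sequence.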

Why It Matters

Induction heads are the strongest mechanistic account we have for in-context learning of exact repetition, and they require composition across layers (a single-layer transformer cannot implement the circuit). This is one of the clearest arguments that transformer depth is used for compositional computation, not just hierarchical features. The circuit also helps explain why per-token loss on later tokens in a sequence is lower than on early tokens: the induction head can only help when there are previous patterns to match. How much of total in-context learning they explain in large models is an open empirical question.

Failure Mode

Induction heads implement a specific, limited form of pattern matching: exact token repetition. They do not explain more sophisticated in-context learning (e.g., learning a novel function from labeled examples, reasoning by analogy, or arithmetic on unseen inputs). Even for the patterns they do handle, the clean causal evidence is strongest in small attention-only models from Olsson et al.; in large production transformers, evidence is primarily correlational (induction-head scores co-evolve with in-context learning loss), not fully causal.

The Phase Transition

Proposition

Induction Head Phase Transition

Statement

During training, induction heads emerge abruptly at a specific training step, co-occurring with:

  1. A sudden drop in in-context learning loss (the per-token loss on tokens later in the sequence decreases sharply)
  2. A sudden increase in the "induction head score" (the model's ability to complete [A][B]...[A] -> [B] patterns)
  3. A brief training loss spike (the model temporarily gets worse before getting better)

This transition happens at the same training step across all sequence positions and attention heads involved, consistent with a phase transition rather than gradual improvement.
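The "in-context learning loss" in item 1 is often operationalized as the gap between per-token loss late and early in the sequence. A minimal sketch of that metric, with illustrative (not canonical) position choices and a synthetic loss curve:

```python
def icl_score(per_token_loss, early=10, late=100):
    """Loss at a late position minus loss at an early position.
    More negative = the model benefits more from extra context.
    The positions 10 and 100 are illustrative, not the paper's values.
    """
    return per_token_loss[late] - per_token_loss[early]

# Toy per-token loss that drops as context accumulates
# (step-wise halving, so the arithmetic is exact)
losses = [4.0 * 0.5 ** (pos // 50) for pos in range(200)]
print(icl_score(losses))  # -> -3.0
```

At the phase transition, this score drops sharply: induction heads only pay off once there is enough prior context to match against.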

Intuition

Before the transition, the model uses simple bigram statistics (token BB follows token AA based on corpus frequency). At the transition, the model discovers that it can do much better by copying from context. This new strategy requires two heads to coordinate (composition), which is why it appears suddenly: partial composition does not help. Either the circuit works or it does not.

The training loss spike likely occurs because the model is reorganizing its internal representations to support the new circuit, temporarily disrupting existing computations.

Why It Matters

This is one of the best-documented examples of a capability emerging as a phase transition during training. It provides concrete evidence for the hypothesis that capabilities emerge discretely rather than continuously, which has implications for AI safety: sudden capability gains may be hard to predict or control.

Failure Mode

The phase transition is most clearly visible in small models (1-4 layers) on controlled data. In large models trained on diverse data, the transition may be smoother or may occur at different times for different types of pattern completion. The clean phase-transition signature is partly a consequence of the controlled experimental setting.

Composition: How Attention Heads Talk to Each Other

Induction heads require attention head composition: the output of one head feeds into the computation of another. There are three types:

Q-composition: Head B uses the output of Head A to form its queries. "What I'm looking for depends on what Head A found."

K-composition: Head B uses the output of Head A to form its keys. "What I advertise to other positions depends on what Head A wrote."

V-composition: Head B uses the output of Head A to form its values. "What I pass forward depends on what Head A contributed."

Induction heads primarily use K-composition: the previous-token head writes "my predecessor was [A]" into the residual stream, and the induction head reads this record when forming its keys, so positions whose predecessor was [A] produce keys that strongly match the query from the current [A].

This compositional structure is what makes transformer circuits more expressive than single-layer attention. The residual stream acts as a shared memory bus, and heads in different layers can build on each other's outputs.
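A toy numerical sketch of K-composition, using one-hot embeddings and identity query/key maps so the dot products are exact (real heads use learned low-rank projections; every name here is invented for illustration):

```python
vocab = ["A", "B", "C", "D"]
# One-hot toy embeddings: dot products are 1 for a match, 0 otherwise
emb = {t: [1.0 if j == i else 0.0 for j in range(len(vocab))]
       for i, t in enumerate(vocab)}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

tokens = ["A", "B", "C", "D", "A"]

# Previous-token head: the residual stream at position i now carries
# the embedding of token i-1 (a zero vector at position 0)
zero = [0.0] * len(vocab)
prev_written = [zero] + [emb[t] for t in tokens[:-1]]

# K-composition: the induction head forms its keys from the prev-token
# record and its query from the current token, so the query at the
# second A matches the key wherever the predecessor was A -- i.e., at
# the position of the first B
query = emb[tokens[-1]]
scores = [dot(k, query) for k in prev_written]
attended = scores.index(max(scores))
print(tokens[attended])  # -> B
```

With softmax and learned projections the picture is soft rather than 0/1, but the matching logic is the same: the key subspace carries the predecessor's identity, not the position's own token.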

How to Detect Induction Heads

Given a trained transformer, you can identify induction heads by:

  1. Prefix matching score: Feed sequences of the form [random tokens] [A] [B] [random tokens] [A]. Measure how much probability the model assigns to [B] after the second [A]. High score = induction head behavior.

  2. Attention pattern inspection: An induction head's attention pattern on repeated sequences shows a diagonal stripe offset by one position: position i attends to position j + 1, where token j = token i and j < i, so that attention lands on the token after the match.

  3. Ablation: Zero out specific attention heads and measure the change in in-context learning loss. If ablating a head in a later layer destroys in-context learning, and ablating a head in an earlier layer has the same effect, those heads likely form an induction circuit.
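Check 1 can be sketched as a score over an attention pattern. The function and the idealized pattern below are invented for illustration; in practice `attn` would be read out of a real model's attention head on repeated random sequences.

```python
import random

def prefix_matching_score(attn, tokens):
    """Average attention mass each token places on positions that
    immediately follow an earlier occurrence of the same token.
    attn[i][j] is the attention weight from query position i to key j.
    """
    per_pos = []
    for i, t in enumerate(tokens):
        targets = [j + 1 for j in range(i - 1) if tokens[j] == t]
        if targets:
            per_pos.append(sum(attn[i][j] for j in targets))
    return sum(per_pos) / len(per_pos) if per_pos else 0.0

# Synthetic check: a repeated random sequence plus an idealized
# induction pattern (each repeat attends fully to its match + 1)
random.seed(0)
half = [random.choice("abcdefgh") for _ in range(10)]
tokens = half + half
n = len(tokens)
attn = [[0.0] * n for _ in range(n)]
for i in range(n):
    matches = [j + 1 for j in range(i - 1) if tokens[j] == tokens[i]]
    if matches:
        attn[i][matches[-1]] = 1.0  # attend to most recent match + 1
    else:
        attn[i][i] = 1.0            # no earlier match: attend to self

print(prefix_matching_score(attn, tokens))  # -> 1.0
```

A head scoring near 1.0 on such sequences behaves like an induction head; a head scoring near 0 does not, even if it looks interesting for other reasons.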

Common Confusions

Watch Out

Induction heads are not just copying

Induction heads copy tokens from earlier in the sequence, but the circuit is more structured than simple copying. The previous-token head performs a relational operation (shift-by-one), and the induction head performs a content-based lookup. The composition of these two operations creates a pattern-completion algorithm. Calling it "copying" misses the algorithmic structure.

Watch Out

A single attention head cannot be an induction head

The induction mechanism requires two heads across two layers. A single attention head in a one-layer transformer can learn to attend to previous occurrences of the current token, but it cannot implement the full [A][B]...[A] -> [B] pattern because it has no way to shift attention by one position. The composition across layers is the key insight.

Watch Out

Induction heads do not explain all of in-context learning

Induction heads explain exact pattern repetition: if the model has seen [A][B] before in the context, it can predict [B] after [A]. This does not explain: learning a new classification rule from labeled examples, performing arithmetic on novel inputs, or analogical reasoning. These require more complex circuits that may generalize the induction head principle but are not yet fully understood.

Watch Out

The causal evidence is strong in small models, correlational in large ones

Popular summaries often say induction heads "are" the mechanism of in-context learning. Olsson et al. (2022) are more careful: they provide causal evidence (targeted ablations, direct circuit analysis) in small attention-only models, and correlational evidence in larger, more realistic transformers (induction-head score co-evolves with the in-context learning loss curve, prefix-match scores track ICL across scales). They explicitly frame induction heads as accounting for "a substantial fraction" of in-context learning, not all of it. Treat strong claims like "induction heads are the mechanism of ICL" as an extrapolation beyond what is currently established in full-scale models.

Exercises

Exercise (Core)

Problem

A transformer processes the sequence "The cat sat on the mat. The cat sat on the ___." Explain which tokens the induction head attends to when predicting the blank, and why.

Exercise (Advanced)

Problem

Explain why a one-layer transformer cannot implement an induction head, but a two-layer transformer can. What is the minimum number of attention heads needed?

References

Canonical:

  • Olsson et al., "In-context Learning and Induction Heads" (Anthropic, 2022). Sections 2-4 establish the circuit; Sections 5-6 carry the small-vs-large causal/correlational distinction emphasized on this page.
  • Elhage et al., "A Mathematical Framework for Transformer Circuits" (Anthropic, 2021). Sections on Q/K/V-composition formalize the two-layer structure induction heads exploit.

Supporting causal evidence and extensions:

  • Nanda et al., "Progress Measures for Grokking via Mechanistic Interpretability" (ICLR 2023). Circuit-analysis methodology overlaps; useful for contrasting sudden-emergence claims.
  • Conmy et al., "Towards Automated Circuit Discovery for Mechanistic Interpretability" (NeurIPS 2023). Generalizes the ablation/edge-attribution tools used to isolate induction heads.
  • Singh et al., "The Transient Nature of Emergent In-Context Learning in Transformers" (NeurIPS 2023). Shows ICL can appear and then disappear during training, complicating the "induction heads = ICL" story.

Where the claim is softer than popular summaries suggest:

  • Akyürek et al., "What learning algorithm is in-context learning?" (ICLR 2023). Argues in-context learning on regression tasks implements gradient descent, a different mechanism from induction-style pattern copying.
  • Hendel, Geva, Globerson, "In-Context Learning Creates Task Vectors" (EMNLP 2023). Evidence that ICL in large models routes through compressed task-vector representations, not just induction-style lookups.

Next Topics

  • Mechanistic interpretability: the broader research program of understanding transformer internals
  • Residual stream and transformer internals: how information flows through the transformer

Last reviewed: April 2026
