Induction Heads Lab
A 2-layer attention-only transformer, trained from scratch in your browser on a synthetic copying task. Watch the loss curve dive at the same step the prefix-match score climbs: that joint moment is the induction circuit forming. Right-click any head chip to ablate it and see the score collapse.
Pure TypeScript. Hand-written backward pass, gradient-checked against finite differences (atol 5e-3). No tensor library, no GPU.
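A minimal sketch of that kind of check, assuming a flat parameter vector and a scalar loss closure (`lossFn`, `params`, and `analyticGrad` are illustrative names, not the lab's API):

```ts
// Compare a hand-derived gradient against a central finite difference.
// `lossFn` recomputes the scalar loss from the flat parameter vector;
// `analyticGrad` holds the analytic gradient evaluated at `params`.
function gradCheck(
  lossFn: (params: Float64Array) => number,
  params: Float64Array,
  analyticGrad: Float64Array,
  eps = 1e-4,
  atol = 5e-3,
): boolean {
  for (let i = 0; i < params.length; i++) {
    const orig = params[i];
    params[i] = orig + eps;
    const up = lossFn(params);
    params[i] = orig - eps;
    const down = lossFn(params);
    params[i] = orig;                        // restore the parameter
    const numeric = (up - down) / (2 * eps); // central difference
    if (Math.abs(numeric - analyticGrad[i]) > atol) return false;
  }
  return true;
}
```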
Edit any non-BOS cell to change the input. The lab predicts the token that follows the final position. When the final token has already appeared earlier in the sequence, an induction head should attend to the token that came right after that earlier occurrence.
row = query (the position the model is predicting from), column = key (the position being attended to). Yellow column = position of the token the model should predict next. Click a head chip to switch view. Right-click a chip to ablate that head.
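For concreteness, a sketch of how that target position can be computed, assuming a plain array of token ids (the function name is illustrative):

```ts
// Return the key position an induction head should attend to when
// predicting the token after the final position: the position right
// after the most recent earlier occurrence of the final token.
// Returns null if the final token has not appeared before.
function inductionTarget(tokens: number[]): number | null {
  const last = tokens.length - 1;
  for (let i = last - 1; i >= 0; i--) {
    if (tokens[i] === tokens[last]) return i + 1;
  }
  return null;
}
```

With the example input shown under the token legend (BOS G T C D L H C), this returns position 4: the D that followed the earlier C is where the head should attend, and D is the expected prediction.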
token legend
Letters A–Z map to ids 0–25. BOS = 26. Vocab size 28.
current input: 0:BOS 1:G 2:T 3:C 4:D 5:L 6:H 7:C
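As a sketch, that mapping amounts to the following (the `encode`/`decode` helpers are illustrative, not part of the lab's API):

```ts
const BOS = 26; // ids 0–25 are the letters A–Z; vocab size is 28

// Map a string of capital letters to token ids, prefixed with BOS.
function encode(letters: string): number[] {
  return [BOS, ...[...letters].map(c => c.charCodeAt(0) - 65)];
}

// Map token ids back to display strings.
function decode(ids: number[]): string[] {
  return ids.map(id => (id === BOS ? 'BOS' : String.fromCharCode(65 + id)));
}

// encode("GTCDLHC") -> [26, 6, 19, 2, 3, 11, 7, 2]
```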
implementation notes
The toy model is a 2-layer attention-only transformer with 2 heads per layer, d_model=32, d_head=16, vocab=28, context=32. The position embedding is sinusoidal and frozen; the unembedding is tied to the input embedding. Total parameters ≈ 5,500. The forward and backward passes are hand-derived and verified against finite-difference gradient checks (atol 5e-3): no autograd library, no tensor framework. Training runs AdamW at lr=1e-2 with batch size 16 inside a Web Worker so the UI stays responsive. The synthetic task plants `[A][B] ... [A]` patterns and asks the model to predict `[B]`, which is exactly the structure an induction head solves.
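As an illustration of that task structure (not the lab's exact sampler; the names, sequence length, and the choice to keep filler tokens distinct from the trigger are assumptions):

```ts
const BOS = 26;
const NUM_LETTERS = 26;

// Sample one training sequence of the form BOS [A][B] ... [A],
// where the correct continuation after the trailing [A] is [B].
function sampleSequence(len = 16): { tokens: number[]; target: number } {
  const rand = () => Math.floor(Math.random() * NUM_LETTERS);
  const a = rand();
  let b = rand();
  while (b === a) b = rand();             // make the [A][B] pair unambiguous
  const tokens = [BOS, a, b];
  while (tokens.length < len - 1) {
    let filler = rand();
    while (filler === a) filler = rand(); // keep the trigger token unique
    tokens.push(filler);
  }
  tokens.push(a);                         // repeat the trigger at the end
  return { tokens, target: b };           // the model should predict b
}
```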
The prefix-match score for a head is the attention weight it places on the position immediately after the most recent earlier occurrence of the current token, averaged over query positions that have such an occurrence (Olsson et al. 2022, §3.4). A randomly initialized head sits near the uniform-attention baseline (≈ 0.07 here); once an induction head emerges, its score climbs above 0.4 within the first ~600 training steps.
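A sketch of that score under the same definition, assuming `attn[query][key]` indexing for one head's causal attention matrix (names are illustrative):

```ts
// Prefix-match score for one head: at each query position whose token
// occurred earlier, read the attention weight on the position right after
// the most recent earlier occurrence, then average over those queries.
function prefixMatchScore(tokens: number[], attn: number[][]): number {
  let sum = 0;
  let count = 0;
  for (let q = 1; q < tokens.length; q++) {
    let prev = -1;
    for (let i = q - 1; i >= 0; i--) {
      if (tokens[i] === tokens[q]) { prev = i; break; }
    }
    if (prev === -1) continue;  // current token has no earlier occurrence
    sum += attn[q][prev + 1];   // weight on the prefix-match position
    count++;
  }
  return count > 0 ? sum / count : 0;
}
```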