Induction Heads Lab
A 2-layer attention-only transformer, trained from scratch in your browser on a synthetic copying task. Watch the loss curve dive at the same step the prefix-match score climbs: that joint moment is the induction circuit forming. Right-click any head chip to ablate it and see the score collapse.
Pure TypeScript. Hand-written backward pass, gradient-checked against finite differences (atol 5e-3). No tensor library, no GPU.
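A minimal sketch of that kind of check, assuming a flat parameter vector and a scalar loss closure (`lossFn`, `params`, and `analyticGrad` are illustrative names, not the lab's API):

```ts
// Compare a hand-derived gradient against a central finite difference.
// `lossFn` recomputes the scalar loss from the flat parameter vector;
// `analyticGrad` holds the analytic gradient evaluated at `params`.
function gradCheck(
  lossFn: (params: Float64Array) => number,
  params: Float64Array,
  analyticGrad: Float64Array,
  eps = 1e-4,
  atol = 5e-3,
): boolean {
  for (let i = 0; i < params.length; i++) {
    const orig = params[i];
    params[i] = orig + eps;
    const up = lossFn(params);
    params[i] = orig - eps;
    const down = lossFn(params);
    params[i] = orig;                        // restore the parameter
    const numeric = (up - down) / (2 * eps); // central difference
    if (Math.abs(numeric - analyticGrad[i]) > atol) return false;
  }
  return true;
}
```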
Edit any non-BOS cell to change the input. The lab predicts the token that follows the final position. When the final token has already appeared earlier in the sequence, an induction head should attend to the token that came right after that earlier occurrence.
row = query (the position the model is predicting from), column = key (the position being attended to). Yellow column = position of the token the model should predict next. Click a head chip to switch view. Right-click a chip to ablate that head.
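For concreteness, a sketch of how that target position can be computed, assuming a plain array of token ids (the function name is illustrative):

```ts
// Return the key position an induction head should attend to when
// predicting the token after the final position: the position right
// after the most recent earlier occurrence of the final token.
// Returns null if the final token has not appeared before.
function inductionTarget(tokens: number[]): number | null {
  const last = tokens.length - 1;
  for (let i = last - 1; i >= 0; i--) {
    if (tokens[i] === tokens[last]) return i + 1;
  }
  return null;
}
```

With the example input shown under the token legend (BOS G T C D L H C), this returns position 4: the D that followed the earlier C is where the head should attend, and D is the expected prediction.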
token legend
Letters A–Z map to ids 0–25. BOS = 26. Vocab size 28.
current input: 0:BOS 1:G 2:T 3:C 4:D 5:L 6:H 7:C
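As a sketch, that mapping amounts to the following (the `encode`/`decode` helpers are illustrative, not part of the lab's API):

```ts
const BOS = 26; // ids 0–25 are the letters A–Z; vocab size is 28

// Map a string of capital letters to token ids, prefixed with BOS.
function encode(letters: string): number[] {
  return [BOS, ...[...letters].map(c => c.charCodeAt(0) - 65)];
}

// Map token ids back to display strings.
function decode(ids: number[]): string[] {
  return ids.map(id => (id === BOS ? 'BOS' : String.fromCharCode(65 + id)));
}

// encode("GTCDLHC") -> [26, 6, 19, 2, 3, 11, 7, 2]
```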
implementation notes
The toy model is a 2-layer attention-only transformer with 2 heads per layer, d_model=32, d_head=16, vocab=28, context=32. The position embedding is sinusoidal and frozen; the unembedding is tied to the input embedding. Total parameters ≈ 5,500. The forward and backward passes are hand-derived and verified against finite-difference gradient checks (atol 5e-3): no autograd library, no tensor framework. Training runs AdamW at lr=1e-2 with batch size 16 inside a Web Worker so the UI stays responsive. The synthetic task plants `[A][B] ... [A]` patterns and asks the model to predict `[B]`, which is exactly the structure an induction head solves.
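As an illustration of that task structure (not the lab's exact sampler; the names, sequence length, and the choice to keep filler tokens distinct from the trigger are assumptions):

```ts
const BOS = 26;
const NUM_LETTERS = 26;

// Sample one training sequence of the form BOS [A][B] ... [A],
// where the correct continuation after the trailing [A] is [B].
function sampleSequence(len = 16): { tokens: number[]; target: number } {
  const rand = () => Math.floor(Math.random() * NUM_LETTERS);
  const a = rand();
  let b = rand();
  while (b === a) b = rand();             // make the [A][B] pair unambiguous
  const tokens = [BOS, a, b];
  while (tokens.length < len - 1) {
    let filler = rand();
    while (filler === a) filler = rand(); // keep the trigger token unique
    tokens.push(filler);
  }
  tokens.push(a);                         // repeat the trigger at the end
  return { tokens, target: b };           // the model should predict b
}
```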
The prefix-match score for a head is the attention weight it places on the position immediately after the most recent earlier occurrence of the current token, averaged over query positions that have such an occurrence (Olsson et al. 2022, §3.4). A randomly initialized head sits near the uniform-attention baseline (≈ 0.07 here); once an induction head emerges, its score climbs above 0.4 within the first ~600 training steps.
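A sketch of that score under the same definition, assuming `attn[query][key]` indexing for one head's causal attention matrix (names are illustrative):

```ts
// Prefix-match score for one head: at each query position whose token
// occurred earlier, read the attention weight on the position right after
// the most recent earlier occurrence, then average over those queries.
function prefixMatchScore(tokens: number[], attn: number[][]): number {
  let sum = 0;
  let count = 0;
  for (let q = 1; q < tokens.length; q++) {
    let prev = -1;
    for (let i = q - 1; i >= 0; i--) {
      if (tokens[i] === tokens[q]) { prev = i; break; }
    }
    if (prev === -1) continue;  // current token has no earlier occurrence
    sum += attn[q][prev + 1];   // weight on the prefix-match position
    count++;
  }
  return count > 0 ? sum / count : 0;
}
```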