Residual Stream and Transformer Internals
Question 1 of 1 · 120s · foundation (3/10) · compare
Why do transformer-style language models usually prefer LayerNorm or RMSNorm over BatchNorm?
A. They remove the need for attention layers entirely
B. They guarantee that the model will never overfit
C. They use the test set to estimate better normalization statistics
D. They normalize within a token's hidden features instead of relying on batch statistics
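The distinction in option D can be made concrete with a minimal NumPy sketch (an illustrative implementation, not any particular library's code): LayerNorm and RMSNorm compute statistics over each token's hidden-feature axis, so a token's normalized output never depends on which other sequences happen to share the batch — unlike BatchNorm, whose mean and variance are taken across the batch dimension.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token's feature vector to zero mean, unit variance.
    # Statistics are taken over the last (hidden) axis only, so one
    # token's output does not depend on other tokens or other batch items.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    # RMSNorm skips mean subtraction: divide by the root-mean-square of
    # the token's features. Cheaper than LayerNorm; used in many recent LLMs.
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return x / rms

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4))  # (batch=2, tokens=3, hidden=4)

# Because statistics are per token, shrinking the batch to a single
# sequence leaves that sequence's normalized values unchanged.
assert np.allclose(layer_norm(x)[:1], layer_norm(x[:1]))
assert np.allclose(rms_norm(x)[:1], rms_norm(x[:1]))
```

A BatchNorm-style normalization (statistics over `axis=0`) would fail both assertions, which is why it interacts badly with variable batch composition and autoregressive decoding.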