Scaling Compute-Optimal Training
intermediate (5/10) · conceptual
The Chinchilla scaling law (Hoffmann et al., 2022) found that compute-optimal training maintains a fixed ratio between model parameters N and training tokens D. What is the approximate relationship?
A. D ≈ N: equal number of tokens and parameters
B. D ≈ 200N: train on 200 tokens per parameter
C. D ≈ N²: tokens should scale quadratically with parameters
D. D ≈ 20N: train on roughly 20 tokens per parameter
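To put the candidate ratios in context without giving away the answer: compute-optimal analyses typically start from the standard rule of thumb that training compute is roughly C ≈ 6ND FLOPs, so any choice of ratio D/N fixes how a compute budget splits between parameters and data. A minimal sketch (function name and the example figures below are illustrative, not from the question):

```python
import math

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute via the common rule of thumb C ≈ 6 * N * D."""
    return 6.0 * n_params * n_tokens

# Illustrative: a 1B-parameter model trained on 100B tokens (ratio D/N = 100)
c = training_flops(1e9, 1e11)
print(f"{c:.1e} FLOPs")  # 6.0e+20 FLOPs
```

Holding C fixed and varying the ratio D/N is exactly the trade-off the Chinchilla analysis optimizes: more tokens per parameter means a smaller model for the same budget.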