Learning Rate Scheduling
3 questions (1 foundation, 2 intermediate), difficulty 3-6

Question 1 of 3
foundation (3/10)
conceptual
Why does stochastic gradient descent (SGD) typically use a decaying learning rate schedule?
A. Smaller learning rates prevent the model from learning anything new
B. The loss function itself changes shape during training, necessitating progressively smaller optimization steps to navigate the evolving landscape
C. Large steps explore early; smaller steps are needed later so gradient noise does not cause oscillation around the minimum
D. Learning rate decay is entirely optional and has no measurable effect on the convergence guarantees or final solution quality of SGD
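For context (not part of the quiz itself), a minimal sketch of what a decaying learning rate schedule can look like, here using inverse-time (1/t) decay; the function name, base rate, and decay constant are illustrative assumptions rather than anything specified by the quiz:

# Illustrative sketch: inverse-time learning rate decay.
# All names and constants below are assumptions for demonstration only.
def inverse_time_lr(base_lr: float, decay_rate: float, step: int) -> float:
    """Learning rate at a given step: base_lr / (1 + decay_rate * step)."""
    return base_lr / (1.0 + decay_rate * step)

if __name__ == "__main__":
    # Early steps use a large rate; later steps shrink it, which is what
    # "decaying learning rate schedule" refers to in the question above.
    for step in (0, 10, 100, 1_000, 10_000):
        print(f"step {step:>6}: lr = {inverse_time_lr(0.1, 0.01, step):.6f}")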