Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.

Optimizer Theory: SGD, Adam, and Muon

3 questions, all advanced (difficulty 7/10)
Question 1 of 3: advanced (7/10), spot the error
Reddi et al. (2018) showed that Adam can diverge on simple convex problems where SGD converges. What is the root cause of Adam's divergence?
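
To make the claim concrete, here is a minimal sketch of a one-dimensional online problem in the style of the counterexample from that paper, reconstructed from memory rather than copied from it. The loss is f_t(x) = C·x when t mod 3 = 1 and f_t(x) = -x otherwise, with x constrained to [-1, 1]. Because C > 2, the gradients over each cycle of three steps sum to a positive number, so the best fixed point is x = -1. The constants `C = 3`, `alpha = 0.1`, the start point `x0 = 0`, the choices beta1 = 0 and beta2 = 1/(1 + C²), and the helper names `grad`, `clip`, `run_sgd`, and `run_adam` are all assumptions made for illustration. Bias correction and epsilon are omitted to keep the sketch short.

```python
import math


def clip(x, lo=-1.0, hi=1.0):
    """Project x back onto the feasible interval [lo, hi]."""
    return max(lo, min(hi, x))


def grad(t, C):
    """Gradient of f_t: f_t(x) = C*x when t % 3 == 1, otherwise f_t(x) = -x."""
    return C if t % 3 == 1 else -1.0


def run_sgd(steps, C=3.0, alpha=0.1, x0=0.0):
    """Projected SGD with step size alpha / sqrt(t)."""
    x = x0
    for t in range(1, steps + 1):
        x = clip(x - alpha / math.sqrt(t) * grad(t, C))
    return x


def run_adam(steps, C=3.0, alpha=0.1, x0=0.0):
    """Projected Adam with beta1 = 0 (no momentum), so only the
    second-moment scaling differs from SGD. Assumed simplifications:
    no bias correction, no epsilon (v > 0 from the first step onward)."""
    beta2 = 1.0 / (1.0 + C * C)
    x, v = x0, 0.0
    for t in range(1, steps + 1):
        g = grad(t, C)
        v = beta2 * v + (1.0 - beta2) * g * g  # exponential moving average of g^2
        x = clip(x - alpha / math.sqrt(t) * g / math.sqrt(v))
    return x


if __name__ == "__main__":
    steps = 3000
    print("SGD  final x:", run_sgd(steps))
    print("Adam final x:", run_adam(steps))
```

Under these assumed settings, SGD should finish close to x = -1, the minimizer of the average loss, while Adam should finish at the opposite boundary, x = +1. Printing `v` inside the loop of `run_adam` is a reasonable first step toward answering the question.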