
Training Dynamics and Loss Landscapes

Intermediate (6/10) · Spot the error
A paper claims: "We prove that SGD converges to the global minimum of any neural network loss function, since the loss is differentiable and SGD follows the negative gradient direction." What is wrong with this argument?
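As a hint toward the flaw, consider that following the negative gradient only guarantees descent toward *some* stationary point, not the global minimum: neural network losses are non-convex, so gradient methods can settle into local minima or saddle points. A minimal sketch of this failure mode, using plain full-batch gradient descent on a hypothetical toy non-convex function (the function, learning rate, and starting points are illustrative choices, not from any paper):

```python
def f(x):
    # Toy non-convex loss with two minima:
    # a local minimum near x ~ 1.13 and a global minimum near x ~ -1.30.
    return x**4 - 3 * x**2 + x

def grad_f(x):
    # Analytic derivative of f.
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    # Repeatedly step in the negative gradient direction.
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

# Starting in the right-hand basin, descent converges to the *local* minimum...
x_local = gradient_descent(x0=1.0)
# ...while starting in the left-hand basin finds the global minimum.
x_global = gradient_descent(x0=-1.0)

print(x_local, f(x_local))    # stuck near x ~ 1.13, higher loss
print(x_global, f(x_global))  # near x ~ -1.30, lower loss
```

Both runs follow the negative gradient of a differentiable function, yet they reach different stationary points with different loss values, so differentiability plus descent alone cannot prove global convergence.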