Training Dynamics and Loss Landscapes
Spot the error (intermediate, difficulty 6/10)
A paper claims: 'We prove that SGD converges to the global minimum of any neural network loss function, since the loss is differentiable and SGD follows the negative gradient direction.' What is wrong with this argument?
A. The argument is actually correct for sufficiently wide neural networks, as recent overparameterization theory shows SGD finds global minima in this regime
B. Neural network losses are non-convex, so following the negative gradient only guarantees convergence to a local minimum or saddle point, not the global minimum
C. The loss function of neural networks is not actually differentiable due to ReLU activations having a non-differentiable point at zero
D. SGD uses stochastic gradient estimates with inherent noise, so it cannot converge to any minimum at all without a decreasing learning rate schedule
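The gap in the paper's argument is the leap from "follows the negative gradient" to "reaches the global minimum": on a non-convex loss, gradient descent converges to whichever basin of attraction contains its starting point. The sketch below illustrates this on a hypothetical toy objective (the quartic, starting points, learning rate, and step count are all illustrative choices, not from the question):

```python
# Minimal sketch: plain gradient descent on a non-convex 1-D "loss".
# The quartic below has a global minimum near w ~ -1.47 and a shallower
# local minimum near w ~ 1.35, so the final iterate depends on where we start.

def loss(w):
    # Non-convex quartic objective (hypothetical, chosen for two distinct minima).
    return w**4 - 4 * w**2 + w

def grad(w):
    # Exact derivative of the quartic above.
    return 4 * w**3 - 8 * w + 1

def gradient_descent(w, lr=0.01, steps=10_000):
    # Follow the negative gradient, exactly as the paper's argument assumes.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w_right = gradient_descent(2.0)    # starts in the local-minimum basin
w_left = gradient_descent(-2.0)    # starts in the global-minimum basin

print(f"start at  2.0 -> w = {w_right:.3f}, loss = {loss(w_right):.3f}")  # ~ 1.35 (local min)
print(f"start at -2.0 -> w = {w_left:.3f}, loss = {loss(w_left):.3f}")   # ~ -1.47 (global min)
```

Both runs converge to a stationary point, as differentiability and the descent direction guarantee under a suitable step size, yet only one run finds the global minimum, which is the distinction the question's correct option turns on.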