Intermediate (6/10), conceptual
Consider the saddle point at the origin of $f(x, y) = x^2 - y^2$. Gradient descent initialized at exactly $(0, 0)$ gets stuck because the gradient there is zero. What property of SGD noise helps escape such saddle points?
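
To experiment with the setup in the question, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not a faithful model of SGD: the noise is isotropic Gaussian added to the exact gradient as a stand-in for minibatch noise, and the names `grad`, `descend`, `lr`, and `noise_std` are hypothetical choices made for this example.

```python
import numpy as np


def grad(p):
    """Exact gradient of f(x, y) = x^2 - y^2."""
    x, y = p
    return np.array([2.0 * x, -2.0 * y])


def descend(steps=50, lr=0.1, noise_std=0.0, seed=0):
    """Run gradient descent from the origin.

    Assumption: Gaussian noise of scale noise_std is added to the
    gradient at every step to imitate stochastic gradient noise.
    """
    rng = np.random.default_rng(seed)
    p = np.zeros(2)  # start exactly at the saddle point
    for _ in range(steps):
        g = grad(p) + noise_std * rng.standard_normal(2)
        p = p - lr * g
    return p


p_gd = descend(noise_std=0.0)      # plain gradient descent
p_noisy = descend(noise_std=0.01)  # noisy gradient descent

print("plain GD :", p_gd)
print("noisy GD :", p_noisy)
```

Comparing the two printed points, and in particular how the x and y coordinates of the noisy run differ in magnitude, may help when reasoning about the answer. Varying `noise_std`, `lr`, or `seed` shows how sensitive the behavior is to these assumed settings.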