Adam Optimizer

6 questions (1 foundation, 2 intermediate, 3 advanced) · Difficulty 3-10 · adapts to your performance
Level: Foundation · Score: 0 / 6

Question 1 / 6
foundation (3/10) · conceptual

In Adam's update rule, $\theta_{t+1} = \theta_t - \eta \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$, what role does $\epsilon$ (typically $10^{-8}$) play?
A. It prevents division by zero when $\hat{v}_t \approx 0$ for a coordinate with near-zero gradient history
B. It controls the effective learning rate: larger $\epsilon$ means smaller effective step size for all parameters
C. It adds L2 regularization to the optimizer
D. It ensures convergence by bounding the maximum step size below $\eta / \epsilon$
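The update rule the question refers to can be sketched as a single bias-corrected Adam step. This toy `adam_step` function is illustrative (its name and defaults are assumptions, not part of the quiz); it shows where $\epsilon$ sits in the denominator for a coordinate whose gradient history is near zero:

```python
import numpy as np

# Minimal sketch of one Adam step (illustrative, not a reference implementation).
def adam_step(theta, m, v, grad, t, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for m
    v_hat = v / (1 - beta2 ** t)                # bias correction for v
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# A coordinate with a near-zero gradient history: v_hat is ~1e-24 here, so
# without eps the denominator sqrt(v_hat) would be tiny and the step would
# jump to the full learning rate; eps keeps the step small and well-defined.
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
theta, m, v = adam_step(theta, m, v, grad=np.array([1e-12]), t=1)
print(theta)  # stays very close to 1.0 because eps dominates the denominator
```

With `eps=1e-8` the step above is on the order of $10^{-7}$; setting `eps=0` would instead give a step of the full $\eta = 10^{-3}$ despite an essentially zero gradient.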