Dropout
Intermediate (4/10) · conceptual
Dropout randomly zeros each neuron's activation with probability p during training. At test time, activations are multiplied by (1 − p) (or equivalently, training activations are scaled by 1/(1 − p)). Why is this scaling necessary?
A. It matches expected activation values between training and test time, since each neuron outputs (1 − p) times its full activation during training.
B. The scaling normalizes all post-dropout activations to have unit variance, which is required for stable gradient propagation.
C. The scaling factor is purely optional and provides only a marginal computational efficiency improvement during inference.
D. To prevent the test-time network from being overconfident.
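The expectation-matching argument can be checked numerically. Below is a minimal NumPy sketch (the values of p and the activation a are illustrative assumptions, not from the question): averaging over many random masks, plain dropout leaves a training-time mean of (1 − p)·a, while inverted dropout's 1/(1 − p) scaling restores the mean to a, so no test-time correction is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3    # drop probability (illustrative)
a = 1.7    # one neuron's pre-dropout activation (illustrative)

# Each unit survives a dropout mask with probability (1 - p).
keep = rng.random(1_000_000) >= p

# Plain dropout: the training-time output has mean (1 - p) * a,
# so test-time activations must be multiplied by (1 - p) to match.
plain_mean = (a * keep).mean()

# Inverted dropout: survivors are scaled by 1/(1 - p) during training,
# so the expected training output already equals a; test time is unchanged.
inverted_mean = (a * keep / (1 - p)).mean()

print(plain_mean, inverted_mean)
```

Running this shows plain_mean close to (1 − p)·a and inverted_mean close to a, which is exactly the matching of expected activations described in option A.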