Universal Approximation Theorem
5 questions · Difficulty 2-7 (1 foundation, 2 intermediate, 2 advanced) · Adapts to your performance
Question 1 of 5 · foundation (2/10) · conceptual
Why do neural networks need nonlinear activation functions between layers?
A. Nonlinearity is required to use gradient descent; linear activations have no gradient
B. Nonlinearity reduces memory usage compared to purely linear networks
C. Without nonlinearity, stacking layers collapses to a single linear map: $W_2(W_1 x) = (W_2 W_1)x$, losing the benefit of depth (see the sketch after the options)
D. Nonlinearity makes the loss function convex, simplifying optimization
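To make the collapse described in option C concrete, here is a minimal NumPy sketch; the layer shapes and random weights are illustrative assumptions, not part of the quiz. It checks numerically that two stacked linear layers with no activation in between compute exactly the same function as one linear layer with weights $W_2 W_1$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear "layers" with no activation between them
# (shapes chosen arbitrarily for illustration).
W1 = rng.standard_normal((4, 3))  # first layer: R^3 -> R^4
W2 = rng.standard_normal((2, 4))  # second layer: R^4 -> R^2
x = rng.standard_normal(3)

# Applying the layers in sequence...
two_layers = W2 @ (W1 @ x)

# ...equals a single linear map with the product matrix W2 @ W1.
one_layer = (W2 @ W1) @ x

assert np.allclose(two_layers, one_layer)  # depth added no expressive power
```

Inserting any nonlinearity between the two matrices, for example `W2 @ np.maximum(W1 @ x, 0)` with a ReLU, breaks this identity, which is what lets deeper networks represent functions a single linear layer cannot.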