Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.
Mixture of Experts
4 questions · Difficulty 4-7
3 intermediate, 1 advanced · Adapts to your performance
Question 1 of 4 · intermediate (4/10) · conceptual
Mixture of Experts (MoE) architectures route each input to a subset of specialized expert networks. What is the main efficiency advantage?
A. MoE removes the quadratic attention cost in transformers, making them linear in sequence length
B. MoE stores parameters on disk and loads them only when needed, reducing GPU memory requirements
C. Only a small fraction of parameters are activated per input (e.g., 2 of 8 experts), so total parameter count scales while per-token compute stays constant
D. MoE eliminates the need for training data because experts specialize by domain automatically
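To make the routing mechanism in the question concrete, here is a minimal sketch of a top-k MoE layer in PyTorch. It is an illustration under simplifying assumptions, not a production implementation: the class name `TopKMoE`, the expert MLP shape, and the omission of load-balancing losses and expert capacity limits are all choices made for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sketch of a top-k MoE layer (hypothetical, for illustration).

    A gating network scores all experts per token, but only the top-k
    expert MLPs actually run, so per-token compute stays roughly constant
    as the number of experts (and total parameters) grows.
    """
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Score every expert for every token.
        logits = self.gate(x)                       # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)  # keep only top-k experts
        weights = F.softmax(weights, dim=-1)        # renormalize over the top-k
        out = torch.zeros_like(x)
        # Only selected experts run on the tokens routed to them; the
        # rest are skipped entirely, which is the compute saving.
        for e, expert in enumerate(self.experts):
            rows, slots = (idx == e).nonzero(as_tuple=True)
            if rows.numel() == 0:
                continue
            out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out

moe = TopKMoE(d_model=64)              # 8 experts, 2 active per token
tokens = torch.randn(10, 64)
print(moe(tokens).shape)               # torch.Size([10, 64])
```

Note the design point the question is probing: all 8 experts' parameters exist (total parameter count scales with `n_experts`), but each token's forward pass only touches 2 of them.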