Knowledge Distillation
4 questions · Difficulty 4-6

Question 1 of 4 · intermediate (4/10) · conceptual
Knowledge distillation (Hinton et al., 2015) trains a small 'student' model to match a large 'teacher' model's outputs. Why train the student on the teacher's soft probabilities instead of just the hard labels?
A. Hard labels aren't available for the training set used in distillation, so soft labels are used out of necessity
B. Soft probabilities encode 'dark knowledge' — relative similarities between classes — that hard labels don't convey
C. Soft labels are equivalent to hard labels when the teacher is well-trained, so the distinction is academic
D. Soft probabilities are cheaper to compute than hard labels, saving training time for the student
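For readers who want to see the mechanics behind the question, here is a minimal sketch of the Hinton-style distillation objective in PyTorch. The `temperature` and `alpha` defaults are illustrative assumptions, not tuned values from any source.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend a soft-target KL term with a hard-label cross-entropy term.

    temperature and alpha are hypothetical defaults for illustration.
    """
    # Soften both distributions; a higher temperature spreads probability
    # mass so the teacher's inter-class similarities ('dark knowledge')
    # become visible to the student.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # KL divergence between softened distributions, scaled by T^2 so its
    # gradient magnitude stays comparable to the hard-label term as the
    # temperature changes (per Hinton et al., 2015).
    kl = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")
    soft_loss = kl * temperature ** 2

    # Standard cross-entropy against the ground-truth hard labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1 - alpha) * hard_loss
```

With temperature = 1 and alpha = 1 the soft term alone reduces to matching the teacher's raw output distribution; raising the temperature is what exposes the relative class similarities that hard labels discard.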