
Knowledge Distillation

4 questions · Difficulty 4-6 · Intermediate

Question 1 of 4 · intermediate (4/10) · conceptual
Knowledge distillation (Hinton et al., 2015) trains a small "student" model to match a large "teacher" model's outputs. Why train on the teacher's soft probability distribution instead of just the hard labels?
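
For concreteness, here is a minimal sketch of the distillation objective the question refers to, assuming PyTorch; the temperature `T`, mixing weight `alpha`, and the function name are illustrative choices, not taken from the source.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Mix soft-target matching (KL at temperature T) with hard-label CE."""
    # Soft targets: temperature-scaled teacher probabilities expose the
    # relative similarity between classes, not just the single argmax label.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),  # student log-probs
        F.softmax(teacher_logits / T, dim=-1),      # teacher probs
        reduction="batchmean",
    ) * (T * T)  # T^2 rescaling keeps gradient magnitudes comparable (Hinton et al., 2015)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example with random logits for a 10-class problem:
student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student, teacher, labels))
```

Note that with `T=1` and `alpha=1` the soft term reduces to plain cross-entropy against the teacher's output distribution; raising `T` flattens the teacher's probabilities so the smaller, near-zero class probabilities carry more of the training signal.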