Gradient Boosting
1 question · intermediate (5/10) · conceptual
In gradient boosting with squared loss, each new tree is fit to the negative gradient of the loss, which equals the residuals. Why does boosting use the negative gradient rather than directly fitting the residuals for general loss functions?
A. Fitting the negative gradient inherently prevents overfitting, while fitting the raw residuals directly causes excessive memorization of training noise
B. The negative gradient is faster to compute than the residuals
C. Residuals are undefined for classification problems
D. The negative gradient generalizes residuals to arbitrary differentiable losses, performing steepest descent in function space
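To see the mechanics behind the question, here is a minimal sketch of gradient boosting as steepest descent in function space. It assumes scikit-learn's DecisionTreeRegressor as the base learner; the function names (neg_gradient_squared, fit_gradient_boosting) and the toy data are illustrative choices, not from any particular library. For squared loss L = ½(y − F)², the negative gradient −∂L/∂F equals y − F, the residual, so fitting pseudo-residuals recovers residual fitting as a special case; swapping in a different neg_grad function generalizes the same loop to any differentiable loss.

```python
# Illustrative gradient-boosting sketch: each tree is fit to the
# negative gradient of the loss evaluated at the current model.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def neg_gradient_squared(y, F):
    # For squared loss L = 0.5 * (y - F)^2, the gradient w.r.t. F is -(y - F),
    # so the negative gradient is exactly the residual y - F.
    return y - F

def fit_gradient_boosting(X, y, neg_grad, n_trees=100, lr=0.1, depth=2):
    F = np.full(y.shape, y.mean())          # constant initial model
    trees = []
    for _ in range(n_trees):
        r = neg_grad(y, F)                  # pseudo-residuals
        t = DecisionTreeRegressor(max_depth=depth).fit(X, r)
        F += lr * t.predict(X)              # steepest-descent step in function space
        trees.append(t)
    return y.mean(), trees

def predict(init, trees, X, lr=0.1):
    return init + lr * sum(t.predict(X) for t in trees)

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
init, trees = fit_gradient_boosting(X, y, neg_gradient_squared)
print("train MSE:", np.mean((predict(init, trees, X) - y) ** 2))
```

Only the neg_grad callback encodes the loss: replacing it with, say, the sign of the residual (absolute loss) or a sigmoid-based gradient (log loss) leaves the boosting loop unchanged, which is the generalization that option D describes.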