Beta. Content is under active construction and has not been peer-reviewed. Report errors on
GitHub
.
Disclaimer
Theorem
Path
Curriculum
Paths
Demos
Diagnostic
Search
Quiz Hub
/
Temporal Difference Learning
Temporal Difference Learning
1 questions
Difficulty 4-4
View topic
Intermediate
0 / 1
1 intermediate
Adapts to your performance
1 / 1
intermediate (4/10)
conceptual
TD(0) updates the value estimate using
V
(
s
)
←
V
(
s
)
+
α
[
r
+
γ
V
(
s
′
)
−
V
(
s
)]
. The term
r
+
γ
V
(
s
′
)
is called the TD target. What does 'bootstrapping' mean in this context?
Hide and think first
A.
The algorithm resamples observed episodes with replacement to create synthetic training data, analogous to the statistical bootstrap
B.
The algorithm initializes its value estimates from a pretrained model rather than starting from random or zero initialization
C.
The update uses the current estimate
V
(
s
′
)
as a stand-in for the true value, creating a self-referential update target
D.
The algorithm computes a multi-step lookahead over several future transitions before committing to a single parameter update
Submit Answer