TD(0) updates the value estimate using $V(s) \leftarrow V(s) + \alpha [r + \gamma V(s') - V(s)]$. The term $r + \gamma V(s')$ is called the TD target. What does 'bootstrapping' mean in this context?
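
As a rough illustration of the update rule above, the following minimal sketch applies a single TD(0) step to a tabular value function. The function name `td0_update`, the dictionary `V`, the state labels `"A"` and `"B"`, the `terminal` flag, and the numeric values are all assumptions made for this example; they do not come from the question.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one TD(0) update to the table V in place and return the TD error.

    V is a hypothetical dict mapping states to value estimates.
    """
    # By convention, the value of a terminal next state is taken to be 0.
    next_value = 0.0 if terminal else V[s_next]
    # TD target: observed reward plus the discounted current estimate V(s').
    td_target = r + gamma * next_value
    # TD error: gap between the target and the current estimate V(s).
    td_error = td_target - V[s]
    # Move V(s) a fraction alpha of the way toward the target.
    V[s] += alpha * td_error
    return td_error


# Illustrative values only: two states with arbitrary initial estimates.
V = {"A": 0.0, "B": 1.0}
delta = td0_update(V, "A", r=0.5, s_next="B", alpha=0.1, gamma=0.9)
print(delta, V)
```

Only `V["A"]` changes in this step. The quantity it moves toward is computed from the reward `r` and the existing entry `V["B"]`, mirroring the term $r + \gamma V(s')$ in the formula.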

Temporal Difference Learning

