Value Iteration and Policy Iteration
6 questions · Difficulty 4-6 · Intermediate
Question 1 of 6 · intermediate (4/10) · state theorem
The Bellman optimality equation for the state-value function under a finite MDP is
$$V^*(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big].$$
What is the key structural feature this captures?
A. A Markov assumption on the reward function, requiring rewards to depend only on the current action and not the state
B. A one-step lookahead that decomposes the optimal value into immediate reward plus discounted optimal future value
C. A monotonicity condition guaranteeing $V^*$ is non-decreasing in every state across policy iterations
D. A linear relationship between the value function and the reward function that can be solved in closed form
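To make the one-step lookahead in the equation concrete, here is a minimal value-iteration sketch in Python. The function name `value_iteration`, the transition tensor `P` (shape states × actions × states), the reward matrix `R` (shape states × actions), and the tiny example MDP at the bottom are all illustrative assumptions, not part of the quiz material.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Iterate the Bellman optimality backup until convergence.

    P[s, a, s2]: transition probability P(s2 | s, a)  (assumed layout)
    R[s, a]: expected immediate reward for taking action a in state s
    """
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # One-step lookahead: immediate reward plus discounted optimal future value.
        Q = R + gamma * np.einsum("sap,p->sa", P, V)
        V_new = Q.max(axis=1)  # max over actions: the Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # V* estimate and a greedy policy
        V = V_new

# Hypothetical two-state, two-action MDP, purely for illustration.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V_star, pi_star = value_iteration(P, R)
```

Each sweep applies exactly the right-hand side of the Bellman optimality equation, which is why option-style reasoning about the equation reduces to understanding this backup.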