Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.
Markov Decision Processes
3 questions · Difficulty 4–7 · 2 intermediate, 1 advanced · Adapts to your performance

intermediate (4/10) · compute
For a discount factor γ = 0.9, an agent receives reward r = 1 at every time step forever. What is the value (discounted return) from the starting state?
A. 10, from the geometric series ∑_{t=0}^{∞} 0.9^t = 1/(1 − 0.9) = 10. With constant reward 1 and discount 0.9, the infinite sum converges to 1/0.1.
B. ∞, because the agent receives reward 1 at every step forever, and the sum of infinitely many positive terms must diverge to infinity.
C. 0.9, because the expected discounted value is γ · r = 0.9 · 1 from the one-step Bellman update, representing the present value of receiving reward 1 next step.
D. 9, from ∑_{t=1}^{∞} 0.9^t = 0.9/(1 − 0.9) = 9, because the first reward at t = 0 is not discounted and should be counted separately from the series.
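The discounted return behind these options can be checked numerically; a minimal Python sketch (not part of the quiz interface) that compares a truncated sum of γ^t · r against the closed-form geometric series r/(1 − γ):

```python
# Discounted return for a constant reward stream: G = sum_{t=0}^inf gamma^t * r
gamma = 0.9  # discount factor from the question
r = 1.0      # reward received at every time step

# Truncate the infinite sum at a large horizon; gamma^T vanishes quickly
T = 1000
partial = sum(gamma**t * r for t in range(T))

# Closed form of the geometric series
closed_form = r / (1 - gamma)

print(partial)      # approaches 10
print(closed_form)  # 10 (up to floating-point rounding)
```

The truncated sum and the closed form agree to within floating-point error, which is why the series answer, not the divergent-sum intuition, is the one to trust: discounting makes each successive reward geometrically smaller, so the infinite sum converges.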