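In model-based RL, the simulation lemma bounds how far a policy's value under a learned, inexact model can differ from its value in the true environment. If every learned transition distribution is within total variation (TV) distance $\epsilon_p$ of the true one, and every learned reward is within $\epsilon_r$ of the true reward, what is the order of the value gap in terms of $\epsilon_p$, $\epsilon_r$, and the discount factor $\gamma$?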
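One way to sanity-check a candidate answer is numerically. The sketch below assumes a common discounted statement of the lemma: a fixed policy (so the MDP reduces to a Markov reward process), true rewards in $[0, R_{\max}]$, TV defined as half the $L_1$ distance, and the bound $\|V - \hat V\|_\infty \le \frac{\epsilon_r}{1-\gamma} + \frac{\gamma\,\epsilon_p R_{\max}}{(1-\gamma)^2}$. It follows from $V - \hat V = (r - \hat r) + \gamma (P - \hat P) V + \gamma \hat P (V - \hat V)$ together with $|(p - q)^\top V| \le \mathrm{TV}(p, q)\,\mathrm{span}(V)$. All function names and parameter values here are hypothetical, chosen only for illustration.

```python
import random


def evaluate(P, r, gamma, iters=2000):
    """Iterative policy evaluation for a Markov reward process: V = r + gamma * P V."""
    n = len(r)
    V = [0.0] * n
    for _ in range(iters):
        V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]
    return V


def random_dist(rng, n):
    """Random probability vector over n states."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    z = sum(w)
    return [x / z for x in w]


def tv(p, q):
    """Total variation distance, defined as half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def simulation_gap(n=5, gamma=0.9, eps_p=0.05, eps_r=0.02, r_max=1.0, seed=0):
    """Return (observed value gap, assumed simulation-lemma bound, max per-row TV)."""
    rng = random.Random(seed)
    # True model under a fixed policy: rows P[s] are next-state distributions,
    # rewards lie in [0, r_max].
    P = [random_dist(rng, n) for _ in range(n)]
    r = [rng.uniform(0.0, r_max) for _ in range(n)]
    # Learned model: mixing each row with another distribution keeps
    # TV(P[s], P_hat[s]) <= eps_p; each reward is shifted by at most eps_r.
    P_hat = []
    for s in range(n):
        q = random_dist(rng, n)
        P_hat.append([(1 - eps_p) * a + eps_p * b for a, b in zip(P[s], q)])
    r_hat = [x + rng.uniform(-eps_r, eps_r) for x in r]

    V = evaluate(P, r, gamma)
    V_hat = evaluate(P_hat, r_hat, gamma)
    gap = max(abs(a - b) for a, b in zip(V, V_hat))
    # eps_r / (1 - gamma) + gamma * eps_p * r_max / (1 - gamma)^2
    bound = eps_r / (1 - gamma) + gamma * eps_p * r_max / (1 - gamma) ** 2
    max_tv = max(tv(P[s], P_hat[s]) for s in range(n))
    return gap, bound, max_tv


if __name__ == "__main__":
    gap, bound, max_tv = simulation_gap()
    print(f"max TV = {max_tv:.4f}, value gap = {gap:.4f}, bound = {bound:.4f}")
```

Writing the horizon as $H = 1/(1-\gamma)$, the assumed bound reads $O(H\epsilon_r + H^2 \epsilon_p R_{\max})$: reward errors accumulate linearly in the horizon, while transition errors pick up an extra factor of $H$ because they perturb the value of every future state.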