
Model-Based Reinforcement Learning

Question 1 of 1
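In model-based RL, the simulation lemma bounds the value error that comes from planning in an inexact learned model. If the model has a small per-step error on transitions (measured in total variation, TV) and on rewards, what is the order of the resulting value gap in terms of those errors and the discount factor?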
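A small numerical check can make the shape of the bound concrete. The sketch below is illustrative only and rests on several assumptions that are not in the question: a fixed policy (so the MDP reduces to a Markov reward process), rewards in [0, 1] (so R_max = 1), and the notation eps_P for the worst-case per-state TV error on transitions and eps_R for the worst-case absolute reward error. Under those assumptions, one common form of the simulation lemma gives a value gap of at most eps_R / (1 - gamma) + gamma * eps_P * R_max / (1 - gamma)^2. The helper names (evaluate, random_row, simulation_gap) are hypothetical.

```python
import random


def evaluate(P, r, gamma, iters=2000):
    """Iterative policy evaluation for a fixed-policy Markov reward process."""
    n = len(r)
    V = [0.0] * n
    for _ in range(iters):
        V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
             for s in range(n)]
    return V


def random_row(n, rng):
    """Sample a random probability distribution over n states."""
    w = [rng.random() for _ in range(n)]
    z = sum(w)
    return [x / z for x in w]


def simulation_gap(n=5, gamma=0.9, mix=0.05, noise=0.02, seed=0):
    """Return (observed value gap, simulation-lemma bound) for a random model pair.

    Assumptions (illustrative, not from the source): rewards lie in [0, 1],
    so R_max = 1, and the policy is fixed.
    """
    rng = random.Random(seed)
    # "True" model.
    P = [random_row(n, rng) for _ in range(n)]
    r = [rng.random() for _ in range(n)]
    # "Learned" model: mix each transition row with a random distribution
    # and jitter the rewards, clipping them back into [0, 1].
    P_hat = [[(1 - mix) * p + mix * q
              for p, q in zip(row, random_row(n, rng))] for row in P]
    r_hat = [min(1.0, max(0.0, x + rng.uniform(-noise, noise))) for x in r]

    # Worst-case per-state TV distance (half the L1 distance) and reward error.
    eps_P = max(0.5 * sum(abs(p - q) for p, q in zip(P[s], P_hat[s]))
                for s in range(n))
    eps_R = max(abs(a - b) for a, b in zip(r, r_hat))

    V = evaluate(P, r, gamma)
    V_hat = evaluate(P_hat, r_hat, gamma)
    gap = max(abs(a - b) for a, b in zip(V, V_hat))

    r_max = 1.0
    bound = eps_R / (1 - gamma) + gamma * eps_P * r_max / (1 - gamma) ** 2
    return gap, bound


if __name__ == "__main__":
    for g in (0.5, 0.9, 0.95):
        gap, bound = simulation_gap(gamma=g)
        print(f"gamma={g}: gap={gap:.4f} <= bound={bound:.4f}")
```

In this sketch the reward error is amplified by one factor of 1 / (1 - gamma), while the transition error picks up a second factor because it is multiplied by a value function that can itself be as large as R_max / (1 - gamma). The bound is worst-case, so the observed gap on a random instance is typically much smaller.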