Model-Based Reinforcement Learning
In model-based RL, the simulation lemma bounds the value error caused by planning with an inexact learned model. If the model has per-step total-variation (TV) error $\epsilon_P$ on transitions and error $\epsilon_R$ on rewards, what is the order of the value gap?
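A small numerical check can make the lemma concrete. The sketch below rests on assumptions that the question does not state. It fixes a policy $\pi$, uses discount $\gamma$ and true rewards in $[0, R_{\max}]$, and writes $\epsilon_P$ for the worst-case per-step TV error on transitions and $\epsilon_R$ for the worst-case reward error over state-action pairs. Under those assumptions, the usual form of the lemma is $\lVert V^\pi - \hat V^\pi \rVert_\infty \le \frac{\epsilon_R}{1-\gamma} + \frac{\gamma\,\epsilon_P R_{\max}}{(1-\gamma)^2}$. The second term follows from applying $|(p-q)\cdot V| \le \mathrm{TV}(p,q)\,(\max V - \min V)$ with $V \in [0, R_{\max}/(1-\gamma)]$. The helper names (`simulation_lemma_check`, `evaluate_policy`, and so on) and the perturbation scheme (mixing the true transitions toward a random distribution and jittering rewards) are hypothetical choices made for illustration. Both value functions are computed by plain iterative policy evaluation.

```python
import random


def random_distribution(n, rng):
    """Sample a random probability vector of length n."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def tv_distance(p, q):
    """Total-variation distance: half the L1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def evaluate_policy(P, R, policy, gamma, iters=2000):
    """Iterative policy evaluation: V(s) <- sum_a pi(a|s) [R(s,a) + gamma * P(.|s,a) . V]."""
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(iters):
        V = [
            sum(
                policy[s][a] * (R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V)))
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
    return V


def simulation_lemma_check(n_states=5, n_actions=2, gamma=0.9, r_max=1.0,
                           mix=0.05, reward_noise=0.05, seed=0):
    """Hypothetical helper: compare the true value gap with the simulation-lemma bound."""
    rng = random.Random(seed)
    # True MDP: random transitions, rewards in [0, r_max].
    P = [[random_distribution(n_states, rng) for _ in range(n_actions)]
         for _ in range(n_states)]
    R = [[rng.uniform(0.0, r_max) for _ in range(n_actions)] for _ in range(n_states)]
    # Learned model (assumed perturbation): mix transitions toward a random
    # distribution and add bounded noise to the rewards.
    P_hat = [[[(1 - mix) * p + mix * q
               for p, q in zip(P[s][a], random_distribution(n_states, rng))]
              for a in range(n_actions)] for s in range(n_states)]
    R_hat = [[R[s][a] + rng.uniform(-reward_noise, reward_noise)
              for a in range(n_actions)] for s in range(n_states)]
    policy = [random_distribution(n_actions, rng) for _ in range(n_states)]

    # Measured per-step model errors, worst case over state-action pairs.
    pairs = [(s, a) for s in range(n_states) for a in range(n_actions)]
    eps_P = max(tv_distance(P[s][a], P_hat[s][a]) for s, a in pairs)
    eps_R = max(abs(R[s][a] - R_hat[s][a]) for s, a in pairs)

    V = evaluate_policy(P, R, policy, gamma)
    V_hat = evaluate_policy(P_hat, R_hat, policy, gamma)
    gap = max(abs(v - vh) for v, vh in zip(V, V_hat))
    bound = eps_R / (1 - gamma) + gamma * eps_P * r_max / (1 - gamma) ** 2
    return gap, bound


gap, bound = simulation_lemma_check()
print(f"value gap = {gap:.4f}, simulation-lemma bound = {bound:.4f}")
```

Passing a `gamma` closer to 1 shows the structure of the bound. The reward term grows like $1/(1-\gamma)$, while the transition term grows like $1/(1-\gamma)^2$, because a transition error at one step shifts which future values are reached.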