Q-Learning

Question 1 of 5
Q-learning's update rule is Q(s,a) <- Q(s,a) + alpha [r + gamma max_a' Q(s',a') - Q(s,a)]. Why is the max over next-state actions essential?
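To make the terms of the update rule concrete, here is a minimal sketch of one tabular update in plain Python. The function name `q_learning_update`, the list-of-lists table layout, and the toy numbers are assumptions made for illustration; they do not come from the quiz. The sketch also assumes `s_next` is non-terminal, so it always bootstraps from the next state.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the TD error.

    Q is a list of rows, one row per state, one entry per action
    (a hypothetical layout chosen for this sketch).
    """
    # Target: observed reward plus the discounted value of the
    # highest-valued action available in the next state.
    target = r + gamma * max(Q[s_next])
    # Temporal-difference error: the bracketed term in the update rule.
    td_error = target - Q[s][a]
    # Move the current estimate a fraction alpha toward the target.
    Q[s][a] += alpha * td_error
    return td_error


# Toy table with 2 states and 2 actions (illustrative values only).
Q = [[0.0, 0.0],
     [1.0, 3.0]]
td = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
```

With these toy values the target is 1.0 + 0.9 * 3.0 = 3.7, so `Q[0][1]` moves halfway from 0.0 toward 3.7. The target uses the larger next-state entry (3.0) whichever action the agent actually goes on to take in state 1.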