Actor-Critic Methods
Question 1 of 2 · intermediate (5/10) · conceptual
Actor-critic methods use the advantage function $A(s, a) = Q(s, a) - V(s)$ in the policy gradient $\nabla_\theta J = \mathbb{E}\left[\nabla_\theta \log \pi_\theta(a \mid s) \cdot A(s, a)\right]$. Why subtract $V(s)$ from $Q(s, a)$?
A. $V(s)$ is a state-dependent baseline; subtracting it preserves the gradient's expectation while reducing variance
B. Subtracting $V(s)$ converts on-policy estimation to off-policy estimation; this is what makes actor-critic compatible with replay buffers
C. $Q(s, a) - V(s)$ is the *exact* policy gradient for soft actor-critic; without it the SAC update is biased
D. The advantage corrects for the time-discount factor $\gamma$, which would otherwise bias the policy gradient toward short-horizon rewards
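For concreteness, here is a minimal PyTorch sketch of the advantage-weighted update from the question. It is not part of the quiz source: the network sizes, batch shapes, and the use of sampled returns as a stand-in for $Q(s, a)$ are all illustrative assumptions.

```python
# Minimal sketch of the advantage-weighted policy gradient
#   grad_theta J = E[grad_theta log pi_theta(a|s) * A(s, a)],
# with A(s, a) = Q(s, a) - V(s). All names and shapes here are
# illustrative assumptions, not a specific library's API.
import torch
import torch.nn as nn

obs_dim, n_actions = 4, 2
policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))  # actor pi_theta
value = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, 1))           # critic V(s)

states = torch.randn(8, obs_dim)               # dummy batch of states
actions = torch.randint(0, n_actions, (8,))    # actions sampled from pi_theta
returns = torch.randn(8)                       # sampled returns standing in for Q(s, a)

log_probs = torch.distributions.Categorical(logits=policy(states)).log_prob(actions)
values = value(states).squeeze(-1)
advantages = returns - values.detach()         # advantage estimate: A(s, a) ~ return - V(s)

actor_loss = -(log_probs * advantages).mean()  # gradient matches E[grad log pi * A]
critic_loss = (returns - values).pow(2).mean() # fit V(s) by regression to returns
(actor_loss + critic_loss).backward()
```

Note the `values.detach()` in the advantage: the baseline enters the actor's loss as a constant, so only the critic's regression loss trains $V(s)$.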