Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.

Policy Gradient Theorem

3 questions, difficulty 6-7 (1 intermediate, 2 advanced)

Question 1 of 3: intermediate (6/10), conceptual
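In the REINFORCE algorithm, the policy gradient is $\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t\right]$. Why is subtracting a baseline $b(s_t)$ from the return $G_t$ useful even though it does not change the expected gradient?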
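A small numerical sketch can make the difference between "same expectation" and "same variance" concrete. The setup is hypothetical: a single-state, two-action bandit with a softmax policy, assumed mean rewards of 10 and 11 plus Gaussian noise, and a baseline equal to the policy's exact expected return. The unbiasedness follows from $\mathbb{E}_{a\sim\pi_\theta}[\nabla_\theta \log \pi_\theta(a\mid s)\, b(s)] = b(s)\,\nabla_\theta \sum_a \pi_\theta(a\mid s) = b(s)\,\nabla_\theta 1 = 0$. The code estimates one gradient component by single-sample REINFORCE with and without the baseline; both averages should agree, while the baselined estimator should have far smaller variance because the large shared reward offset no longer multiplies the score function.

```python
import math
import random
import statistics

# Hypothetical one-state, two-action bandit used only for illustration.
MEAN_REWARD = [10.0, 11.0]  # assumed expected reward of each action
NOISE_STD = 1.0             # assumed Gaussian reward noise


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gradients(theta, n, baseline, rng):
    """Return n single-sample REINFORCE estimates of dJ/dtheta[0].

    Each estimate is grad_log_pi(a)[0] * (G - baseline), where for a softmax
    policy grad_log_pi(a)[k] = 1{a == k} - pi[k].
    """
    probs = softmax(theta)
    estimates = []
    for _ in range(n):
        a = 0 if rng.random() < probs[0] else 1
        g = rng.gauss(MEAN_REWARD[a], NOISE_STD)  # sampled return G
        score0 = (1.0 if a == 0 else 0.0) - probs[0]
        estimates.append(score0 * (g - baseline))
    return estimates


def true_gradient(theta):
    """Exact dJ/dtheta[0] = pi[0] * (r_bar[0] - J) for this bandit."""
    probs = softmax(theta)
    j = sum(p * r for p, r in zip(probs, MEAN_REWARD))
    return probs[0] * (MEAN_REWARD[0] - j)


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.0, 0.0]
    probs = softmax(theta)
    # Baseline b = E[G] under the current policy (state-independent here).
    value = sum(p * r for p, r in zip(probs, MEAN_REWARD))

    no_base = sample_gradients(theta, 100_000, 0.0, rng)
    with_base = sample_gradients(theta, 100_000, value, rng)

    print("true gradient:        ", true_gradient(theta))
    print("mean, no baseline:    ", statistics.fmean(no_base))
    print("mean, with baseline:  ", statistics.fmean(with_base))
    print("variance, no baseline:", statistics.pvariance(no_base))
    print("variance, baseline:   ", statistics.pvariance(with_base))
```

With $\theta = [0, 0]$ the exact gradient component is $-0.25$. Working out the expectations for this setup, the no-baseline estimator's variance is about 27.8 and the baselined one's about 0.25, so both sample means should land near $-0.25$ while the spread differs by roughly two orders of magnitude.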