Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.

RLHF and Alignment

Question 1 of 3, intermediate (4/10), compare
Standard RLHF (as used for InstructGPT and early ChatGPT) has three stages. Which sequence is correct?