RLHF and Alignment
3 questions · difficulty 4-6 · intermediate

Question 1 of 3 · intermediate (4/10) · compare
Standard RLHF (as used for InstructGPT and early ChatGPT) has three stages. Which sequence is correct?
A. Supervised fine-tuning on demonstrations; reward model training on preferences; RL fine-tuning (PPO) against the reward model
B. Unsupervised embedding alignment; contrastive pretraining; reward optimization
C. RL pre-training from scratch; supervised fine-tuning; reward model filtering
D. Reward modeling on base outputs; supervised fine-tuning; RL against reward
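Background for this question: stage 2 of the pipeline fits a reward model to pairwise human preferences, typically with a Bradley-Terry style loss as in the InstructGPT paper, and stage 3 then optimizes the policy against that learned reward with PPO. The snippet below is a minimal PyTorch sketch of the stage-2 loss only; the function name and the toy reward values are hypothetical illustrations, not code from any referenced implementation.

    import torch
    import torch.nn.functional as F

    def reward_model_loss(r_chosen, r_rejected):
        # Bradley-Terry pairwise preference loss:
        # -log sigmoid(r(x, y_w) - r(x, y_l)), averaged over the batch.
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    # Toy scalar rewards for four preference pairs (hypothetical values).
    r_w = torch.tensor([1.2, 0.3, 2.0, -0.5])   # chosen (preferred) responses
    r_l = torch.tensor([0.4, -0.1, 1.5, -1.0])  # rejected responses
    print(reward_model_loss(r_w, r_l))          # loss shrinks as chosen scores exceed rejected

In stage 3, PPO maximizes this learned reward, usually with a per-token KL penalty toward the stage-1 SFT policy so the model does not drift far from its supervised behavior.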