Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.
Reward Design and Reward Misspecification
intermediate (6/10)
A key challenge for the era of experience is designing rewards. Why is reward design harder than supervised labeling?
A. Reward functions must be computed in real time, which is harder than pre-labeling a dataset
B. Rewards must always be positive, while labels can be any real number, restricting reward design severely
C. Rewards specify outcomes rather than correct actions, and agents can exploit misaligned reward proxies (reward hacking)
D. Rewards require more human effort than labels, because every state must be individually rewarded