Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.
Data Contamination and Evaluation
3 questions, difficulty 4-5 (intermediate)

Question 1 of 3: intermediate (4/10), conceptual
Data contamination occurs when test data leaks into model training. Why is it a growing problem for LLM evaluation?
A. LLMs train on massive web-scraped datasets that likely include public benchmarks, so test scores may reflect memorization rather than capability
B. LLMs get worse over time as more contaminated data accumulates, making them degrade silently
C. The legal risk of training on copyrighted test data is higher than training on general web data
D. Contaminated models are slower at inference because they have extra memorization overhead
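One common way to probe for the leakage described in this question is to check word n-gram overlap between benchmark items and training text. The sketch below is a minimal illustration, not a standard tool: the function names are hypothetical, and the lowercased whitespace tokenization and default `n = 8` are illustrative assumptions (real contamination checks choose their own tokenizers and n-gram lengths).

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a lowercased, whitespace-tokenized text.

    Texts shorter than n tokens yield an empty set.
    """
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(
    test_items: Iterable[str], training_docs: Iterable[str], n: int = 8
) -> float:
    """Fraction of test items that share at least one n-gram with the training docs.

    Hypothetical helper: n=8 and whitespace tokenization are assumptions for illustration.
    """
    # Collect every n-gram that appears anywhere in the training corpus.
    train_grams: Set[Tuple[str, ...]] = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)

    items = list(test_items)
    if not items:
        return 0.0

    # An item is flagged if any of its n-grams also occurs in training data.
    flagged = sum(1 for item in items if ngrams(item, n) & train_grams)
    return flagged / len(items)


if __name__ == "__main__":
    train = ["the quick brown fox jumps over the lazy dog near the river bank today"]
    test = [
        "quick brown fox jumps over the lazy dog near",  # copied from training text
        "an entirely different question about physics and chemistry",
    ]
    print(contamination_rate(test, train, n=8))
```

Here the first test item reproduces an 8-word run from the training document and is flagged, while the second is not, so the printed rate is 0.5. Exact-overlap checks like this catch only verbatim leakage; paraphrased or translated benchmark items slip through.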