Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.