Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.
Theorem
Path
Quiz Hub
593 questions across 217 topics. Self-study and multiple choice. Three difficulty tiers: core, advanced, research.
Find your level
20-question diagnostic across all topics (~10 min). Get a recommended learning path.
ai safety
Calibration and Uncertainty Quantification | T2 | 5-7 | 2q
Catastrophic Forgetting | T2 | 4-5 | 3q
Constitutional AI | T2 | 5-6 | 3q
Data Contamination and Evaluation | T2 | 4-5 | 3q
Differential Privacy | T2 | 5-6 | 3q
Ethics and Fairness in ML | T2 | 4-5 | 3q
Mechanistic Interpretability | T2 | 5-9 | 2q
algorithms foundations
Dynamic Programming | T1 | 7 | 1q
Fast Fourier Transform | T2 | 3-5 | 3q
Greedy Algorithms | T2 | 4 | 3q
applied math
Cryptographic Hash Functions | T2 | 4-5 | 4q
beyond llms
Diffusion Models | T2 | 4-6 | 4q
bootstrap resampling
Bootstrap Methods | T1 | 5 | 1q
calculus objects
Automatic Differentiation | T1 | 6 | 1q
Matrix Calculus | T1 | 1-9 | 6q
The Hessian Matrix | T1 | 4-7 | 4q
The Jacobian Matrix | T1 | 4-9 | 4q
concentration probability
Chernoff Bounds | T1 | 4-7 | 2q
Concentration Inequalities | T1 | 2-10 | 21q
Epsilon-Nets and Covering Numbers | T1 | 6-7 | 3q
High-Dimensional Probability (Vershynin) | T1 | 10 | 1q
Matrix Concentration | T1 | 7 | 1q
McDiarmid's Inequality | T1 | 5 | 3q
Sub-Exponential Random Variables | T1 | 5 | 1q
Sub-Gaussian Random Variables | T1 | 4-9 | 13q
Symmetrization Inequality | T1 | 7-8 | 3q
decision theory
Auction Theory | T2 | 4-6 | 3q
Bounded Rationality | T1 | 3-7 | 3q
Decision Theory Foundations | T2 | 4-5 | 3q
Expected Utility Theory | T2 | 6 | 1q
Game Theory Foundations | T1 | 5-8 | 2q
Nash Equilibrium | T2 | 5-8 | 2q
em and variants
The EM Algorithm | T1 | 6 | 1q
foundations
Basic Logic and Proof Techniques | T2 | 3 | 3q
Benford's Law | T2 | 2-4 | 3q
Birthday Paradox | T2 | 3-4 | 3q
Cardinality and Countability | T2 | 3-7 | 3q
Common Inequalities | T1 | 2-3 | 3q
Common Probability Distributions | T1 | 1-5 | 4q
Compactness and Heine-Borel | T1 | 3-5 | 4q
Computability Theory | T1 | 4-10 | 4q
Continuity in R^n | T1 | 2-4 | 3q
Counting and Combinatorics | T2 | 2-4 | 3q
Differentiation in R^n | T1 | 3-5 | 3q
Eigenvalues and Eigenvectors | T1 | 2-4 | 5q
Expectation, Variance, Covariance, and Moments | T1 | 1 | 4q
Exponential Function Properties | T1 | 2-3 | 3q
Gram Matrices and Kernel Matrices | T1 | 3-4 | 3q
Inner Product Spaces and Orthogonality | T1 | 2-5 | 6q
Joint, Marginal, and Conditional Distributions | T1 | 1-6 | 6q
KL Divergence | T1 | 3-8 | 6q
Markov Chains and Steady State | T2 | 2 | 1q
Matrix Norms | T1 | 2-4 | 3q
Matrix Operations and Properties | T1 | 3 | 1q
Metric Spaces, Convergence, and Completeness | T1 | 4-5 | 3q
Numerical Stability and Conditioning | T1 | 3 | 4q
Positive Semidefinite Matrices | T1 | 3-6 | 4q
Sets, Functions, and Relations | T1 | 2-3 | 3q
Singular Value Decomposition | T1 | 2-5 | 4q
Skewness, Kurtosis, and Higher Moments | T1 | 3-5 | 3q
Taylor Expansion | T1 | 4 | 2q
Tensors and Tensor Operations | T1 | 1-3 | 4q
Vectors, Matrices, and Linear Maps | T1 | 1-3 | 10q
learning theory core
Algorithmic Stability | T1 | 7 | 1q
Empirical Risk Minimization | T1 | 1-9 | 10q
Hypothesis Classes and Function Spaces | T1 | 1 | 1q
PAC Learning Framework | T1 | 3-10 | 7q
Rademacher Complexity | T1 | 6-9 | 8q
Sample Complexity Bounds | T1 | 3-6 | 3q
Uniform Convergence | T1 | 5-7 | 3q
VC Dimension | T1 | 3-10 | 16q
llm construction
Attention as Kernel Regression | T3 | 6 | 1q
Attention Mechanism Theory | T2 | 4-9 | 10q
Attention Mechanisms History | T2 | 2-4 | 3q
Attention Sinks and Retrieval Decay | T2 | 5-6 | 3q
Attention Variants and Efficiency | T2 | 5-6 | 3q
BERT and the Pretrain-Finetune Paradigm | T2 | 2-4 | 3q
Bits, Nats, Perplexity, and BPB | T2 | 2-4 | 3q
Chain-of-Thought and Reasoning | T2 | 4-6 | 3q
Context Engineering | T2 | 4-5 | 3q
Decoding Strategies | T2 | 3-5 | 3q
DPO vs GRPO vs RL for Reasoning | T2 | 5-7 | 3q
Efficient Transformers Survey | T2 | 5-7 | 3q
Fine-Tuning and Adaptation | T1 | 5-6 | 3q
Flash Attention | T2 | 5-9 | 4q
GPU Compute Model | T2 | 4-6 | 3q
Hallucination Theory | T1 | 4-5 | 3q
Induction Heads | T2 | 5-9 | 2q
Knowledge Distillation | T2 | 4-6 | 4q
Mixture of Experts | T2 | 4-7 | 4q
Model Compression and Pruning | T2 | 6-7 | 2q
Optimizer Theory: SGD, Adam, and Muon | T1 | 7 | 3q
Reinforcement Learning from Human Feedback: Deep Dive | T1 | 5-7 | 3q
RLHF and Alignment | T2 | 4-6 | 3q
Scaling Compute-Optimal Training | T2 | 5 | 2q
Scaling Laws | T2 | 5-6 | 3q
Training Dynamics and Loss Landscapes | T2 | 6 | 1q
Transformer Architecture | T2 | 4-7 | 9q
mathematical infrastructure
Convex Duality | T1 | 5-6 | 3q
Information Theory Foundations | T2 | 1-7 | 9q
Martingale Theory | T2 | 4-9 | 3q
Measure-Theoretic Probability | T1 | 3-5 | 2q
methodology
Ablation Study Design | T2 | 3-5 | 3q
Base Rate Fallacy | T2 | 3-5 | 3q
Causal Inference and the Ladder of Causation | T1 | 6-7 | 3q
Causal Inference Basics | T3 | 3 | 1q
Class Imbalance and Resampling | T2 | 2-4 | 3q
Confusion Matrices and Classification Metrics | T1 | 2-3 | 3q
Exploratory Data Analysis | T2 | 2-3 | 3q
Federated Learning | T2 | 3-6 | 3q
Model Evaluation Best Practices | T1 | 2-4 | 3q
Statistical Significance and Multiple Comparisons | T2 | 1 | 1q
The Bitter Lesson | T1 | 4-7 | 3q
The Era of Experience | T1 | 5-6 | 3q
Train-Test Split and Data Leakage | T1 | 2 | 2q
Types of Bias in Statistics | T1 | 3-4 | 3q
ml methods
Activation Functions | T1 | 1-5 | 7q
AdaBoost | T2 | 3-5 | 3q
Anomaly Detection | T2 | 3-4 | 3q
Autoencoders | T2 | 2-4 | 3q
Bagging | T1 | 5-6 | 3q
Bayesian Neural Networks | T3 | 5-8 | 2q
Contrastive Learning | T2 | 4-8 | 5q
Convolutional Neural Networks | T2 | 3 | 1q
Cross-Entropy Loss Deep Dive | T1 | 3-8 | 4q
Data Preprocessing and Feature Engineering | T1 | 2-3 | 3q
Decision Trees and Ensembles | T2 | 4 | 1q
Dimensionality Reduction Theory | T2 | 2-4 | 3q
Ensemble Methods Theory | T2 | 4 | 1q
Feedforward Networks and Backpropagation | T1 | 1-9 | 15q
Gaussian Mixture Models and EM | T2 | 3-6 | 3q
Gaussian Process Regression | T2 | 5-7 | 2q
Generalized Additive Models | T2 | 3-5 | 3q
Generative Adversarial Networks | T2 | 6 | 1q
Gradient Boosting | T1 | 5 | 1q
Graph Neural Networks | T2 | 3-6 | 4q
K-Means Clustering | T1 | 2 | 1q
Linear Regression | T1 | 4-6 | 7q
Logistic Regression | T1 | 3-5 | 2q
Loss Functions Catalog | T1 | 1 | 1q
Naive Bayes | T2 | 6 | 1q
Overfitting and Underfitting | T1 | 1-2 | 2q
Principal Component Analysis | T1 | 2-5 | 3q
Random Forests | T1 | 5 | 1q
Recurrent Neural Networks | T2 | 2-7 | 2q
Skip Connections and ResNets | T1 | 5 | 1q
Spectral Clustering | T2 | 8 | 1q
Support Vector Machines | T1 | 4 | 2q
Transfer Learning | T2 | 3-5 | 3q
Universal Approximation Theorem | T1 | 2-7 | 5q
Variational Autoencoders | T1 | 4-6 | 4q
modern generalization
Benign Overfitting | T2 | 5-7 | 3q
Double Descent | T2 | 6-7 | 2q
Grokking | T2 | 4-6 | 3q
Implicit Bias and Modern Generalization | T1 | 7-10 | 3q
Information Bottleneck | T3 | 7 | 1q
Lazy vs Feature Learning | T2 | 7-9 | 2q
Neural Network Optimization Landscape | T2 | 6-7 | 3q
Neural Tangent Kernel | T2 | 7-8 | 3q
PAC-Bayes Bounds | T2 | 6-10 | 3q
Representation Learning Theory | T2 | 7 | 1q
numerical optimization
Bayesian Optimization for Hyperparameters | T2 | 4-5 | 3q
Coordinate Descent | T2 | 3-5 | 3q
Newton's Method | T1 | 4-5 | 5q
Proximal Gradient Methods | T1 | 5-7 | 3q
Quasi-Newton Methods | T1 | 5-6 | 2q
numerical stability
Floating-Point Arithmetic | T1 | 2-3 | 3q
Log-Probability Computation | T1 | 2-3 | 3q
Softmax and Numerical Stability | T1 | 3 | 1q
optimization function classes
Bias-Variance Tradeoff | T2 | 1-5 | 6q
Convex Optimization Basics | T1 | 1-6 | 12q
Cross-Validation Theory | T2 | 2 | 1q
Gradient Descent Variants | T1 | 2-6 | 5q
Gradient Flow and Vanishing Gradients | T1 | 5-7 | 4q
Kernels and Reproducing Kernel Hilbert Spaces | T2 | 3-4 | 2q
Regularization Theory | T2 | 2 | 1q
Stochastic Approximation Theory | T2 | 4-7 | 3q
Stochastic Gradient Descent Convergence | T1 | 1-7 | 13q
probability
Fat Tails and Heavy-Tailed Distributions | T1 | 4-6 | 3q
regression methods
AIC and BIC | T1 | 6-7 | 3q
Elastic Net | T2 | 3-5 | 3q
Gauss-Markov Theorem | T1 | 4-6 | 3q
Lasso Regression | T1 | 2-4 | 2q
Ridge Regression | T1 | 2-4 | 2q
reinforcement learning
Bellman Equations | T1 | 2-7 | 12q
Reward Design and Reward Misspecification | T1 | 6 | 1q
rl theory
Agentic RL and Tool Use | T2 | 5-6 | 3q
Bayesian State Estimation | T2 | 4-6 | 5q
Exploration vs Exploitation | T2 | 2-6 | 6q
Kalman Filter | T1 | 5-6 | 3q
Markov Decision Processes | T1 | 4-7 | 3q
Policy Gradient Theorem | T1 | 6-7 | 3q
Temporal Difference Learning | T2 | 4 | 1q
Value Iteration and Policy Iteration | T1 | 4-6 | 6q
sampling mcmc
Gibbs Sampling | T1 | 5-6 | 3q
Importance Sampling | T1 | 3-7 | 4q
Metropolis-Hastings Algorithm | T1 | 5-6 | 5q
statistical estimation
Asymptotic Statistics | T3 | 6 | 1q
Bayesian Estimation | T2 | 4-9 | 4q
Central Limit Theorem | T1 | 2-4 | 2q
Cramér-Rao Bound | T1 | 4-7 | 5q
Fisher Information | T1 | 4-9 | 10q
Law of Large Numbers | T1 | 1 | 1q
Maximum Likelihood Estimation | T1 | 3-9 | 13q
Shrinkage Estimation and the James-Stein Estimator | T1 | 6-7 | 3q
Sufficient Statistics and Exponential Families | T2 | 6 | 2q
statistical foundations
Fano Inequality | T2 | 7-9 | 2q
Kernel Two-Sample Tests | T2 | 6-7 | 3q
Minimax Lower Bounds | T2 | 7-9 | 2q
training techniques
Adam Optimizer | T1 | 3-10 | 6q
Batch Normalization | T1 | 4 | 1q
Batch Size and Learning Dynamics | T2 | 2-7 | 4q
Data Augmentation Theory | T2 | 3-5 | 3q
Dropout | T1 | 4 | 1q
Learning Rate Scheduling | T1 | 3-6 | 3q
Regularization in Practice | T1 | 4 | 2q
Weight Initialization | T1 | 4 | 1q