ML Theory Roadmap
The whole curriculum on one page. 491 topics across 6 layers, from measure-theoretic foundations through modern deep learning and the research frontier. Tier-1 landmarks are the 126 core pages worth reading first.
Layer 0A · Axioms
48 topics · Sets, functions, logic, linear algebra, real analysis, measure-theoretic basics.
Foundations
- ●Common Inequalities
- ●Common Probability Distributions
- ●Compactness and Heine-Borel
- ●Computability Theory
- ●Continuity in R^n
- ●Differentiation in R^n
- ●Eigenvalues and Eigenvectors
- ●Expectation, Variance, Covariance, and Moments
- ●Exponential Function Properties
- ●Inner Product Spaces and Orthogonality
- ●Joint, Marginal, and Conditional Distributions
- ●Matrix Norms
- ●Matrix Operations and Properties
- ●Metric Spaces, Convergence, and Completeness
- ●Positive Semidefinite Matrices
- ●Sets, Functions, and Relations
- ●Singular Value Decomposition
- ●Taylor Expansion
- ●Tensors and Tensor Operations
- ●Vectors, Matrices, and Linear Maps
- Basic Logic and Proof Techniques
- Birthday Paradox
- Cantor's Theorem and Uncountability
- Cardinality and Countability
- Category Theory
- Counting and Combinatorics
- Gödel's Incompleteness Theorems
- Integration and Change of Variables
- Inverse and Implicit Function Theorem
- Lambda Calculus
- Moment Generating Functions
- Monty Hall Problem
- Peano Axioms
- Sequences and Series of Functions
- Type Theory
- Zermelo-Fraenkel Set Theory
- Formal Languages and Automata
- Foundational Dependencies
- Vieta Jumping
Algorithms Foundations
Numerical Stability
Calculus Objects
Layer 0B · Infrastructure
22 topics · Measure theory, functional analysis, convex duality, numerical foundations.
Foundations
Mathematical Infrastructure
Statistical Estimation
Site Meta
Layer 1 · Core Tools
55 topics · Concentration, estimation, information theory, optimization primitives, CLT.
Foundations
Concentration Probability
Statistical Foundations
Statistical Estimation
Numerical Optimization
Optimization Function Classes
Algorithms Foundations
Learning Theory Core
ML Methods
- ●Activation Functions
- ●Cross-Entropy Loss Deep Dive
- ●Data Preprocessing and Feature Engineering
- ●K-Means Clustering
- ●Linear Regression
- ●Logistic Regression
- ●Loss Functions Catalog
- ●Overfitting and Underfitting
- ●Principal Component Analysis
- K-Nearest Neighbors
- Multi-Class and Multi-Label Classification
- Naive Bayes
- Perceptron
Sampling MCMC
Training Techniques
Methodology
- ●Confusion Matrices and Classification Metrics
- ●Confusion Matrix Deep Dive
- ●Model Evaluation Best Practices
- ●Train-Test Split and Data Leakage
- ●Types of Bias in Statistics
- Base Rate Fallacy
- Class Imbalance and Resampling
- Exploratory Data Analysis
- Hardware for ML Practitioners
- ML Project Lifecycle
- Simpson's Paradox
Numerical Stability
Calculus Objects
Layer 2 · Learning Theory
129 topics · ERM, VC, Rademacher, PAC, stability, kernels, uniform convergence.
Concentration Probability
Statistical Foundations
Probability
Decision Theory
Numerical Optimization
Optimization Function Classes
Learning Theory Core
ML Methods
- ●Bagging
- ●Feedforward Networks and Backpropagation
- ●Gradient Boosting
- ●Random Forests
- ●Skip Connections and ResNets
- ●Support Vector Machines
- ●Universal Approximation Theorem
- AdaBoost
- Anomaly Detection
- Autoencoders
- Decision Trees and Ensembles
- Dimensionality Reduction Theory
- Ensemble Methods Theory
- Gaussian Mixture Models and EM
- Generalized Additive Models
- Natural Language Processing Foundations
- PageRank Algorithm
- Recommender Systems
- Spectral Clustering
- t-SNE and UMAP
- Time Series Forecasting Basics
- Word Embeddings
- XGBoost
- Boltzmann Machines and Hopfield Networks
- Cubist and Model Trees
- Logspline Density Estimation
- MARS (Multivariate Adaptive Regression Splines)
- NMF (Nonnegative Matrix Factorization)
- Self-Organizing Maps
- Wavelet Smoothing
Sampling MCMC
Training Techniques
Methodology
- Convex Tinkering
- Evaluation Metrics and Properties
- Feature Importance and Interpretability
- Hypothesis Testing for ML
- Meta-Analysis
- P-Hacking and Multiple Testing
- Proper Scoring Rules
- Reproducibility and Experimental Rigor
- Statistical Significance and Multiple Comparisons
- Experiment Tracking and Tooling
- Statistical Paradoxes Collection
NLP Foundations
RL Theory
Reinforcement Learning
EM and Variants
Numerical Stability
Calculus Objects
Bootstrap Resampling
Layer 3 · ML Methods
114 topics · Regression, SVMs, neural nets, optimization, regularization, NTK.
Mathematical Infrastructure
Concentration Probability
Statistical Foundations
Probability
Decision Theory
Numerical Optimization
Optimization Function Classes
Algorithms Foundations
Learning Theory Core
Modern Generalization
ML Methods
- ●Variational Autoencoders
- AlexNet and Deep Learning History
- Contrastive Learning
- Convolutional Neural Networks
- EM Algorithm Variants
- Gaussian Process Regression
- Generative Adversarial Networks
- Graph Neural Networks
- Meta-Learning
- Object Detection and Segmentation
- Optimal Brain Surgery and Pruning Theory
- Recurrent Neural Networks
- Semantic Search and Embeddings
- Speech and Audio ML
- Transfer Learning
- Bayesian Neural Networks
- Energy-Based Models
- Mixture Density Networks
- Normalizing Flows
- Reservoir Computing and Echo State Networks
Sampling MCMC
Training Techniques
Methodology
- ●Causal Inference and the Ladder of Causation
- ●The Bitter Lesson
- Ablation Study Design
- Commons Governance and Institutional Analysis
- Federated Learning
- Leverage Points in Complex Systems
- Synthetic Data Generation
- Anthropic Bias and Observation Selection
- Benchmarking Methodology
- Causal Inference Basics
- Official Statistics and National Surveys
LLM Construction
RL Theory
- ●Policy Gradient Theorem
- Actor-Critic Methods
- GraphSLAM and Factor Graphs
- Markov Games and Self-Play
- No-Regret Learning
- Offline Reinforcement Learning
- Online Learning and Bandits
- Policy Optimization: PPO and TRPO
- Policy Representations
- Self-Play and Multi-Agent RL
- Options and Temporal Abstraction
- Particle Filters
- Reinforcement Learning Environments and Benchmarks
Reinforcement Learning
AI Safety
Applied Math
Layer 4 · Deep Learning
57 topics · Transformers, attention, training dynamics, double descent, scaling.
Statistical Foundations
Modern Generalization
Methodology
LLM Construction
- ●Attention Is All You Need (Paper)
- ●Hallucination Theory
- Attention Mechanism Theory
- Attention Sinks and Retrieval Decay
- Attention Variants and Efficiency
- BERT and the Pretrain-Finetune Paradigm
- Efficient Transformers Survey
- Forgetting Transformer (FoX)
- Induction Heads
- Mixture of Experts
- Residual Stream and Transformer Internals
- RLHF and Alignment
- Scaling Laws
- Sparse Attention and Long Context
- Sparse Autoencoders for Interpretability
- Training Dynamics and Loss Landscapes
- Transformer Architecture
- Attention as Kernel Regression
- Neural Architecture Search
- Positional Encoding
- Tokenization and Information Theory
RL Theory
Beyond LLMs
- CLIP and OpenCLIP in Practice
- Diffusion Models
- Equilibrium and Implicit-Layer Models
- Equivariant Deep Learning
- Flow Matching
- JEPA and Joint Embedding
- Mamba and State-Space Models
- Neural ODEs and Continuous-Depth Networks
- Self-Supervised Vision
- Vision Transformer Lineage
- World Models and Planning
- 3D Gaussian Splatting
- Occupancy Networks and Neural Fields
Scientific ML
Number Theory ML
Layer 5 · Frontier
66 topics · RLHF, alignment, interpretability, reasoning, agents, scaling laws.
Modern Generalization
Methodology
LLM Construction
- ●Reinforcement Learning from Human Feedback: Deep Dive
- Chain-of-Thought and Reasoning
- Context Engineering
- Document Intelligence
- DPO vs GRPO vs RL for Reasoning
- Edge and On-Device ML
- Flash Attention
- Fused Kernels
- GPU Compute Model
- Inference Systems Overview
- Inference-Time Scaling Laws
- KV Cache
- KV Cache Optimization
- Latent Reasoning
- Memory Systems for LLMs
- Multi-Token Prediction
- Multimodal RAG
- PaddleOCR and Practical OCR
- Parallel Processing Fundamentals
- Post-Training Overview
- Prefix Caching
- Prompt Engineering and In-Context Learning
- Reasoning Data Curation
- Scaling Compute-Optimal Training
- Speculative Decoding and Quantization
- Structured Output and Constrained Generation
- Test-Time Compute and Search
- Tool-Augmented Reasoning
- AMD Competition Landscape
- ASML and Chip Manufacturing
- Distributed Training Theory
- Donut and OCR-Free Document Understanding
- Model Merging and Weight Averaging
- NVIDIA GPU Architectures
- Plan-then-Generate
- Quantization Theory
- Table Extraction and Structure Recognition
Beyond LLMs
AI Safety
How to use this map
- ● Amber dots are tier-1 landmarks. Read these first.
- Each page links down to its prerequisites and up to what builds on it. No concept floats without grounding.
- Use the gap finder to pick a destination and get a BFS-ordered reading list.
- The interactive atlas gives you the same graph with click-to-explore and path tracing.
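The gap finder's BFS-ordered reading list can be sketched in a few lines. This is a hypothetical stand-in, not the site's actual implementation: the page names and the `PREREQS` mapping are invented for illustration, assuming each page records the prerequisites it links down to.

```python
from collections import deque

# Hypothetical prerequisite graph: page -> pages it links down to.
# Names are illustrative, not the roadmap's actual slugs.
PREREQS = {
    "Scaling Laws": ["Transformer Architecture", "Statistical Foundations"],
    "Transformer Architecture": ["Feedforward Networks", "Linear Algebra"],
    "Statistical Foundations": ["Probability Basics"],
    "Feedforward Networks": ["Linear Algebra", "Probability Basics"],
    "Linear Algebra": [],
    "Probability Basics": [],
}

def reading_list(destination: str) -> list[str]:
    """BFS backward from the destination through prerequisite links,
    then reverse the visit order so foundations come first."""
    order: list[str] = []
    seen = {destination}
    queue = deque([destination])
    while queue:
        page = queue.popleft()
        order.append(page)
        for prereq in PREREQS.get(page, []):
            if prereq not in seen:
                seen.add(prereq)
                queue.append(prereq)
    return order[::-1]  # prerequisites before the pages that need them
```

Picking "Scaling Laws" as the destination yields its transitive prerequisites first and the destination last, each page appearing once even when several pages share a prerequisite.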
Planned additions
AI-safety pages I want to add once the primary sources are stable enough to cite without guessing. Each needs its papers read carefully before the page goes up. Expect small batches over the coming months, not a bulk drop.
- Scalable oversight. Bowman et al. 2022, debate and market-based precedents, sandwiching experiments. Scope conditions matter: what the setup can and cannot tell us.
- Deceptive alignment. Hubinger et al. 2019/2021 mesa-optimizer framing. Separate the empirical evidence from the philosophical argument.
- Alignment faking. Greenblatt et al. 2024 (Anthropic). Include the limitations section explicitly.
- DPO. Currently folded into dpo-vs-grpo. Deserves its own page: Rafailov et al. 2023, the implicit-reward view, and the overoptimization story.
- Weak-to-strong generalization. Burns et al. 2023 (OpenAI). What the setup can and cannot tell us about alignment at scale.
- Instrumental convergence. Omohundro, Bostrom framings. Flag explicitly where the philosophical argument outruns the empirical support.
- Jailbreaks. Attack taxonomy, measurement difficulties, why robust alignment is not a solved problem. Needs honest threat-model scoping, not incident anecdotes.
- Superposition. Elhage et al. 2022 toy-models paper, the interference vs capacity trade-off, and the connection to sparse autoencoders.
These are not auto-generated stubs. They land when the mental model is clean.