Prerequisites for GPT Series Evolution

Topics you need before working through GPT Series Evolution. Direct prerequisites are listed first; transitive prerequisites (the chain reachable through them) follow.
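The split between direct and transitive prerequisites is a reachability computation on a prerequisite graph. Below is a minimal sketch that assumes the graph is stored as a plain dict mapping each topic to the topics it directly cites. That representation is hypothetical and is not this site's actual data model. The example edges are also made up for illustration and are not the real prerequisite data.

```python
from collections import deque


def prerequisite_chain(graph: dict[str, list[str]], topic: str) -> tuple[list[str], list[str]]:
    """Return (direct, transitive) prerequisites of `topic`.

    `graph` maps each topic to the topics it directly cites as prerequisites
    (a hypothetical adjacency-list representation). Transitive prerequisites
    are everything reachable by following those edges, excluding the direct
    prerequisites and the topic itself.
    """
    # Direct prerequisites, deduplicated while keeping their listed order.
    direct = list(dict.fromkeys(graph.get(topic, [])))
    seen = set(direct) | {topic}
    queue = deque(direct)
    transitive = []
    # Breadth-first walk "upward" through the chain of prerequisites.
    while queue:
        current = queue.popleft()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                transitive.append(dep)
                queue.append(dep)
    return direct, transitive


# Hypothetical edges for illustration only; not the real prerequisite data.
EXAMPLE_GRAPH = {
    "GPT Series Evolution": ["Transformer Architecture", "Scaling Laws"],
    "Transformer Architecture": ["Attention Mechanism Theory", "Softmax and Numerical Stability"],
    "Attention Mechanism Theory": ["Softmax and Numerical Stability", "Matrix Operations and Properties"],
    "Softmax and Numerical Stability": ["Exponential Function Properties"],
    "Scaling Laws": [],
}

direct, transitive = prerequisite_chain(EXAMPLE_GRAPH, "GPT Series Evolution")
print(f"Direct prerequisites ({len(direct)})")            # Direct prerequisites (2)
print(f"Reachable through the chain ({len(transitive)})")  # Reachable through the chain (4)
```

A topic that is cited directly is never repeated in the transitive list, even if another prerequisite also leads to it. This matches how the page separates the two groups.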

Direct prerequisites (8)

Transformer Architecture (layer 4, tier 2)
Attention Mechanism Theory (layer 4, tier 2)
Scaling Laws (layer 4, tier 1)
BERT and the Pretrain-Finetune Paradigm (layer 4, tier 2)
Tokenization and Information Theory (layer 4, tier 3)
Post-Training Overview (layer 5, tier 2)
Prompt Engineering and In-Context Learning (layer 5, tier 2)
RLHF and Alignment (layer 4, tier 2)

Reachable through the chain (382)

These topics are not directly cited as prerequisites but are reached transitively by following the chain upward. Working through the direct prerequisites pulls these in.

Matrix Operations and Properties (layer 0A, tier 1)
Sets, Functions, and Relations (layer 0A, tier 1)
Basic Logic and Proof Techniques (layer 0A, tier 2)
Linear Independence (layer 0A, tier 1)
Vectors, Matrices, and Linear Maps (layer 0A, tier 1)
Softmax and Numerical Stability (layer 1, tier 1)
Gram Matrices and Kernel Matrices (layer 1, tier 1)
Inner Product Spaces and Orthogonality (layer 0A, tier 1)
Eigenvalues and Eigenvectors (layer 0A, tier 1)
Matrix Norms (layer 0A, tier 1)
Distance Metrics Compared (layer 1, tier 2)
Metric Spaces, Convergence, and Completeness (layer 0A, tier 1)
Non-Euclidean and Hyperbolic Geometry (layer 1, tier 2)
Matrix Multiplication Algorithms (layer 1, tier 2)
The Kernel Trick (layer 2, tier 1)
Support Vector Machines (layer 2, tier 1)
Convex Optimization Basics (layer 1, tier 1)
Differentiation in Rⁿ (layer 0A, tier 1)
Continuity in Rⁿ (layer 0A, tier 1)
Common Inequalities (layer 0A, tier 1)
Common Probability Distributions (layer 0A, tier 1)
Exponential Function Properties (layer 0A, tier 1)
Integration and Change of Variables (layer 0A, tier 2)
Measure-Theoretic Probability (layer 0B, tier 1)
Cardinality and Countability (layer 0A, tier 2)
Kolmogorov Probability Axioms (layer 0A, tier 1)
Random Variables (layer 0A, tier 1)
Zermelo-Fraenkel Set Theory (layer 0A, tier 2)
Dynamic Programming (layer 0A, tier 1)
Graph Algorithms Essentials (layer 0A, tier 2)
Greedy Algorithms (layer 0A, tier 2)
Inverse and Implicit Function Theorem (layer 0A, tier 2)
The Jacobian Matrix (layer 0A, tier 1)
Positive Semidefinite Matrices (layer 0A, tier 1)
Taylor Expansion (layer 0A, tier 1)
The Hessian Matrix (layer 0A, tier 1)
Vector Calculus Chain Rule (layer 0A, tier 1)
Convex Duality (layer 2, tier 1)
Subgradients and Subdifferentials (layer 1, tier 1)
Logistic Regression (layer 1, tier 1)
Maximum Likelihood Estimation: Theory, Information Identity, and Asymptotic Efficiency (layer 0B, tier 1)
Central Limit Theorem (layer 0B, tier 1)
Law of Large Numbers (layer 0B, tier 1)
Expectation, Variance, Covariance, and Moments (layer 0A, tier 1)
Joint, Marginal, and Conditional Distributions (layer 0A, tier 1)
Triangular Distribution (layer 0A, tier 2)
Borel-Cantelli Lemmas (layer 0B, tier 1)
Modes of Convergence of Random Variables (layer 0B, tier 1)
Characteristic Functions (layer 1, tier 1)
Moment Generating Functions (layer 0A, tier 2)
KL Divergence (layer 1, tier 1)
Information Theory Foundations (layer 0B, tier 2)
Total Variation Distance (layer 1, tier 1)
Method of Moments (layer 0B, tier 2)
Radon-Nikodym and Conditional Expectation (layer 0B, tier 1)
Data Preprocessing and Feature Engineering (layer 1, tier 1)
Linear Regression (layer 1, tier 1)
The Elements of Statistical Learning (Hastie, Tibshirani, Friedman) (layer 0B, tier 1)
Naive Bayes (layer 1, tier 2)
Perceptron (layer 1, tier 2)
Loss Functions (layer 1, tier 2)
Hypothesis Classes and Function Spaces (layer 2, tier 1)
Empirical Risk Minimization (layer 2, tier 1)
Concentration Inequalities (layer 1, tier 1)
Martingale Theory (layer 0B, tier 2)
Skewness, Kurtosis, and Higher Moments (layer 1, tier 1)
High-Dimensional Probability (Vershynin) (layer 2, tier 1)
Cramér-Wold Theorem (layer 1, tier 2)
Loss Functions Catalog (layer 1, tier 1)
Robust Statistics and M-Estimators (layer 3, tier 2)
Minimax and Saddle Points (layer 2, tier 2)
Winsorization (layer 1, tier 3)
Order Statistics (layer 1, tier 2)
Sequences and Series of Functions (layer 0A, tier 2)
Understanding Machine Learning (Shalev-Shwartz, Ben-David) (layer 1, tier 1)
Ridge Regression (layer 1, tier 1)
Shrinkage Estimation and the James-Stein Estimator: Inadmissibility, SURE, and Brown's Characterization (layer 0B, tier 1)
Cramér-Rao Bound: Information Inequality, Achievability, and Sharper Variants (layer 0B, tier 1)
Fisher Information: Curvature, KL Geometry, and the Natural Gradient (layer 0B, tier 1)
Basu's Theorem (layer 0B, tier 3)
Sufficient Statistics and Exponential Families (layer 0B, tier 2)
Minimax Lower Bounds: Le Cam, Fano, Assouad, and the Reduction to Testing (layer 3, tier 1)
Empirical Processes and Chaining (layer 3, tier 2)
Rademacher Complexity (layer 3, tier 1)
VC Dimension (layer 2, tier 1)
Counting and Combinatorics (layer 0A, tier 2)
PAC Learning Framework (layer 1, tier 1)
Uniform Convergence (layer 2, tier 1)
Adaptive Learning Is Not IID (layer 3, tier 2)
Bernstein Inequality (layer 2, tier 1)
Bennett's Inequality (layer 2, tier 1)
Chernoff Bounds (layer 1, tier 1)
Hoeffding's Lemma (layer 1, tier 1)
Realizability Assumption (layer 2, tier 1)
Slud's Inequality (layer 2, tier 2)
Bias-Complexity Tradeoff (layer 2, tier 2)
No-Free-Lunch Theorem (layer 2, tier 2)
Glivenko-Cantelli Theorem (layer 2, tier 2)
McDiarmid's Inequality (layer 3, tier 1)
Sub-Gaussian Random Variables (layer 2, tier 1)
Epsilon-Nets and Covering Numbers (layer 3, tier 1)
Contraction Inequality (layer 3, tier 2)
Sub-Exponential Random Variables (layer 2, tier 1)
Chi-Squared Concentration (layer 2, tier 1)
Symmetrization Inequality (layer 3, tier 1)
Asymptotic Statistics: M-Estimators, Delta Method, LAN (layer 0B, tier 1)
Measure Concentration and Geometric Functional Analysis (layer 3, tier 1)
Stochastic Processes for ML (layer 2, tier 2)
Gauss-Markov Theorem (layer 2, tier 1)
The Multivariate Normal Distribution (layer 0B, tier 1)
Maximum A Posteriori (MAP) Estimation (layer 0B, tier 1)
Bayesian Estimation (layer 0B, tier 2)
Bayesian Linear Regression (layer 2, tier 1)
Conjugate Priors (layer 0B, tier 1)
Linear Layer: Shapes, Bias, and Memory (layer 2, tier 1)
Matrix Calculus (layer 1, tier 1)
Feedforward Networks and Backpropagation (layer 2, tier 1)
Activation Functions (layer 1, tier 1)
Automatic Differentiation (layer 1, tier 1)
Decision Trees and Ensembles (layer 2, tier 2)
Bias-Variance Tradeoff (layer 2, tier 2)
Elastic Net (layer 2, tier 2)
Lasso Regression (layer 2, tier 1)
Generalized Additive Models (layer 2, tier 2)
MARS (Multivariate Adaptive Regression Splines) (layer 2, tier 3)
K-Nearest Neighbors (layer 1, tier 2)
Deep Learning (Goodfellow, Bengio, Courville) (layer 0B, tier 1)
Gradient Boosting (layer 2, tier 1)
Gradient Descent Variants (layer 1, tier 1)
AdaBoost (layer 2, tier 2)
Cubist and Model Trees (layer 2, tier 3)
Tensors and Tensor Operations (layer 0A, tier 1)
Pandas and NumPy Fundamentals (layer 4, tier 3)
Word Embeddings (layer 2, tier 2)
Singular Value Decomposition (layer 0A, tier 1)
Information Retrieval Foundations (layer 2, tier 1)
Adam Optimizer (layer 2, tier 1)
Stochastic Gradient Descent Convergence (layer 2, tier 1)
Coordinate Descent (layer 2, tier 2)
Mirror Descent and Frank-Wolfe (layer 3, tier 2)
Online Convex Optimization (layer 3, tier 2)
No-Regret Learning (layer 3, tier 2)
Projected Gradient Descent (layer 2, tier 2)
Proximal Gradient Methods (layer 2, tier 1)
Quasi-Newton Methods (layer 2, tier 1)
Newton's Method (layer 1, tier 1)
Line Search Methods (layer 2, tier 2)
Secant Method (layer 1, tier 2)
Attention Mechanisms History (layer 3, tier 2)
Recurrent Neural Networks (layer 3, tier 2)
Convolutional Neural Networks (layer 3, tier 2)
Fast Fourier Transform (layer 1, tier 2)
Complex Numbers for Fourier (layer 0A, tier 2)
Signals and Systems for ML (layer 1, tier 2)
Skip Connections and ResNets (layer 2, tier 1)
SVM for RF Classification (layer 4, tier 3)
Macroeconomic Time-Series Forecasting (layer 4, tier 3)
Time Series Forecasting Basics (layer 2, tier 2)
Time Series Foundations (layer 2, tier 2)
Byte-Level Language Models (layer 4, tier 3)
Distributional Semantics (layer 2, tier 2)
NLP for Economic Text Analysis (layer 4, tier 3)
Natural Language Processing Foundations (layer 2, tier 2)
RNNs for Signal Sequences (layer 4, tier 3)
Token Prediction and Language Modeling (layer 3, tier 2)
Data Contamination and Evaluation (layer 5, tier 2)
Hypothesis Testing for ML (layer 2, tier 2)
Benford's Law (layer 1, tier 2)
Confusion Matrix: MCC, Kappa, and Cost-Sensitive Evaluation (layer 1, tier 1)
Differential Privacy (layer 3, tier 2)
Federated Learning (layer 3, tier 2)
Optimizer Theory: SGD, Adam, and Muon (layer 3, tier 1)
Information Geometry (layer 3, tier 3)
Whitening and Decorrelation (layer 2, tier 2)
Principal Component Analysis (layer 1, tier 1)
High-Dimensional Covariance Estimation (layer 3, tier 2)
Matrix Concentration (layer 3, tier 1)
NMF (Nonnegative Matrix Factorization) (layer 2, tier 3)
Floating-Point Arithmetic (layer 0A, tier 1)
Preconditioned Optimizers: Shampoo, K-FAC, and Natural Gradient (layer 3, tier 2)
Conjugate Gradient Methods (layer 2, tier 2)
Numerical Linear Algebra (layer 1, tier 2)
Riemannian Optimization and Manifold Constraints (layer 3, tier 2)
Equivariant Deep Learning (layer 4, tier 2)
Graph Neural Networks (layer 3, tier 2)
Clustering for Gene Expression (layer 4, tier 3)
K-Means Clustering (layer 1, tier 1)
Self-Organizing Maps (layer 2, tier 3)
t-SNE and UMAP (layer 2, tier 2)
Spectral Clustering (layer 2, tier 2)
PageRank Algorithm (layer 2, tier 2)
Attention for Protein Structure: AlphaFold and Successors (layer 4, tier 3)
Hyperbolic Embeddings for Graphs (layer 2, tier 2)
Training Dynamics and Loss Landscapes (layer 4, tier 2)
Stability and Optimization Dynamics (layer 2, tier 2)
Evaluation Metrics and Properties (layer 2, tier 2)
Neyman-Pearson and Hypothesis Testing Theory (layer 2, tier 2)
Reproducibility and Experimental Rigor (layer 2, tier 2)
Git and GitLab for ML Research (layer 4, tier 3)
Python for ML Research (layer 4, tier 3)
Weights and Biases for Experiment Tracking (layer 4, tier 3)
Survival Analysis (layer 3, tier 2)
Benchmarking Methodology (layer 3, tier 3)
Model Collapse and Data Quality (layer 5, tier 2)
Synthetic Data Generation (layer 3, tier 2)
Distributed Training Theory (layer 5, tier 3)
Parallel Processing Fundamentals (layer 5, tier 2)
Broadcast Joins in Distributed Compute (layer 4, tier 3)
Dask Parallel Python (layer 4, tier 3)
Ray Distributed Python (layer 4, tier 3)
Batch Size and Learning Dynamics (layer 2, tier 2)
Kafka Streaming Platform (layer 4, tier 3)
Running ML Workloads on GPUs (layer 4, tier 3)
GPU Compute Model (layer 5, tier 2)
ASML and Chip Manufacturing (layer 5, tier 3)
Docker and Containers for ML (layer 4, tier 3)
Kubernetes for ML Workloads (layer 4, tier 3)
Modal: Serverless GPU Platform (layer 4, tier 3)
History of Artificial Intelligence (layer 5, tier 2)
Ineffable Intelligence (layer 4, tier 2)
Reinforcement Learning from Human Feedback (layer 5, tier 1)
Policy Gradient Theorem (layer 3, tier 1)
Markov Decision Processes (layer 2, tier 1)
Bayesian State Estimation (layer 2, tier 2)
Gaussian Processes in Astronomy (layer 4, tier 3)
Gaussian Processes for Machine Learning (layer 4, tier 3)
Kernels and Reproducing Kernel Hilbert Spaces (layer 3, tier 2)
Dimensionality Reduction Theory (layer 2, tier 2)
Functional Analysis Core (layer 0B, tier 2)
Hanson-Wright Inequality (layer 3, tier 2)
Regularization Theory (layer 2, tier 2)
Overfitting and Underfitting (layer 2, tier 1)
XGBoost (layer 2, tier 2)
Gaussian Process Regression (layer 3, tier 2)
Kernel Methods for Molecules (layer 4, tier 3)
Kalman Filter (layer 2, tier 1)
No-U-Turn Sampler and Neal's Funnel (layer 3, tier 2)
Hamiltonian Monte Carlo (layer 3, tier 2)
Metropolis-Hastings Algorithm (layer 2, tier 1)
Markov Chain Monte Carlo (layer 2, tier 1)
Markov Chains and Steady State (layer 1, tier 2)
Monte Carlo Methods (layer 2, tier 1)
Gibbs Sampling (layer 2, tier 1)
Griddy Gibbs Sampling (layer 2, tier 3)
Variance Reduction Techniques (layer 2, tier 2)
Importance Sampling (layer 2, tier 1)
Number Theory and Machine Learning (layer 4, tier 3)
Peano Axioms (layer 0A, tier 2)
Rejection Sampling (layer 1, tier 2)
Squeezed Rejection Sampling (layer 2, tier 3)
Burn-in and Convergence Diagnostics (layer 2, tier 2)
Coupling Arguments and Mixing Time (layer 3, tier 3)
MCMC for Markov Random Fields (layer 3, tier 3)
Perfect Sampling (layer 3, tier 3)
Slice Sampling (layer 2, tier 3)
Multi-Armed Bandits Theory (layer 2, tier 2)
Bayesian Optimization for Hyperparameters (layer 3, tier 2)
Online Learning and Bandits (layer 3, tier 2)
Test-Time Training and Adaptive Inference (layer 5, tier 2)
Continuous Thought Machines (layer 5, tier 3)
Neural ODEs and Continuous-Depth Networks (layer 4, tier 3)
Classical ODEs: Existence, Stability, and Numerical Methods (layer 1, tier 1)
Gradient Flow and Vanishing Gradients (layer 2, tier 1)
Equilibrium and Implicit-Layer Models (layer 4, tier 2)
Implicit Differentiation (layer 2, tier 2)
Lyapunov-Based Machine Learning for Chaos (layer 4, tier 3)
Nonlinear Dynamics and Chaos Fundamentals (layer 4, tier 3)
Physics-Informed Neural Networks (layer 4, tier 2)
Divergence, Curl, and Line Integrals (layer 0A, tier 2)
Kolmogorov-Arnold Networks (KANs) (layer 4, tier 2)
Universal Approximation Theorem (layer 2, tier 1)
PDE Fundamentals for Machine Learning (layer 1, tier 2)
Stochastic Differential Equations (layer 3, tier 2)
Ito's Lemma (layer 3, tier 2)
Stochastic Calculus for ML (layer 3, tier 3)
Symbolic Regression and Equation Discovery (layer 4, tier 3)
Sparse Recovery and Compressed Sensing (layer 4, tier 3)
Q-Learning (layer 2, tier 1)
Value Iteration and Policy Iteration (layer 2, tier 1)
Bellman Equations (layer 2, tier 1)
Stochastic Approximation Theory (layer 2, tier 2)
Temporal Difference Learning (layer 2, tier 2)
Actor-Critic Methods (layer 3, tier 2)
Reward Systems and Reinforcement Learning Neuroscience (layer 4, tier 3)
Fine-Tuning and Adaptation (layer 3, tier 1)
Reinforcement Learning for Synthesis Planning (layer 4, tier 3)
Reward Design and Reward Misspecification (layer 3, tier 1)
Reinforcement Learning for Drug Discovery (layer 4, tier 3)
AI Labs Landscape (layer 5, tier 2)
Model Timeline (layer 5, tier 2)
Inference Systems Overview (layer 5, tier 2)
KV Cache (layer 5, tier 2)
Attention Is All You Need (Paper) (layer 4, tier 1)
Attention Variants and Efficiency (layer 4, tier 2)
Efficient Transformers Survey (layer 4, tier 2)
Speculative Decoding and Quantization (layer 5, tier 2)
Megakernels (layer 5, tier 3)
Fused Kernels (layer 5, tier 2)
CUDA Programming Fundamentals (layer 4, tier 3)
Flash Attention (layer 5, tier 2)
Computer Architecture for ML (layer 2, tier 2)
NVIDIA GPU Architectures (layer 5, tier 3)
WebGPU for Machine Learning (layer 0B, tier 2)
Multi-Token Prediction (layer 5, tier 2)
Edge and On-Device ML (layer 5, tier 2)
Model Compression and Pruning (layer 3, tier 2)
Lazy vs Feature Learning (layer 4, tier 2)
Neural Tangent Kernel: Lazy Training, Kernel Equivalence, μP, and the Limits of Width (layer 4, tier 1)
Implicit Bias and Modern Generalization (layer 4, tier 1)
Algorithmic Stability (layer 3, tier 1)
Cross-Validation Theory (layer 2, tier 2)
AIC and BIC (layer 2, tier 1)
Kolmogorov Complexity and MDL (layer 2, tier 2)
Class Imbalance and Resampling (layer 1, tier 2)
Confusion Matrices and Classification Metrics (layer 1, tier 1)
Multi-Class and Multi-Label Classification (layer 1, tier 2)
Signal Detection Theory (layer 2, tier 2)
Feature Importance and Interpretability (layer 2, tier 2)
Exploratory Data Analysis (layer 1, tier 2)
ML Project Lifecycle (layer 1, tier 2)
Hardware for ML Practitioners (layer 1, tier 2)
Train-Test Split and Data Leakage (layer 1, tier 1)
Mechanistic Interpretability: Features, Circuits, and Causal Faithfulness (layer 4, tier 1)
Residual Stream and Transformer Internals (layer 4, tier 2)
Forgetting Transformer (FoX) (layer 4, tier 2)
Sparse Attention and Long Context (layer 4, tier 2)
Gemini and Google Models (layer 5, tier 2)
Sparse Autoencoders for Interpretability: TopK, JumpReLU, Matryoshka, and Scaling (layer 4, tier 1)
Autoencoders (layer 2, tier 2)
Boltzmann Machines and Hopfield Networks (layer 2, tier 3)
EM Algorithm Variants (layer 3, tier 2)
The EM Algorithm (layer 2, tier 1)
Truth Directions and Linear Probes (layer 4, tier 2)
Model Evaluation Best Practices (layer 1, tier 1)
Proper Scoring Rules (layer 2, tier 2)
ROC Curve and AUC (layer 2, tier 2)
Statistical Significance and Multiple Comparisons (layer 2, tier 2)
PAC-Bayes Bounds (layer 3, tier 1)
Sample Complexity Bounds (layer 2, tier 1)
Information Bottleneck (layer 3, tier 3)
Neural Network Optimization Landscape (layer 4, tier 2)
Random Matrix Theory Overview (layer 4, tier 2)
SGD as a Stochastic Differential Equation (layer 3, tier 2)
Fokker–Planck Equation (layer 3, tier 2)
Feynman–Kac Formula (layer 3, tier 2)
Mean Field Theory (layer 4, tier 2)
Agentic RL and Tool Use (layer 5, tier 2)
Offline Reinforcement Learning (layer 3, tier 2)
Video World Models (layer 5, tier 2)
World Models and Planning (layer 4, tier 2)
The Era of Experience (layer 4, tier 1)
The Bitter Lesson (layer 3, tier 1)
Model-Based Reinforcement Learning (layer 3, tier 2)
Deep RL for Control (layer 4, tier 3)
Diffusion Models (layer 4, tier 1)
Variational Autoencoders (layer 3, tier 1)
Autoencoders for Low-Dimensional Dynamical Structures (layer 4, tier 3)
Gaussian Mixture Models and EM (layer 2, tier 2)
Score Matching (layer 3, tier 1)
Deep Generative Models for Cosmic Structures (layer 4, tier 3)
Generative Adversarial Networks (layer 3, tier 2)
Normalizing Flows (layer 3, tier 3)
Time Reversal of SDEs (layer 3, tier 2)
CLIP, OpenCLIP, and SigLIP: Contrastive Language-Image Pretraining (layer 4, tier 1)
Contrastive Learning (layer 3, tier 2)
Vision Transformer Lineage: ViT, DeiT, Swin, MAE, DINOv2, SAM (layer 4, tier 1)
CNNs for Medical Imaging (layer 4, tier 3)
Object Detection and Segmentation (layer 3, tier 2)
Florence and Vision Foundation Models (layer 5, tier 2)
Self-Supervised Vision (layer 4, tier 2)
CNNs for Signal Feature Extraction (layer 4, tier 3)
Continuous Normalizing Flows (layer 3, tier 3)
Adjoint Sensitivity Method (layer 3, tier 2)
Energy-Based Models (layer 3, tier 3)
Energy in Statistics and Machine Learning (layer 2, tier 2)
Neural SDEs and the Diffusion Bridge (layer 4, tier 3)
Langevin Dynamics (layer 3, tier 2)
Probability Flow ODE (layer 3, tier 2)
Policy Optimization: PPO and TRPO (layer 3, tier 2)
DDPG: Deep Deterministic Policy Gradient (layer 3, tier 2)
TD3: Twin Delayed Deep Deterministic Policy Gradient (layer 3, tier 2)
Test-Time Compute and Search (layer 5, tier 2)