Key Researchers and Ideas
Reference mapping key researchers to their technical contributions: Hinton, LeCun, Bengio, Sutskever, Amodei, Hassabis, Karpathy. Who did what, and what still matters in 2026.
Why This Matters
Understanding who contributed which ideas helps you read the literature more efficiently. When you see "Sutskever et al., 2014" you should immediately know: the LSTM encoder-decoder for sequence-to-sequence learning. Attention for neural machine translation came concurrently from Bahdanau, Cho, Bengio (2014), and self-attention came later from Vaswani et al. (2017). This page is a factual reference. Not biographical worship. Just: who did what, and what still matters.
The Three Pioneers (2024 Nobel Prize in Physics)
Geoffrey Hinton
Affiliations: University of Toronto, Google Brain (2013-2023), now independent. Nobel Prize in Physics (2024) jointly with John Hopfield, for foundational work on artificial neural networks.
Key contributions:
- Backpropagation (Rumelhart, Hinton, Williams, 1986). The learning algorithm that made multilayer neural networks trainable: the chain rule applied to compute gradients through a computation graph. Still the foundation of all deep learning training.
- Boltzmann Machines (Ackley, Hinton, Sejnowski, 1985). Stochastic recurrent networks that learn probability distributions. Introduced the idea of learning by adjusting weights to match data statistics. Restricted Boltzmann Machines (RBMs) later enabled deep belief networks.
- Deep Belief Networks (Hinton, Osindero, Teh, 2006). Showed that deep networks could be pretrained layer-by-layer using unsupervised learning (RBM stacking), then fine-tuned. This broke the "deep networks are untrainable" barrier before ReLU and batch normalization existed.
- AlexNet (Krizhevsky, Sutskever, Hinton, 2012). Won ImageNet 2012 by a large margin using deep convolutional networks with ReLU, dropout, and GPU training. The result that convinced the field that deep learning works.
- Dropout (Srivastava, Hinton et al., 2014). Regularization by randomly zeroing activations during training. Still used, though many modern large-scale Transformer training runs rely on other regularizers and use little or no dropout.
- Capsule Networks (Sabour, Frosst, Hinton, 2017). An attempt to replace max-pooling with "capsules" that preserve spatial relationships. Theoretically motivated, but it has not displaced standard architectures.
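The chain-rule idea behind backpropagation can be shown on a single neuron. This is a minimal sketch, not the general algorithm: the loss is (w*x + b - y)^2, and each gradient is computed by multiplying local derivatives backward through the graph.

```python
# Backpropagation on a tiny computation graph: loss = (w*x + b - y)^2.
# Forward pass records intermediates; backward pass applies the chain rule.

def forward_backward(w, b, x, y):
    # Forward pass
    z = w * x + b            # linear unit
    e = z - y                # error
    loss = e ** 2

    # Backward pass: chain rule from the loss back to each parameter.
    dloss_de = 2 * e         # d(e^2)/de
    dloss_dz = dloss_de      # de/dz = 1
    dloss_dw = dloss_dz * x  # dz/dw = x
    dloss_db = dloss_dz      # dz/db = 1
    return loss, dloss_dw, dloss_db

loss, dw, db = forward_backward(w=2.0, b=0.0, x=3.0, y=5.0)
# loss = (6 - 5)^2 = 1.0; dw = 2*1*3 = 6.0; db = 2.0
```

Modern autodiff frameworks do exactly this, mechanically, over graphs with billions of nodes.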
What still matters in 2026: Backpropagation is universal. Dropout remains in use. Hinton's advocacy for AI safety (post-Google resignation in 2023) has influenced policy discussions. His capsule network agenda has not gained traction.
Yann LeCun
Affiliations: Meta AI / FAIR (Chief AI Scientist), New York University. Turing Award (2018) jointly with Hinton and Bengio.
Key contributions:
- Convolutional Neural Networks (LeCun et al., 1989, 1998). Applied backpropagation to networks with shared-weight convolutional filters. LeNet-5 for digit recognition was the first practical deep learning system deployed at scale (check-reading systems at banks). The architecture principle of weight sharing and local connectivity remains fundamental.
- Energy-Based Models (LeCun et al., 2006). A framework for learning that assigns low energy to correct configurations and high energy to incorrect ones. More general than probabilistic models because it does not require normalized probabilities. Theoretical framework that informs LeCun's current research.
- JEPA (Joint Embedding Predictive Architecture) (LeCun, 2022 position paper; Assran et al., I-JEPA, 2023; Bardes et al., V-JEPA, 2024). LeCun's proposed path to human-level AI. Instead of generating pixels (like autoregressive or diffusion models), JEPA predicts representations in an abstract latent space. The claim: predicting in representation space avoids the computational waste of generating pixel-level detail and better captures abstract structure.
- Self-supervised learning (DINO, DINOv2, I-JEPA, V-JEPA). Under LeCun's direction, FAIR has produced strong self-supervised vision models that learn useful representations without labeled data.
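The weight-sharing principle behind CNNs fits in a few lines. This is an illustrative sketch only: a single 1D filter slides across the input, so the same parameters detect the same local pattern at every position (translation equivariance).

```python
# 1D "valid" convolution (cross-correlation, no kernel flip), pure Python.
# One small kernel is reused at every position: that reuse is weight sharing.

def conv1d(signal, kernel):
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# An edge-detector kernel [-1, 1] fires wherever the signal steps up or
# down, regardless of where the step occurs.
out = conv1d([0, 0, 1, 1, 0], [-1, 1])
# out = [0, 1, 0, -1]
```

2D convolutions in vision models are the same idea with a small 2D filter slid over an image.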
What still matters in 2026: CNNs remain in production for many vision tasks. The JEPA research program is ongoing but has not yet produced models that outperform autoregressive or diffusion approaches on standard benchmarks. Self-supervised vision (DINO/DINOv2) is widely used.
Yoshua Bengio
Affiliations: Mila (Quebec AI Institute, founder and scientific director), Université de Montréal. Turing Award (2018) jointly with Hinton and LeCun.
Key contributions:
- Neural probabilistic language model (Bengio et al., 2003). One of the first papers to train neural networks on word sequences with learned embeddings. Introduced the idea that words could be represented as dense vectors and that a neural network could model the joint probability of word sequences. This is the ancestor of all modern language models.
- Deep learning revolution (Bengio, multiple papers 2006-2012). Systematic work on training deep networks: curriculum learning, denoising autoencoders, understanding gradients in deep networks. The theoretical and empirical groundwork that made deep learning practical.
- Attention mechanism (Bahdanau, Cho, Bengio, 2014). Introduced attention for neural machine translation: instead of compressing the entire source sentence into a fixed vector, let the decoder attend to different parts of the source at each step. This idea is the direct ancestor of the Transformer's self-attention.
- GFlowNets (Bengio et al., 2021-present). Generative Flow Networks: a framework for sampling diverse, high-reward objects proportional to a reward function. Applications in drug discovery, combinatorial optimization, and causal discovery. An active research program.
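The core attention computation can be sketched in a few lines. Note one simplification: Bahdanau et al. scored query-key pairs with a small additive MLP, while this sketch uses dot-product scores (the variant the Transformer later adopted); the information flow is the same.

```python
import numpy as np

def attend(query, keys, values):
    # Score each encoder state by its similarity to the decoder query.
    scores = keys @ query                    # shape (T,)
    # Softmax turns scores into a distribution over source positions.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Context vector: weighted average of encoder states.
    return weights @ values, weights

# Toy example: three "encoder states"; the query points at the second one.
keys = values = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
query = np.array([0.0, 2.0])
context, weights = attend(query, keys, values)
```

The decoder recomputes this at every output step, so no fixed-size bottleneck has to hold the whole source sentence.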
What still matters in 2026: The attention mechanism is the foundation of the Transformer. Word embeddings are universal. GFlowNets are an active research area with practical applications.
Scaling and Alignment Era
Ilya Sutskever
Affiliations: OpenAI (co-founder, Chief Scientist, 2015-2024), Safe Superintelligence Inc. (co-founder, 2024-present). PhD student of Hinton.
Key contributions:
- AlexNet (Krizhevsky, Sutskever, Hinton, 2012). Co-designed the architecture and training procedure that won ImageNet 2012.
- Sequence-to-sequence learning (Sutskever, Vinyals, Le, 2014). The encoder-decoder architecture for mapping variable-length sequences to variable-length sequences. Used LSTMs to encode a source sequence into a fixed vector, then decode it into a target sequence. Direct predecessor to the Transformer encoder-decoder.
- GPT scaling (2018-2023). As OpenAI's Chief Scientist, Sutskever oversaw the scaling of GPT models from GPT-2 (1.5B) through GPT-4. The core thesis: scaling model size and data produces qualitatively new capabilities. This bet defined the field's trajectory.
- Alignment focus (2023-present). Sutskever's departure from OpenAI and founding of SSI (Safe Superintelligence Inc.) signals a belief that safety research must be the primary focus, not an add-on.
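The seq2seq information flow can be sketched without real LSTMs. This is a toy illustration under stated assumptions: both "RNNs" are a single hypothetical tanh update with random weights, just to show that the decoder sees the source only through one fixed-size vector.

```python
import numpy as np

# Toy encoder-decoder in the shape of Sutskever et al. (2014).
# Hypothetical weights; a real model would use learned LSTM parameters.
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(4, 4)) * 0.1   # encoder recurrence
W_in  = rng.normal(size=(4, 4)) * 0.1   # input projection
W_dec = rng.normal(size=(4, 4)) * 0.1   # decoder recurrence

def encode(source_embeddings):
    h = np.zeros(4)
    for x in source_embeddings:          # read the source left to right
        h = np.tanh(W_enc @ h + W_in @ x)
    return h                             # the fixed-size bottleneck

def decode(h, steps):
    outputs = []
    for _ in range(steps):               # unroll from the bottleneck alone
        h = np.tanh(W_dec @ h)
        outputs.append(h.copy())
    return outputs

source = [rng.normal(size=4) for _ in range(6)]
context = encode(source)                 # all the decoder ever sees
outputs = decode(context, steps=3)
```

The fixed vector `context` is the architecture's known weakness on long inputs, which is exactly what Bahdanau-style attention removed.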
What still matters in 2026: Sequence-to-sequence is the architectural paradigm underlying all encoder-decoder models. The scaling hypothesis he championed remains the dominant paradigm, though the emphasis has shifted from pure scale to data quality and reasoning-focused training.
Dario and Daniela Amodei
Affiliations: Anthropic (co-founders, CEO and President). Both previously at OpenAI (Dario as VP of Research).
Key contributions:
- Scaling laws for neural language models (Kaplan, McCandlish, Henighan, ..., Amodei, 2020). While at OpenAI, Dario Amodei was senior author on the paper establishing power-law relationships between model size, data size, compute, and loss. These laws guided compute allocation decisions across the industry.
- Constitutional AI (Bai et al., 2022). Alignment method where the model critiques and revises its own outputs according to explicit written principles. Reduces dependence on human preference labels and makes alignment criteria transparent.
- Anthropic's safety-focused research agenda. Under their leadership, Anthropic has published significant work on mechanistic interpretability, honest AI, and responsible scaling policies.
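The power-law form of the Kaplan et al. model-size law fits in one function. The constants below are the paper's reported fits for loss versus non-embedding parameter count; treat the outputs as illustrative of the functional form, not as predictions for any particular model.

```python
# L(N) = (N_c / N) ** alpha_N, the model-size scaling law from
# Kaplan et al. (2020), with the paper's fitted constants.

def loss_from_params(n_params, n_c=8.8e13, alpha_n=0.076):
    """Predicted cross-entropy loss as a function of parameter count."""
    return (n_c / n_params) ** alpha_n

for n in (1e8, 1e9, 1e10):
    print(f"N = {n:.0e}  predicted loss = {loss_from_params(n):.2f}")
```

The small exponent is the key planning fact: each 10x in parameters buys only a modest, but predictable, drop in loss, which is what made compute budgeting tractable.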
What still matters in 2026: Scaling laws remain the primary planning tool for training runs. Constitutional AI influenced alignment practice across the industry. The organizational model of Anthropic (safety-focused, benefit corporation structure) influenced how other labs think about governance.
Demis Hassabis
Affiliations: Google DeepMind (co-founder, CEO). Nobel Prize in Chemistry (2024) for AlphaFold.
Key contributions:
- AlphaGo (Silver, Huang, ..., Hassabis, 2016). Defeated the world Go champion using deep RL with Monte Carlo tree search. Demonstrated that deep learning combined with search could solve problems previously considered decades away.
- AlphaFold (Jumper, Evans, ..., Hassabis, 2021). Solved the protein structure prediction problem. Given an amino acid sequence, AlphaFold predicts the 3D structure with experimental accuracy. AlphaFold 2 was released as open-source and has been used to predict structures for virtually all known proteins. AlphaFold 3 (2024) extended to protein complexes and ligands.
- Gemini (2023-present). Under Hassabis's leadership of the merged Google DeepMind, the Gemini model family was developed as a natively multimodal system.
What still matters in 2026: AlphaFold transformed structural biology. AlphaGo proved that deep RL plus search works for complex sequential decisions. The DeepMind research philosophy of combining neural networks with search/planning remains influential.
Andrej Karpathy
Affiliations: OpenAI (founding member, 2015-2017; returned 2023-2024), Tesla (Director of AI, Autopilot, 2017-2022), independent educator (2024-present). PhD student of Fei-Fei Li at Stanford.
Key contributions:
- Neural network education. Karpathy's blog posts ("The Unreasonable Effectiveness of Recurrent Neural Networks," 2015), Stanford CS231n lectures, and YouTube tutorials introduced a generation of practitioners to deep learning.
- nanoGPT (2023). A minimal, readable implementation of GPT training in about 600 lines of PyTorch. Designed for education: you can read the entire codebase and understand how a GPT is trained.
- Tesla Autopilot. Led the computer vision team for Tesla's self-driving system, applying large-scale neural networks to real-time perception.
- Eureka Labs / AI education (2024-present). Focus on using AI to improve technical education.
What still matters in 2026: Karpathy's educational materials remain among the best introductions to neural networks. nanoGPT is a standard teaching tool. His influence is primarily pedagogical rather than through novel architectures.
Other Researchers Worth Knowing
- Ashish Vaswani, Noam Shazeer et al.: Transformer architecture ("Attention Is All You Need," 2017). The architecture that has been adopted more widely than any other paper in modern AI. Shazeer later co-founded Character.AI.
- Alec Radford: Led GPT-1/2 development at OpenAI. Also CLIP (connecting vision and language via contrastive learning).
- Jason Wei, Denny Zhou et al.: Chain-of-thought prompting (2022). Showed that prompting models to reason step-by-step dramatically improves performance on reasoning tasks.
- Tri Dao: FlashAttention (2022, 2023). IO-aware exact attention algorithm that made long-context training practical. A systems contribution with enormous practical impact.
- Aidan Gomez: Transformer co-author, co-founded Cohere.
- Percy Liang: Stanford HELM benchmarks, CRFM. Systematic evaluation infrastructure for language models.
- Fei-Fei Li: ImageNet (2009). The dataset that enabled the deep learning revolution in computer vision.
Common Confusions
The Turing Award trio did not invent deep learning alone
Hinton, LeCun, and Bengio made foundational contributions, but deep learning is built on work by many others: Hochreiter and Schmidhuber (LSTM, 1997), Vaswani et al. (Transformer), He et al. (ResNet), and hundreds of others. The Turing Award recognized sustained contributions over decades, not sole invention.
Researcher prestige does not validate current claims
A researcher's past contributions do not make their current predictions correct. LeCun's JEPA thesis may or may not be the right path forward. Hinton's AI safety warnings may or may not be well-calibrated. Evaluate technical claims on their merits, not on the reputation of the claimant.
Summary
- Hinton: backpropagation, Boltzmann machines, dropout, AlexNet, capsule nets
- LeCun: CNNs, energy-based models, JEPA, self-supervised vision (DINO)
- Bengio: neural language models, attention mechanism, GFlowNets
- Sutskever: sequence-to-sequence, GPT scaling, now safety-focused (SSI)
- Dario/Daniela Amodei: scaling laws, Constitutional AI, Anthropic
- Hassabis: AlphaGo, AlphaFold (Nobel), Gemini
- Karpathy: nanoGPT, CS231n, Tesla Autopilot, AI education
- Vaswani/Shazeer: Transformer architecture
- Tri Dao: FlashAttention
Exercises
Problem
For each of the following ideas, name the primary researcher(s) associated with it: (a) attention mechanism for neural machine translation, (b) protein structure prediction via deep learning, (c) training deep networks via layer-wise pretraining, (d) sequence-to-sequence encoder-decoder.
Problem
The Transformer ("Attention Is All You Need," 2017) replaced recurrent architectures with self-attention. Explain which specific limitations of the sequence-to-sequence architecture (Sutskever et al., 2014) and the attention mechanism (Bahdanau et al., 2014) the Transformer addressed, and what it traded off.
References
Foundational:
- Rumelhart, Hinton, Williams, "Learning representations by back-propagating errors" (1986)
- LeCun et al., "Gradient-based learning applied to document recognition" (1998)
- Bengio et al., "A Neural Probabilistic Language Model" (2003)
- Vaswani et al., "Attention Is All You Need" (2017)
Current:
- Jumper et al., "Highly Accurate Protein Structure Prediction with AlphaFold" (2021)
- Bai et al., "Constitutional AI: Harmlessness from AI Feedback" (Anthropic, 2022)
- Dao et al., "FlashAttention: Fast and Memory-Efficient Exact Attention" (2022)
Next Topics
- AI labs landscape: which organizations these researchers built or joined
- Model timeline: chronological reference for major model releases
Last reviewed: April 2026