Claude Model Family

Anthropic's Claude series from Claude 1 through the Claude 4.x family (4, 4.5, 4.6, 4.7), covering Constitutional AI, extended thinking, computer use, long context, and safety-focused design.


Why This Matters

Claude is one of the three leading frontier model families (alongside GPT and Gemini). Its primary differentiator is the alignment methodology: Constitutional AI replaces human preference labels with written principles, making the alignment process more transparent and auditable. Understanding Claude's design choices clarifies the trade-offs between safety, capability, and openness in frontier model development.

Model Timeline

Claude 1 (March 2023)

Anthropic's first public model. Trained with RLHF and an early version of Constitutional AI. Competitive with GPT-3.5 on general tasks. Context window: 9K tokens, later extended to 100K. The 100K context variant was notable at the time because most competitors offered 4K-8K.

Claude 2 (July 2023)

Improved reasoning, coding, and math performance over Claude 1. Context window: 100K tokens. Better calibration on refusals (fewer false positives on harmless requests). Claude 2.1 (November 2023) reduced hallucination rates and improved accuracy on long documents.

Claude 3 Family (March 2024)

Three tiers targeting different use cases:

  • Haiku: smallest, fastest, cheapest. Designed for high-volume, latency-sensitive tasks. Competitive with GPT-3.5 Turbo at lower cost.
  • Sonnet: mid-tier. Balanced speed and capability. The default choice for most applications.
  • Opus: largest and most capable. Strongest reasoning and analysis. Competitive with GPT-4 Turbo on most benchmarks.

All three models supported 200K token context windows, vision (image input), and tool use.

Claude 3.5 Sonnet (June 2024)

A significant jump: Claude 3.5 Sonnet matched or exceeded Claude 3 Opus on most benchmarks while being faster and cheaper (Sonnet-tier pricing). Strong performance on coding tasks (SWE-bench), reasoning, and instruction following. Introduced computer use capability in beta: the model could interact with desktop applications through screenshots and mouse/keyboard control.

Claude 3.7 Sonnet (February 2025)

Introduced extended thinking: a mode in which the model produces a visible scratchpad of reasoning tokens before answering. Developers control the thinking budget via the API. Extended thinking improves performance on math, coding, and long-horizon planning at the cost of higher token usage and latency. Claude 3.7 also advanced agentic coding, with stronger performance on SWE-bench Verified than any prior Claude model.

Claude 4 Family (May 2025)

Claude 4 Sonnet and Claude 4 Opus. Continued improvements in reasoning, coding, and agentic task completion. Extended tool use capabilities. Sonnet 4 became the default for most applications, with Opus 4 reserved for longer-horizon, harder reasoning tasks. Architecture details remain undisclosed.

Claude 4.5 Family (late 2025)

Claude 4.5 Sonnet and Claude 4.5 Opus. The 4.5 series focused on agentic reliability: longer tool-use chains without drift, better instruction-following inside subagents, and improvements on computer-use benchmarks (OSWorld, WebArena). Claude Haiku 4.5 (October 2025) landed at roughly the capability of Sonnet 4 at Haiku-tier cost, making small-agent pipelines much cheaper.

Claude 4.6 Family (early 2026)

Claude 4.6 Sonnet and Claude 4.6 Opus. Incremental capability gains on coding, math, and long-context retrieval. Extended thinking became the default for harder prompts rather than an opt-in mode, with the model deciding when to allocate thinking tokens. Computer use graduated from beta to generally available.

Claude 4.7 Opus (April 2026)

Targeted at frontier-lab workflows: long autonomous coding tasks, multi-hour agent runs, and deeper mechanistic analysis. The 4.7 release did not ship a Sonnet or Haiku sibling simultaneously; Anthropic ships separate minor versions per tier when the capability gains are meaningful.

Version cadence

As of April 2026 the release pattern is:

  • Major family (Claude 4) introduces the base architecture and training recipe.
  • Minor versions (4.5, 4.6, 4.7) refresh the same family with post-training improvements, new agentic behaviors, and occasional capability jumps, without the cost of a full pretraining run.
  • Haiku lags the Opus/Sonnet lineup by 1-2 minor versions but tends to compress the prior minor version's capability into a cheaper model.

This cadence is not stated by Anthropic as a policy; it is the observed pattern across 2024-2026 releases. Treat it as empirical, not guaranteed.

Constitutional AI

Definition

Constitutional AI (CAI)

An alignment method where a set of explicit written principles (the "constitution") guides model behavior. Instead of training a reward model solely on human preference labels (as in standard RLHF), CAI uses the model itself to critique and revise its outputs according to the constitution, then trains a preference model on these AI-generated comparisons. This is sometimes called RLAIF (Reinforcement Learning from AI Feedback).

Proposition

CAI Training Signal Construction

Statement

Constitutional AI constructs training signal in two phases. Phase 1 (supervised): generate response, critique it against each constitutional principle, revise, and fine-tune on the revision. Phase 2 (RL): generate response pairs, use the model to choose which better follows the constitution, train a preference model on these AI-labeled pairs, then optimize the policy with RL (PPO) against this preference model. The resulting model satisfies the constitutional constraints more consistently than RLHF with equivalent human annotation budget.
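The two phases above can be sketched schematically. This is an illustrative skeleton, not Anthropic's actual pipeline: the helper functions (generate, critique, revise) are deterministic stubs standing in for real model calls, and the constitution is abbreviated to two principles.

```python
# Schematic sketch of the two CAI phases (Bai et al., 2022), with stub
# functions in place of a real language model. All function names and the
# toy preference rule are illustrative assumptions.

CONSTITUTION = [
    "Choose the response that is most helpful to the user.",
    "Choose the response that avoids harmful or dangerous content.",
]

def generate(prompt):
    # Stub: a real system would sample from the base model.
    return f"draft answer to: {prompt}"

def critique(response, principle):
    # Stub: a real system would ask the model to critique `response`
    # against `principle` and return the critique text.
    return f"critique of '{response}' under '{principle}'"

def revise(response, critique_text):
    # Stub: a real system would ask the model to rewrite the response
    # so it addresses the critique.
    return response + " [revised]"

def phase1_supervised(prompt):
    """Phase 1 (supervised): generate -> critique -> revise, once per
    principle. The (prompt, final revision) pairs become SFT data."""
    response = generate(prompt)
    for principle in CONSTITUTION:
        response = revise(response, critique(response, principle))
    return response

def phase2_label_pair(prompt, resp_a, resp_b):
    """Phase 2 (RL): the model itself judges which response better
    follows the constitution; these AI-labeled pairs train the
    preference model used for PPO."""
    # Toy stand-in for the AI judge: prefer the revised (longer) response.
    return resp_a if len(resp_a) >= len(resp_b) else resp_b

revised = phase1_supervised("explain CAI")
chosen = phase2_label_pair("explain CAI", revised, generate("explain CAI"))
print(chosen == revised)  # → True: the revised response wins the comparison
```

In the real pipeline both the critique/revision and the pairwise judgment come from sampled model outputs, so the labels are noisy; the preference model averages over many such comparisons.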

Intuition

Instead of asking thousands of human annotators "which response is better?", you write down your criteria explicitly and have the model itself apply those criteria. This is cheaper, more scalable, and makes the alignment target auditable: anyone can read the constitution and know what the model was trained to optimize.

Proof Sketch

The empirical result (Bai et al., 2022) shows that RLAIF-trained models match RLHF-trained models on helpfulness while improving on harmlessness metrics. The constitutional critique step provides a training signal that correlates with human judgments on harmlessness at roughly the same level as inter-annotator agreement.

Why It Matters

Standard RLHF encodes alignment criteria implicitly in annotator preferences. This makes it hard to audit, modify, or debug. If the model behaves unexpectedly, you cannot point to a specific principle that was violated. CAI makes the criteria explicit. If you want the model to behave differently, you change the constitution.

Failure Mode

CAI assumes the model is capable enough to perform meaningful self-critique. For weaker models, the self-critique is low quality and the training signal is noisy. The constitution must also be well-written: vague or contradictory principles produce inconsistent behavior. CAI does not eliminate the need for human judgment; it shifts it from labeling individual examples to writing good principles.

Architecture

Anthropic has not published detailed architecture specifications for Claude models. Based on available information:

  • Dense transformer. Claude models are believed to use dense decoder-only transformer architectures (not mixture-of-experts, as of current public knowledge).
  • Long context. 200K token context window across the Claude 3 family. The mechanism for supporting long context has not been detailed publicly.
  • Parameter counts. Not disclosed for any Claude model.

This lack of architectural disclosure is a deliberate choice. Anthropic has argued that detailed capability disclosures can accelerate proliferation of dangerous capabilities. This contrasts with Meta (full architecture and weights for Llama) and DeepSeek (detailed technical reports).

Key Technical Capabilities

Tool use. Claude can call external tools (functions, APIs) defined by the developer. The model decides when to call a tool, constructs the arguments, and incorporates the result into its response.
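The request shape can be sketched as follows. The tool definition follows the documented Messages API pattern (name, description, JSON Schema for arguments); the get_weather tool itself and the model id string are placeholder assumptions.

```python
# Sketch of a Claude tool definition: a name, a description, and a JSON
# Schema describing the arguments. The get_weather tool is hypothetical.

weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# The developer passes a list of such definitions alongside the messages.
# The model may reply with a tool_use block naming the tool and arguments;
# the developer executes the tool and sends back a tool_result block.
request_sketch = {
    "model": "claude-sonnet-4-5",  # placeholder model id
    "max_tokens": 1024,
    "tools": [weather_tool],
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
}
```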

Computer use. Starting with Claude 3.5 Sonnet (October 2024, beta) and generally available from Claude 4.6, the model can interact with desktop environments via screenshots and simulated mouse/keyboard input. Agentic workflows where the model operates software directly (OSWorld, WebArena, real SaaS apps) became production-grade in the 4.5/4.6 series.

Extended thinking. From Claude 3.7 Sonnet onward, the model can be asked to produce a visible reasoning scratchpad before answering. Developers set a thinking_budget that caps the number of reasoning tokens. Long thinking trades latency and cost for better accuracy on math, planning, and code. From Claude 4.6, the model adaptively allocates thinking tokens based on prompt difficulty.
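A minimal sketch of enabling extended thinking in a request, assuming the documented `{"type": "enabled", "budget_tokens": N}` shape for the thinking parameter; verify the exact field names against the current API reference before relying on them.

```python
# Sketch: build a Messages API request body with an extended-thinking
# budget. The model id is a placeholder; the `thinking` field shape is an
# assumption based on Anthropic's documented API.

def thinking_request(prompt, budget_tokens=8_000, max_tokens=16_000):
    # The thinking budget caps reasoning tokens and must leave room for
    # the final answer within max_tokens.
    assert budget_tokens < max_tokens
    return {
        "model": "claude-sonnet-4-5",  # placeholder model id
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

req = thinking_request("Prove that sqrt(2) is irrational.")
```

Raising the budget buys accuracy on hard prompts at the price of latency and token cost; from Claude 4.6 the model can also allocate this budget adaptively.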

Vision. Claude 3 and later models accept image inputs alongside text. They can analyze charts, read text in images, and reason about visual content.

Comparison with GPT and Gemini

  • Reasoning. Claude models through the 4.x family trade top spots on reasoning benchmarks with OpenAI's o-series and Gemini 2.5. On agentic coding (SWE-bench Verified, Terminal-Bench) Claude 4.5 Opus and later have been among the strongest publicly available models.
  • Long context. Claude's 200K token context window has held stable through the 4.x family. GPT-5 is comparable or slightly larger; Gemini 2.5 Pro advertises 1M+ tokens, though effective use at that length varies by task.
  • Safety. Anthropic emphasizes safety research (interpretability, alignment, evaluations for dangerous capabilities) more publicly than competitors. Whether this translates to measurably safer models in practice is debated.
  • Multimodality. Claude supports text and image input. Gemini supports text, image, audio, and video natively. GPT-4o supports text, image, and audio.
  • Open weights. Claude has no open-weight releases. GPT has no open-weight releases for frontier models. Meta and DeepSeek provide open weights.

Pricing Tiers

The three-tier model (Haiku/Sonnet/Opus) reflects a deliberate design for different workloads:

  • Haiku: high throughput, low latency, low cost. For classification, extraction, and simple generation.
  • Sonnet: the general-purpose default. Suitable for most applications.
  • Opus: maximum capability. For complex reasoning, analysis, and tasks where accuracy matters more than speed or cost.

This tiering pattern is common across providers: OpenAI (GPT-4o-mini / GPT-4o / o1), Google (Flash / Pro / Ultra).

Common Confusions

Watch Out

Constitutional AI does not mean the model follows rules perfectly

CAI trains the model to prefer outputs consistent with the constitution, but it does not guarantee compliance. The model can still produce outputs that violate constitutional principles, especially on edge cases or adversarial inputs. The constitution provides a training signal, not a hard constraint.

Watch Out

Dense does not mean small

Claude uses a dense architecture (all parameters active for every token), which means its inference cost scales with total parameter count. A dense model with N parameters costs more per token than an MoE model with N total parameters but N/10 active. However, dense models are simpler to train and deploy, with no expert routing overhead.

Exercises

ExerciseCore

Problem

Explain the difference between RLHF and RLAIF (Constitutional AI). What is the source of the preference signal in each case, and what practical advantage does RLAIF offer?

ExerciseAdvanced

Problem

Anthropic offers Haiku, Sonnet, and Opus at different price points. Suppose Haiku costs c per million tokens, Sonnet costs 4c, and Opus costs 15c. You have a task where Haiku achieves 80% accuracy, Sonnet achieves 92% accuracy, and Opus achieves 96% accuracy. If you can run Haiku 3 times and take a majority vote, what accuracy do you expect, and how does the cost compare to a single Sonnet call?
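One way to sanity-check the binomial arithmetic in this exercise, under the strong assumption that the three Haiku calls err independently:

```python
# Majority vote of 3 calls with per-call accuracy p: the vote is correct
# when all 3 agree correctly or exactly 2 of 3 are correct.
# P(majority correct) = p^3 + 3 * p^2 * (1 - p)

p = 0.8
maj3 = p**3 + 3 * p**2 * (1 - p)
print(round(maj3, 3))  # → 0.896

# Cost: 3 Haiku calls = 3c vs one Sonnet call = 4c. Cheaper, but the
# voted accuracy still falls short of Sonnet's single-call 92%.
print(maj3 < 0.92)  # → True
```

In practice, errors from repeated samples of the same model are correlated, so the real gain from voting is usually smaller than this independent-error bound.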

References

Canonical:

  • Bai et al., "Constitutional AI: Harmlessness from AI Feedback" (Anthropic, 2022)
  • Bai et al., "Training a Helpful and Harmless Assistant with RLHF" (Anthropic, 2022)

Current:

  • Anthropic, "The Claude 3 Model Family" (2024), technical report
  • Anthropic, "Claude's Character" (2024)
  • Anthropic, "Claude 3.7 Sonnet and extended thinking" (Feb 2025), release notes
  • Anthropic, model cards for Claude 4, 4.5, 4.6, 4.7 on anthropic.com/news (2025-2026)
  • Anthropic, "Developing a computer-use model" (Oct 2024 beta; GA with Claude 4.6)
  • Ouyang et al., "Training language models to follow instructions with human feedback" (InstructGPT, 2022). The RLHF recipe that preceded constitutional AI.
  • Rafailov et al., "Direct Preference Optimization" (NeurIPS 2023). Alternative to PPO-style RLHF relevant to assistant fine-tuning.

Last reviewed: April 2026
