Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.

RL Theory

Multi-Agent Collaboration

Multiple LLM agents working together on complex tasks: debate for improving reasoning, division of labor across specialist agents, structured communication protocols, and when multi-agent outperforms single-agent systems.


Why This Matters

Complex tasks often decompose better across multiple agents than within one. A single LLM answering a research question must simultaneously search, read, analyze, and synthesize. A multi-agent system can assign these subtasks to specialists: a search agent finds sources, a reader agent extracts relevant information, an analyst agent identifies patterns, and a writer agent produces the final output.

The hypothesis is that coordination across specialized agents can exceed the capability of a single generalist agent, especially when tasks require diverse skills, long context, or parallel execution.

Mental Model

A multi-agent system consists of:

  1. Agents: Individual LLM instances, possibly with different system prompts, tools, or fine-tuned capabilities
  2. Communication protocol: How agents send messages to each other
  3. Orchestration: Who decides which agent acts next and when the task is complete
  4. Shared state: What information is visible to all agents vs. private to each

The orchestration can be centralized (a manager agent assigns tasks) or decentralized (agents decide autonomously when to act and whom to consult).

Formal Setup and Notation

Definition

Multi-Agent System

A multi-agent LLM system is a tuple $(A, \mathcal{M}, \mathcal{O}, T)$ where:

  • $A = \{a_1, \ldots, a_K\}$ is a set of agents, each with policy $\pi_k$
  • $\mathcal{M}$ is the message space (structured text or tool calls)
  • $\mathcal{O}: A \times \mathcal{M}^* \to A$ is the orchestration function mapping the current agent and message history to the next agent
  • $T$ is a termination condition on the message history

Each agent $a_k$ takes the message history visible to it and produces the next message: $m_{t+1} \sim \pi_k(\cdot \mid m_{\leq t})$.
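The tuple $(A, \mathcal{M}, \mathcal{O}, T)$ can be sketched as a minimal execution loop. This is an illustrative toy, not an implementation from any library: the stub lambdas stand in for LLM policies, and all names (`MultiAgentSystem`, `solver`, `checker`) are invented for this example.

```python
from dataclasses import dataclass
from typing import Callable

Message = str  # the message space M, simplified here to plain strings

@dataclass
class MultiAgentSystem:
    agents: dict           # A: agent name -> policy pi_k (history -> next message)
    orchestrate: Callable  # O: (current agent name, history) -> next agent name
    terminated: Callable   # T: predicate on the message history

    def run(self, task: Message, first: str, max_steps: int = 10) -> list:
        history, current = [task], first
        for _ in range(max_steps):
            if self.terminated(history):
                break
            # m_{t+1} ~ pi_k(. | m_<=t): the current agent extends the history
            history.append(self.agents[current](history))
            current = self.orchestrate(current, history)
        return history

# Toy instantiation with stub policies (a real system would call an LLM here).
system = MultiAgentSystem(
    agents={"solver": lambda h: f"answer to: {h[0]}",
            "checker": lambda h: "OK: " + h[-1]},
    orchestrate=lambda cur, h: "checker" if cur == "solver" else "solver",
    terminated=lambda h: h[-1].startswith("OK:"),
)
history = system.run("2+2?", first="solver")
```

Note that the orchestration function here is centralized and deterministic (solver then checker); a decentralized design would instead let each agent's output decide whom to message next.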

Definition

Debate Protocol

In debate, two agents $a_1, a_2$ argue for competing answers to a question $q$. A judge agent $a_J$ evaluates:

  1. $a_1$ proposes answer $y_1$ with argument $r_1$
  2. $a_2$ proposes answer $y_2$ with argument $r_2$
  3. $a_1$ rebuts $r_2$ with counter-argument $c_1$
  4. $a_2$ rebuts $r_1$ with counter-argument $c_2$
  5. Judge $a_J$ selects the more convincing answer: $\hat{y} = a_J(q, y_1, r_1, c_1, y_2, r_2, c_2)$

The debate runs for a fixed number of rounds or until the judge is confident.
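The five steps above can be sketched as a single debate round. Everything here is a hedged stand-in: `StubDebater` and the trivial judge replace what would be LLM calls in a real system, and only the message-passing structure of the protocol is meant literally.

```python
class StubDebater:
    """Stand-in for an LLM debater; propose/rebut would be model calls."""
    def __init__(self, answer: str):
        self.answer = answer

    def propose(self, q: str):
        return self.answer, f"I claim {self.answer} because ..."

    def rebut(self, q: str, opponent_argument: str) -> str:
        return f"The argument '{opponent_argument}' has a flaw: ..."

def debate(q, a1, a2, judge):
    # Steps 1-5 of the protocol (a single round).
    y1, r1 = a1.propose(q)
    y2, r2 = a2.propose(q)
    c1 = a1.rebut(q, r2)
    c2 = a2.rebut(q, r1)
    return judge(q, (y1, r1, c1), (y2, r2, c2))

# Toy judge that always picks the first transcript's answer; a real judge
# (human or model) would weigh the arguments and counter-arguments.
winner = debate("What is 2+2?",
                StubDebater("4"), StubDebater("5"),
                judge=lambda q, t1, t2: t1[0])
```

Running multiple rounds amounts to looping the rebuttal steps before invoking the judge.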

Core Definitions

Division of labor assigns different subtasks to different agents. A coding agent writes code, a testing agent runs tests and reports bugs, a review agent checks code quality. Each agent has a narrow system prompt and tool access appropriate to its role.

Structured message passing constrains how agents communicate. Instead of free-form text, agents exchange typed messages: a search agent returns a structured list of sources with relevance scores, not a paragraph of prose. This reduces ambiguity and makes orchestration easier to debug.
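A typed record makes this concrete. The `SearchResult` type and the example URLs below are hypothetical, invented for illustration; the point is only that downstream agents can filter and sort such messages without parsing prose.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    """Typed message a search agent returns instead of free-form prose."""
    url: str
    snippet: str
    relevance: float  # agent-assigned relevance score in [0, 1]

results = [
    SearchResult("https://example.org/a", "directly on-topic excerpt", 0.9),
    SearchResult("https://example.org/b", "tangential excerpt", 0.4),
]

# A downstream reader agent can apply a simple, debuggable filter.
relevant = [r for r in results if r.relevance >= 0.5]
```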

Centralized training with decentralized execution (CTDE) trains all agents jointly (or with a shared objective) but deploys them independently. This concept from multi-agent RL and Markov games applies directly: the orchestration system is designed centrally, but each agent acts based only on its own context at inference time.

Main Theorems

Proposition

Debate as a Theoretical Amplification Argument

Statement

Irving, Christiano, and Amodei (2018) argue informally that in an idealized two-player zero-sum debate game with a polynomial-time judge and optimal honest play by at least one debater, the equilibrium selection by the judge should favor the true answer, even when the judge alone could not solve the original question. This is a theoretical argument about a game-theoretic setup, not a proved theorem about transformer-based debaters or real deployments.

The analogy drawn in the paper is to interactive proof systems. A judge with access to two competing provers can in principle verify answers to problems beyond what the judge could decide alone, which in the complexity-theoretic analogy reaches into PSPACE. That analogy motivates the protocol. It does not establish that real LLM debaters will produce honest arguments or that a real judge (human or model) will reliably select the truthful side.

Intuition

If one debater argues for the truth and the other argues for a falsehood, the truthful debater can in principle find a flaw in the opponent's argument (because it is false). At each step, the truthful debater can point to a specific incorrect claim. The judge only needs to evaluate whether this specific claim is correct, which is easier than solving the whole problem. Truth has a structural advantage in the idealized debate game because a false conclusion must contain at least one false sub-claim to expose.

Proof Sketch

Model debate as a sequential game tree. At each node, a debater makes a claim and the opponent can challenge any sub-claim. The judge evaluates leaves (atomic claims) in polynomial time. By backward induction, a false claim at any node can in principle be challenged down to a leaf where the judge detects the falsehood, so under optimal honest play the equilibrium strategy avoids false claims. This argument depends on (a) the honest debater actually finding the flaw, (b) the judge correctly evaluating atomic claims, and (c) the game tree being shallow enough for the protocol to terminate.
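The backward-induction argument can be modeled in a few lines. In this toy encoding (invented for illustration), a claim is either a boolean leaf, which the judge evaluates atomically, or a list of sub-claims that is true iff all sub-claims are true; the honest debater's strategy is to recurse into some false sub-claim until a false leaf is exposed.

```python
def truth(claim) -> bool:
    """Ground truth: a composite claim is true iff all sub-claims are true."""
    if isinstance(claim, bool):
        return claim
    return all(truth(sub) for sub in claim)

def challenge(claim, path=()):
    """Honest debater's strategy: drill down to a false atomic sub-claim.

    Returns the index path the judge can follow to a false leaf,
    or None if the claim is actually true (no flaw to expose).
    """
    if isinstance(claim, bool):           # leaf: the judge checks it directly
        return None if claim else path
    for i, sub in enumerate(claim):
        if not truth(sub):
            return challenge(sub, path + (i,))
    return None

# A false argument that looks fine at the top level: sub-claim [1][0] is false,
# and the honest challenge path leads the judge straight to it.
argument = [True, [False, True]]
```

The caveats from the proof sketch show up directly: `challenge` assumes the debater can evaluate `truth` on sub-claims (finding the flaw) and that the judge's leaf evaluation is reliable.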

Why It Matters

Debate is one candidate approach to scalable oversight: a weaker judge evaluates the output of stronger agents by having two copies of the stronger agent argue opposing sides. It is a motivating framework, not a solved problem. Whether debate helps in practice is an empirical question, and current evidence is mixed.

Failure Mode

The argument assumes optimal honest play, a reliable polynomial-time judge, and a well-behaved decomposition of claims. Real LLM debaters may fail to find flaws, share correlated misconceptions, or collude on persuasive falsehoods. Real judges can be swayed by rhetoric. Bowman et al. (2022) and Michael et al. (2023) report mixed empirical results: debate helps on some tasks and with some judge setups, but does not reliably amplify weak judges in general. The complexity-theoretic analogy to PSPACE does not transfer to real transformer-based agents.

Proof Ideas and Templates Used

The debate argument uses backward induction on an idealized game tree, which is standard in game theory. The key intuition is that under the honest-debater assumption and a polynomial-time judge of atomic claims, a truthful debater can in principle drill down to a verifiable atomic claim that exposes a falsehood in the opponent's argument. This is a theoretical property of the idealized game, not a statement about what real LLM debaters will do.

Key Approaches

Debate and Adversarial Collaboration

Two agents argue for different answers. Useful when:

  • The task has a definite correct answer
  • You want to surface weaknesses in reasoning
  • A judge (human or model) can evaluate arguments

Division of Labor

Specialist agents handle subtasks. Useful when:

  • The task decomposes into independent or loosely coupled subtasks
  • Different subtasks require different tools or capabilities
  • Parallelism would speed up execution

Hierarchical Orchestration

A manager agent plans, delegates, and synthesizes. Worker agents execute specific subtasks and report back. This mirrors human organizational structure and works well when the manager can decompose the task effectively.
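A minimal sketch of this plan-delegate-synthesize loop, with stub functions in place of LLM agents (all names here are illustrative):

```python
def manager_plan(task: str) -> list:
    # Stub: a real manager agent would decompose the task with an LLM.
    return [f"{task}: gather facts", f"{task}: draft answer"]

def worker(subtask: str) -> str:
    # Stub: a real worker agent would execute the subtask with its tools.
    return f"result({subtask})"

def run_hierarchy(task: str) -> str:
    subtasks = manager_plan(task)                 # manager plans
    reports = [worker(s) for s in subtasks]       # workers execute (parallelizable)
    return " | ".join(reports)                    # manager synthesizes

out = run_hierarchy("Q")
```

Because the subtasks are independent here, the worker calls could run concurrently, which is one of the practical payoffs of this pattern.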

Canonical Examples

Example

Multi-agent code generation

A three-agent system for code generation: (1) Architect agent breaks the task into modules and defines interfaces. (2) Coder agent implements each module. (3) Reviewer agent reads the code, runs tests, and reports bugs. The coder and reviewer iterate until tests pass. This mirrors the human code review process and catches bugs that a single agent misses because the reviewer has a fresh perspective on the code.
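The coder-reviewer iteration from this example has a simple loop structure worth making explicit. Below is a sketch with stub agents (the real coder and reviewer would be LLM calls, and the reviewer would actually run tests); the loop-until-clean-or-budget-exhausted shape is the part meant literally.

```python
def coder(spec: str, feedback: str = None) -> str:
    # Stub: a real coder agent would (re)write code from the spec + feedback.
    return spec if feedback is None else spec + " [fixed]"

def reviewer(code: str) -> list:
    # Stub: a real reviewer agent would run the test suite and report failures.
    return [] if code.endswith("[fixed]") else ["edge case fails"]

def code_review_loop(spec: str, max_rounds: int = 3):
    """Iterate coder and reviewer until tests pass or the budget runs out."""
    code = coder(spec)
    for _ in range(max_rounds):
        bugs = reviewer(code)
        if not bugs:
            return code, True
        code = coder(spec, feedback="; ".join(bugs))
    return code, False

code, ok = code_review_loop("def add(a, b): ...")
```

The `max_rounds` budget matters in practice: without it, a coder and reviewer that disagree can loop indefinitely.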

Common Confusions

Watch Out

More agents does not mean better performance

Adding agents adds communication overhead and coordination complexity. For simple tasks, a single agent with a good prompt outperforms a multi-agent system. Multi-agent systems shine on complex, decomposable tasks where the coordination cost is justified by the gains from specialization and parallelism.

Watch Out

Multi-agent is not the same as multi-turn

A single agent that thinks step-by-step over multiple turns is not a multi-agent system. Multi-agent requires separate agents with different roles, perspectives, or capabilities. The value comes from diversity of viewpoint and specialization, not from additional turns of generation.

Watch Out

Debate does not guarantee correctness

The theoretical debate result assumes optimal play and reliable atomic evaluation. In practice, LLM debaters can be wrong in correlated ways (both believe the same misconception), and judges can be swayed by fluent but incorrect arguments. Debate is a useful tool for surfacing disagreements, not a proof of correctness.

Exercises

ExerciseAdvanced

Problem

Design a multi-agent system with three agents for the task of answering complex research questions. Specify each agent's role, what tools it has access to, and the communication protocol between them. What is the termination condition?

ExerciseCore

Problem

In the debate framework, why does the truthful debater have an advantage over the untruthful one, assuming the judge can evaluate atomic claims?

References

Canonical:

  • Irving, Christiano, Amodei, AI Safety via Debate (2018), arXiv:1805.00899, Sections 2-3 (theoretical setup and honest-debater assumption)
  • Du et al., Improving Factuality and Reasoning in LLMs through Multi-Agent Debate (2023), arXiv:2305.14325

Current:

  • Bowman et al., Measuring Progress on Scalable Oversight for Large Language Models (2022), arXiv:2211.03540 (mixed empirical results on debate as oversight)
  • Michael et al., Debate Helps Supervise Unreliable Experts (2023), arXiv:2311.08702 (empirical followup on debate with unreliable expert debaters)
  • Wu et al., AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (2023), arXiv:2308.08155
  • Hong et al., MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (2023), arXiv:2308.00352

Last reviewed: April 2026
