Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.

RL Theory

Multi-Agent Collaboration

Multiple LLM agents working together on complex tasks: debate for improving reasoning, division of labor across specialist agents, structured communication protocols, and when multi-agent outperforms single-agent systems.


Why This Matters

Complex tasks often decompose better across multiple agents than within one. A single LLM answering a research question must simultaneously search, read, analyze, and synthesize. A multi-agent system can assign these subtasks to specialists: a search agent finds sources, a reader agent extracts relevant information, an analyst agent identifies patterns, and a writer agent produces the final output.

The hypothesis is that coordination across specialized agents can exceed the capability of a single generalist agent, especially when tasks require diverse skills, long context, or parallel execution.

Mental Model

A multi-agent system consists of:

  1. Agents: Individual LLM instances, possibly with different system prompts, tools, or fine-tuned capabilities
  2. Communication protocol: How agents send messages to each other
  3. Orchestration: Who decides which agent acts next and when the task is complete
  4. Shared state: What information is visible to all agents vs. private to each

The orchestration can be centralized (a manager agent assigns tasks) or decentralized (agents decide autonomously when to act and whom to consult).

Formal Setup and Notation

Definition

Multi-Agent System

A multi-agent LLM system is a tuple $(A, \mathcal{M}, \mathcal{O}, T)$ where:

  • $A = \{a_1, \ldots, a_K\}$ is a set of agents, each with policy $\pi_k$
  • $\mathcal{M}$ is the message space (structured text or tool calls)
  • $\mathcal{O}: A \times \mathcal{M}^* \to A$ is the orchestration function mapping the current agent and message history to the next agent
  • $T$ is a termination condition on the message history

Each agent $a_k$ takes the message history visible to it and produces the next message: $m_{t+1} \sim \pi_k(\cdot \mid m_{\leq t})$.
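The tuple $(A, \mathcal{M}, \mathcal{O}, T)$ can be sketched as a minimal execution loop. This is an illustrative toy, not an implementation from any library: the stub lambdas stand in for LLM policies, and all names (`MultiAgentSystem`, `solver`, `checker`) are invented for this example.

```python
from dataclasses import dataclass
from typing import Callable

Message = str  # the message space M, simplified here to plain strings

@dataclass
class MultiAgentSystem:
    agents: dict           # A: agent name -> policy pi_k (history -> next message)
    orchestrate: Callable  # O: (current agent name, history) -> next agent name
    terminated: Callable   # T: predicate on the message history

    def run(self, task: Message, first: str, max_steps: int = 10) -> list:
        history, current = [task], first
        for _ in range(max_steps):
            if self.terminated(history):
                break
            # m_{t+1} ~ pi_k(. | m_<=t): the current agent extends the history
            history.append(self.agents[current](history))
            current = self.orchestrate(current, history)
        return history

# Toy instantiation with stub policies (a real system would call an LLM here).
system = MultiAgentSystem(
    agents={"solver": lambda h: f"answer to: {h[0]}",
            "checker": lambda h: "OK: " + h[-1]},
    orchestrate=lambda cur, h: "checker" if cur == "solver" else "solver",
    terminated=lambda h: h[-1].startswith("OK:"),
)
history = system.run("2+2?", first="solver")
```

Note that the orchestration function here is centralized and deterministic (solver then checker); a decentralized design would instead let each agent's output decide whom to message next.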

Definition

Debate Protocol

In debate, two agents $a_1, a_2$ argue for competing answers to a question $q$. A judge agent $a_J$ evaluates:

  1. $a_1$ proposes answer $y_1$ with argument $r_1$
  2. $a_2$ proposes answer $y_2$ with argument $r_2$
  3. $a_1$ rebuts $r_2$ with counter-argument $c_1$
  4. $a_2$ rebuts $r_1$ with counter-argument $c_2$
  5. Judge $a_J$ selects the more convincing answer: $\hat{y} = a_J(q, y_1, r_1, c_1, y_2, r_2, c_2)$

The debate runs for a fixed number of rounds or until the judge is confident.
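The five steps above can be sketched as a single debate round. Everything here is a hedged stand-in: `StubDebater` and the trivial judge replace what would be LLM calls in a real system, and only the message-passing structure of the protocol is meant literally.

```python
class StubDebater:
    """Stand-in for an LLM debater; propose/rebut would be model calls."""
    def __init__(self, answer: str):
        self.answer = answer

    def propose(self, q: str):
        return self.answer, f"I claim {self.answer} because ..."

    def rebut(self, q: str, opponent_argument: str) -> str:
        return f"The argument '{opponent_argument}' has a flaw: ..."

def debate(q, a1, a2, judge):
    # Steps 1-5 of the protocol (a single round).
    y1, r1 = a1.propose(q)
    y2, r2 = a2.propose(q)
    c1 = a1.rebut(q, r2)
    c2 = a2.rebut(q, r1)
    return judge(q, (y1, r1, c1), (y2, r2, c2))

# Toy judge that always picks the first transcript's answer; a real judge
# (human or model) would weigh the arguments and counter-arguments.
winner = debate("What is 2+2?",
                StubDebater("4"), StubDebater("5"),
                judge=lambda q, t1, t2: t1[0])
```

Running multiple rounds amounts to looping the rebuttal steps before invoking the judge.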

Core Definitions

Division of labor assigns different subtasks to different agents. A coding agent writes code, a testing agent runs tests and reports bugs, a review agent checks code quality. Each agent has a narrow system prompt and tool access appropriate to its role.

Structured message passing constrains how agents communicate. Instead of free-form text, agents exchange typed messages: a search agent returns a structured list of sources with relevance scores, not a paragraph of prose. This reduces ambiguity and makes orchestration easier to debug.
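A typed record makes this concrete. The `SearchResult` type and the example URLs below are hypothetical, invented for illustration; the point is only that downstream agents can filter and sort such messages without parsing prose.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    """Typed message a search agent returns instead of free-form prose."""
    url: str
    snippet: str
    relevance: float  # agent-assigned relevance score in [0, 1]

results = [
    SearchResult("https://example.org/a", "directly on-topic excerpt", 0.9),
    SearchResult("https://example.org/b", "tangential excerpt", 0.4),
]

# A downstream reader agent can apply a simple, debuggable filter.
relevant = [r for r in results if r.relevance >= 0.5]
```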

Centralized training with decentralized execution (CTDE) trains all agents jointly (or with a shared objective) but deploys them independently. This concept from multi-agent RL and Markov games applies directly: the orchestration system is designed centrally, but each agent acts based only on its own context at inference time.

Main Theorems

Proposition

Debate as a Theoretical Amplification Argument

Statement

Irving, Christiano, and Amodei (2018) argue informally that in an idealized two-player zero-sum debate game with a polynomial-time judge and optimal honest play by at least one debater, the equilibrium selection by the judge should favor the true answer, even when the judge alone could not solve the original question. This is a theoretical argument about a game-theoretic setup, not a proved theorem about transformer-based debaters or real deployments.

The analogy drawn in the paper is to interactive proof systems. A judge with access to two competing provers can in principle verify answers to problems beyond what the judge could decide alone, which in the complexity-theoretic analogy reaches into PSPACE. That analogy motivates the protocol. It does not establish that real LLM debaters will produce honest arguments or that a real judge (human or model) will reliably select the truthful side.

Intuition

If one debater argues for the truth and the other argues for a falsehood, the truthful debater can in principle find a flaw in the opponent's argument (because it is false). At each step, the truthful debater can point to a specific incorrect claim. The judge only needs to evaluate whether this specific claim is correct, which is easier than solving the whole problem. Truth has a structural advantage in the idealized debate game because a false conclusion must contain at least one false sub-claim to expose.

Proof Sketch

Model debate as a sequential game tree. At each node, a debater makes a claim and the opponent can challenge any sub-claim. The judge evaluates leaves (atomic claims) in polynomial time. By backward induction, a false claim at any node can in principle be challenged down to a leaf where the judge detects the falsehood, so under optimal honest play the equilibrium strategy avoids false claims. This argument depends on (a) the honest debater actually finding the flaw, (b) the judge correctly evaluating atomic claims, and (c) the game tree being shallow enough for the protocol to terminate.
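The backward-induction argument can be modeled in a few lines. In this toy encoding (invented for illustration), a claim is either a boolean leaf, which the judge evaluates atomically, or a list of sub-claims that is true iff all sub-claims are true; the honest debater's strategy is to recurse into some false sub-claim until a false leaf is exposed.

```python
def truth(claim) -> bool:
    """Ground truth: a composite claim is true iff all sub-claims are true."""
    if isinstance(claim, bool):
        return claim
    return all(truth(sub) for sub in claim)

def challenge(claim, path=()):
    """Honest debater's strategy: drill down to a false atomic sub-claim.

    Returns the index path the judge can follow to a false leaf,
    or None if the claim is actually true (no flaw to expose).
    """
    if isinstance(claim, bool):           # leaf: the judge checks it directly
        return None if claim else path
    for i, sub in enumerate(claim):
        if not truth(sub):
            return challenge(sub, path + (i,))
    return None

# A false argument that looks fine at the top level: sub-claim [1][0] is false,
# and the honest challenge path leads the judge straight to it.
argument = [True, [False, True]]
```

The caveats from the proof sketch show up directly: `challenge` assumes the debater can evaluate `truth` on sub-claims (finding the flaw) and that the judge's leaf evaluation is reliable.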

Why It Matters

Debate is one candidate approach to scalable oversight: a weaker judge evaluates the output of stronger agents by having two copies of the stronger agent argue opposing sides. It is a motivating framework, not a solved problem. Whether debate helps in practice is an empirical question, and current evidence is mixed.

Failure Mode

The argument assumes optimal honest play, a reliable polynomial-time judge, and a well-behaved decomposition of claims. Real LLM debaters may fail to find flaws, share correlated misconceptions, or collude on persuasive falsehoods. Real judges can be swayed by rhetoric. Bowman et al. (2022) and Michael et al. (2023) report mixed empirical results: debate helps on some tasks and with some judge setups, but does not reliably amplify weak judges in general. The complexity-theoretic analogy to PSPACE does not transfer to real transformer-based agents.

Proof Ideas and Templates Used

The debate argument uses backward induction on an idealized game tree, which is standard in game theory. The key intuition is that under the honest-debater assumption and a polynomial-time judge of atomic claims, a truthful debater can in principle drill down to a verifiable atomic claim that exposes a falsehood in the opponent's argument. This is a theoretical property of the idealized game, not a statement about what real LLM debaters will do.

Key Approaches

Debate and Adversarial Collaboration

Two agents argue for different answers. Useful when:

  • The task has a definite correct answer
  • You want to surface weaknesses in reasoning
  • A judge (human or model) can evaluate arguments

Division of Labor

Specialist agents handle subtasks. Useful when:

  • The task decomposes into independent or loosely coupled subtasks
  • Different subtasks require different tools or capabilities
  • Parallelism would speed up execution

Hierarchical Orchestration

A manager agent plans, delegates, and synthesizes. Worker agents execute specific subtasks and report back. This mirrors human organizational structure and works well when the manager can decompose the task effectively.
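A minimal sketch of this plan-delegate-synthesize loop, with stub functions in place of LLM agents (all names here are illustrative):

```python
def manager_plan(task: str) -> list:
    # Stub: a real manager agent would decompose the task with an LLM.
    return [f"{task}: gather facts", f"{task}: draft answer"]

def worker(subtask: str) -> str:
    # Stub: a real worker agent would execute the subtask with its tools.
    return f"result({subtask})"

def run_hierarchy(task: str) -> str:
    subtasks = manager_plan(task)                 # manager plans
    reports = [worker(s) for s in subtasks]       # workers execute (parallelizable)
    return " | ".join(reports)                    # manager synthesizes

out = run_hierarchy("Q")
```

Because the subtasks are independent here, the worker calls could run concurrently, which is one of the practical payoffs of this pattern.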

Canonical Examples

Example

Multi-agent code generation

A three-agent system for code generation: (1) Architect agent breaks the task into modules and defines interfaces. (2) Coder agent implements each module. (3) Reviewer agent reads the code, runs tests, and reports bugs. The coder and reviewer iterate until tests pass. This mirrors the human code review process and catches bugs that a single agent misses because the reviewer has a fresh perspective on the code.
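The coder-reviewer iteration from this example has a simple loop structure worth making explicit. Below is a sketch with stub agents (the real coder and reviewer would be LLM calls, and the reviewer would actually run tests); the loop-until-clean-or-budget-exhausted shape is the part meant literally.

```python
def coder(spec: str, feedback: str = None) -> str:
    # Stub: a real coder agent would (re)write code from the spec + feedback.
    return spec if feedback is None else spec + " [fixed]"

def reviewer(code: str) -> list:
    # Stub: a real reviewer agent would run the test suite and report failures.
    return [] if code.endswith("[fixed]") else ["edge case fails"]

def code_review_loop(spec: str, max_rounds: int = 3):
    """Iterate coder and reviewer until tests pass or the budget runs out."""
    code = coder(spec)
    for _ in range(max_rounds):
        bugs = reviewer(code)
        if not bugs:
            return code, True
        code = coder(spec, feedback="; ".join(bugs))
    return code, False

code, ok = code_review_loop("def add(a, b): ...")
```

The `max_rounds` budget matters in practice: without it, a coder and reviewer that disagree can loop indefinitely.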

Common Confusions

Watch Out

More agents does not mean better performance

Adding agents adds communication overhead and coordination complexity. For simple tasks, a single agent with a good prompt outperforms a multi-agent system. Multi-agent systems shine on complex, decomposable tasks where the coordination cost is justified by the gains from specialization and parallelism.

Watch Out

Multi-agent is not the same as multi-turn

A single agent that thinks step-by-step over multiple turns is not a multi-agent system. Multi-agent requires separate agents with different roles, perspectives, or capabilities. The value comes from diversity of viewpoint and specialization, not from additional turns of generation.

Watch Out

Debate does not guarantee correctness

The theoretical debate result assumes optimal play and reliable atomic evaluation. In practice, LLM debaters can be wrong in correlated ways (both believe the same misconception), and judges can be swayed by fluent but incorrect arguments. Debate is a useful tool for surfacing disagreements, not a proof of correctness.

Exercises

ExerciseAdvanced

Problem

Design a multi-agent system with three agents for the task of answering complex research questions. Specify each agent's role, what tools it has access to, and the communication protocol between them. What is the termination condition?

ExerciseCore

Problem

In the debate framework, why does the truthful debater have an advantage over the untruthful one, assuming the judge can evaluate atomic claims?

References

Canonical:

  • Irving, Christiano, Amodei, AI Safety via Debate (2018), arXiv:1805.00899, Sections 2-3 (theoretical setup and honest-debater assumption)
  • Du et al., Improving Factuality and Reasoning in LLMs through Multi-Agent Debate (2023), arXiv:2305.14325

Current:

  • Bowman et al., Measuring Progress on Scalable Oversight for Large Language Models (2022), arXiv:2211.03540 (mixed empirical results on debate as oversight)
  • Michael et al., Debate Helps Supervise Unreliable Experts (2023), arXiv:2311.08702 (empirical followup on debate with unreliable expert debaters)
  • Wu et al., AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (2023), arXiv:2308.08155
  • Hong et al., MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (2023), arXiv:2308.00352

Last reviewed: April 2026
