
Decision Theory

Nash Equilibrium

No player can improve by unilateral deviation. Existence via Kakutani fixed-point theorem, computation complexity (PPAD-completeness), refinements, and why Nash equilibria can be inefficient (price of anarchy).


Why This Matters

Nash equilibrium is the central solution concept in non-cooperative game theory. Whenever you train a GAN, analyze multi-agent RL convergence, design an auction, or study adversarial robustness, you are reasoning about Nash equilibria. The concept pins down what "stable behavior" means when multiple optimizers interact.

Three questions matter in practice: Does an equilibrium exist? Can you compute it? Is it any good? The answers are: always (for finite games), probably not efficiently (PPAD-complete), and sometimes not (price of anarchy can be unbounded). This page covers all three.

Formal Definition

Definition

Nash Equilibrium

A strategy profile $\sigma^* = (\sigma_1^*, \ldots, \sigma_n^*)$ is a Nash equilibrium of a normal-form game $(N, S, u)$ if no player can strictly improve their expected payoff by unilateral deviation:

$$u_i(\sigma_i^*, \sigma_{-i}^*) \geq u_i(\sigma_i, \sigma_{-i}^*) \quad \forall \sigma_i \in \Delta(S_i), \; \forall i \in N$$

Notation: $\sigma_{-i}^*$ denotes the strategies of all players except $i$. A strict Nash equilibrium requires strict inequality for all $\sigma_i \neq \sigma_i^*$.
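To make the definition concrete, here is a minimal Python/NumPy sketch (the helper `pure_nash` and the Prisoner's Dilemma payoff numbers are my own illustration) that checks the no-profitable-deviation condition directly over pure strategies of a bimatrix game:

```python
import numpy as np

def pure_nash(A, B):
    """Enumerate pure-strategy Nash equilibria of a bimatrix game.

    A[i, j] is the row player's payoff, B[i, j] the column player's."""
    equilibria = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            # No unilateral deviation may strictly improve either player's payoff.
            if A[i, j] >= A[:, j].max() and B[i, j] >= B[i, :].max():
                equilibria.append((i, j))
    return equilibria

# Prisoner's Dilemma: strategies 0 = Cooperate, 1 = Defect.
A = np.array([[-1.0, -3.0],
              [ 0.0, -2.0]])
B = A.T  # symmetric game
print(pure_nash(A, B))  # [(1, 1)]: mutual defection is the unique pure equilibrium
```

Because expected payoff is linear in each player's own mixture, checking deviations to pure strategies suffices; the quantifier over all of $\Delta(S_i)$ adds nothing.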

Definition

Epsilon-Nash Equilibrium

A strategy profile $\sigma^*$ is an $\varepsilon$-Nash equilibrium if no player can improve their payoff by more than $\varepsilon$ through unilateral deviation:

$$u_i(\sigma_i, \sigma_{-i}^*) \leq u_i(\sigma_i^*, \sigma_{-i}^*) + \varepsilon \quad \forall \sigma_i \in \Delta(S_i), \; \forall i$$

This relaxation is important computationally. While exact Nash equilibria are hard to compute, approximate equilibria may be tractable.
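The smallest $\varepsilon$ for which a profile qualifies is its exploitability: the largest unilateral gain available to any player. A short NumPy sketch (the function name `epsilon_of` is my own illustration):

```python
import numpy as np

def epsilon_of(A, B, x, y):
    """Smallest eps for which mixed profile (x, y) is an eps-Nash equilibrium.

    The best deviation for each player is always a pure strategy, so the
    maximal gain is a max over rows (resp. columns)."""
    gain_row = (A @ y).max() - x @ A @ y     # row player's best improvement
    gain_col = (x @ B).max() - x @ B @ y     # column player's best improvement
    return max(gain_row, gain_col)

# Matching pennies: the uniform profile is an exact (eps = 0) equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
x = y = np.array([0.5, 0.5])
print(epsilon_of(A, B, x, y))                          # 0.0
# A slightly lopsided row mixture is only epsilon-Nash for eps ~ 0.2:
print(epsilon_of(A, B, np.array([0.6, 0.4]), y))
```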

Definition

Support of a Mixed Strategy

The support of a mixed strategy $\sigma_i$ is the set of pure strategies played with positive probability: $\text{supp}(\sigma_i) = \{s_i \in S_i : \sigma_i(s_i) > 0\}$.

At a Nash equilibrium, every pure strategy in the support of $\sigma_i^*$ yields the same expected payoff (player $i$ is indifferent among them). Any pure strategy outside the support yields a weakly lower expected payoff.
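A quick numerical check of the indifference condition, using the uniform equilibrium of Rock-Paper-Scissors (a standard example, sketched here in NumPy):

```python
import numpy as np

# Rock-Paper-Scissors: at the uniform mixed equilibrium, every pure strategy
# in the support earns the same expected payoff (the indifference condition).
A = np.array([[ 0.0, -1.0,  1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0,  1.0,  0.0]])
y = np.array([1/3, 1/3, 1/3])   # opponent's equilibrium mixture
payoffs = A @ y                  # expected payoff of each pure strategy
print(payoffs)                   # all entries equal: the player is indifferent
```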

Existence Theorem

Theorem

Nash Existence Theorem (1950)

Statement

Every finite game (finite players, finite strategies) has at least one Nash equilibrium in mixed strategies.

Intuition

The proof uses topology. The mixed strategy space is a product of simplices (compact and convex). The best-response correspondence maps this space to itself and satisfies the conditions of Kakutani's fixed-point theorem. A fixed point of the best-response correspondence is exactly a Nash equilibrium.

Proof Sketch

Let $\Sigma = \prod_{i=1}^n \Delta(S_i)$ be the product of all mixed strategy simplices.

Step 1. $\Sigma$ is a nonempty, compact, convex subset of Euclidean space.

Step 2. Define the best-response correspondence $BR: \Sigma \rightrightarrows \Sigma$ by:

$$BR(\sigma) = \prod_{i=1}^n \arg\max_{\sigma_i' \in \Delta(S_i)} u_i(\sigma_i', \sigma_{-i})$$

Step 3. Verify Kakutani conditions:

  • Nonempty values: the maximum of a continuous function on a compact set is attained.
  • Convex values: if $\sigma_i'$ and $\sigma_i''$ are both best responses to $\sigma_{-i}$, then $\alpha\sigma_i' + (1-\alpha)\sigma_i''$ is also a best response, because $u_i$ is linear in $\sigma_i$ (multilinearity of expected payoffs).
  • Closed graph: if $\sigma^{(k)} \to \sigma$ and $\tau^{(k)} \in BR(\sigma^{(k)})$ with $\tau^{(k)} \to \tau$, continuity of $u_i$ implies $\tau \in BR(\sigma)$.

Step 4. Kakutani's fixed-point theorem gives $\sigma^* \in BR(\sigma^*)$, meaning every player simultaneously best-responds. This is a Nash equilibrium.

Why It Matters

The theorem guarantees that the equilibrium concept is not vacuous. For any finite game, rational agents have a consistent prediction of play. Nash's contribution was showing that a simple fixed-point argument suffices. The theorem earned Nash the 1994 Nobel Prize in Economics (shared with Harsanyi and Selten).

Failure Mode

The theorem says nothing about uniqueness, stability, or computability. Many games have multiple equilibria with different payoffs, making equilibrium selection a separate (and unsolved) problem. The proof is non-constructive: it guarantees existence without providing an algorithm. Computing Nash equilibria turns out to be PPAD-complete.

Uniqueness Conditions

Not all games have a unique Nash equilibrium. Sufficient conditions for uniqueness include:

  • Strictly dominance-solvable games: iterated elimination of strictly dominated strategies leaves a single profile (e.g., the Prisoner's Dilemma).
  • Contraction games: If the best-response map is a contraction in some metric, the fixed point is unique by Banach's theorem.
  • Potential games with a strictly concave potential: The potential has a unique maximizer.
  • Two-player zero-sum games: The equilibrium value is unique, though the equilibrium strategies may not be.

Computational Complexity

Definition

PPAD (Polynomial Parity Arguments on Directed graphs)

A complexity class containing problems whose solutions are guaranteed to exist by parity arguments on directed graphs. The canonical PPAD-complete problem is finding the end of a path in a directed graph where vertices have in-degree and out-degree at most 1, given one endpoint.

Theorem

Nash Equilibrium is PPAD-Complete

Statement

Computing a Nash equilibrium is PPAD-complete even for two-player games (Chen and Deng, 2006; Daskalakis, Goldberg, and Papadimitriou, 2009). The hardness persists for finding an $\varepsilon$-Nash equilibrium with exponentially small $\varepsilon$.

Intuition

PPAD-completeness means that finding a Nash equilibrium is at least as hard as any problem in PPAD. Since PPAD is believed not to be in P (it would imply unlikely collapses of complexity classes), there is likely no polynomial-time algorithm for computing Nash equilibria of general games. The hardness arises from the topological nature of the existence proof: the equilibrium exists by a fixed-point argument, but finding the fixed point requires exploring an exponentially large space.

Proof Sketch

The proof reduces the problem of finding an approximate Brouwer fixed point (known to be PPAD-complete) to finding a Nash equilibrium. Given a continuous function $f: [0,1]^n \to [0,1]^n$, construct a game whose Nash equilibria correspond to approximate fixed points of $f$. The construction uses a graphical game encoding the function evaluation via player interactions.

Why It Matters

This result has practical consequences. No known algorithm computes Nash equilibria of general games in polynomial time. The Lemke-Howson algorithm for two-player games is exponential in the worst case. Support enumeration is exponential. For large games (e.g., poker variants with on the order of $10^{15}$ information sets), exact Nash computation is intractable, motivating approximate methods like counterfactual regret minimization (CFR).

Failure Mode

PPAD-completeness applies to the problem of finding any Nash equilibrium. For specific game structures (zero-sum, potential games, congestion games), polynomial-time algorithms exist. Zero-sum games reduce to linear programming. Potential games can be solved by maximizing the potential function.
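As an illustration of the zero-sum special case, here is a sketch of the standard LP reduction using `scipy.optimize.linprog` (the helper name `solve_zero_sum` is mine; this assumes SciPy is available):

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Equilibrium strategy for the row player of a zero-sum game via LP.

    Variables: mixture x over rows plus the game value v.
    Maximize v subject to (x @ A)_j >= v in every column, x on the simplex."""
    m, n = A.shape
    c = np.r_[np.zeros(m), -1.0]                   # minimize -v, i.e. maximize v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])      # v - (x @ A)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)   # probabilities sum to 1
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]      # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]

# Matching pennies: the value is 0 and the optimal mixture is uniform.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, value = solve_zero_sum(A)
print(x, value)   # [0.5 0.5] and 0.0, up to solver tolerance
```

The column player's strategy comes from the dual LP (or from solving the same program on $-A^\top$), which is exactly the minimax theorem in computational form.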

Equilibrium Refinements

Not all Nash equilibria are equally plausible. Refinements impose additional requirements:

Definition

Subgame Perfect Equilibrium

In extensive-form games (games with sequential moves), a strategy profile is subgame perfect if it induces a Nash equilibrium in every subgame. This eliminates Nash equilibria that rely on non-credible threats. Found by backward induction in finite games of perfect information.
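Backward induction is easy to sketch in code. The toy ultimatum-style tree below is my own illustration (node names and payoffs are invented for the example); it shows how the procedure discards the non-credible threat of rejecting a greedy offer:

```python
# Backward induction on a toy extensive-form game of perfect information:
# player 1 proposes a split, player 2 accepts (both get the split) or
# rejects (both get 0). Leaves carry (u1, u2); internal nodes name the mover.

def backward_induct(node):
    """Return (payoffs, plan) where plan maps each decision node to its move."""
    if "payoffs" in node:
        return node["payoffs"], {}
    best_move, best_payoffs, plan = None, None, {}
    for move, child in node["children"].items():
        payoffs, sub = backward_induct(child)
        plan.update(sub)
        # The mover picks the subtree maximizing their OWN payoff.
        if best_payoffs is None or payoffs[node["player"]] > best_payoffs[node["player"]]:
            best_move, best_payoffs = move, payoffs
    plan[node["name"]] = best_move
    return best_payoffs, plan

tree = {
    "name": "offer", "player": 0, "children": {
        "fair": {"name": "reply_fair", "player": 1, "children": {
            "accept": {"payoffs": (5, 5)}, "reject": {"payoffs": (0, 0)}}},
        "greedy": {"name": "reply_greedy", "player": 1, "children": {
            "accept": {"payoffs": (8, 2)}, "reject": {"payoffs": (0, 0)}}},
    },
}
payoffs, plan = backward_induct(tree)
print(payoffs, plan)  # (8, 2): player 2 accepts any positive offer, so 1 goes greedy
```

"Reject the greedy offer" supports a Nash equilibrium of the whole game but not of the subgame after the greedy offer, which is precisely what subgame perfection rules out.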

Definition

Trembling-Hand Perfect Equilibrium

A Nash equilibrium $\sigma^*$ is trembling-hand perfect if it is the limit of a sequence of totally mixed strategy profiles $\sigma^{(k)}$ (every pure strategy played with positive probability) such that each $\sigma_i^*$ is a best response to $\sigma_{-i}^{(k)}$. This eliminates equilibria that rely on the assumption that opponents never make mistakes.

Definition

Correlated Equilibrium

A correlated equilibrium is a probability distribution $p$ over strategy profiles such that, if a mediator samples $s \sim p$ and privately tells each player their component $s_i$, no player wants to deviate. Formally, for each player $i$ and each $s_i$ with $p(s_i) > 0$:

$$\sum_{s_{-i}} p(s_{-i} \mid s_i) \, u_i(s_i, s_{-i}) \geq \sum_{s_{-i}} p(s_{-i} \mid s_i) \, u_i(s_i', s_{-i}) \quad \forall s_i' \in S_i$$

Every Nash equilibrium is a correlated equilibrium (with product distribution), but correlated equilibria can achieve higher total payoffs. A correlated equilibrium can be computed in polynomial time via a linear program, unlike Nash equilibria.
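To illustrate the LP computability, here is a sketch that finds the welfare-maximizing correlated equilibrium of Chicken with `scipy.optimize.linprog` (the payoff numbers are one common textbook choice; the variable layout is my own):

```python
import numpy as np
from scipy.optimize import linprog

# Welfare-maximizing correlated equilibrium of Chicken via LP.
# Payoffs (row, col): (Dare, Dare) = (0, 0); (Dare, Chicken) = (7, 2);
# (Chicken, Dare) = (2, 7); (Chicken, Chicken) = (6, 6).
U1 = np.array([[0.0, 7.0], [2.0, 6.0]])   # row player, 0 = Dare, 1 = Chicken
U2 = U1.T                                  # symmetric game

ineqs = []                                 # obedience constraints as A_ub rows
for i in range(2):                         # row player told to play i
    alt = 1 - i
    row = np.zeros((2, 2))
    for j in range(2):                     # gain from switching must be <= 0
        row[i, j] = U1[alt, j] - U1[i, j]
    ineqs.append(row.ravel())
for j in range(2):                         # column player told to play j
    alt = 1 - j
    row = np.zeros((2, 2))
    for i in range(2):
        row[i, j] = U2[i, alt] - U2[i, j]
    ineqs.append(row.ravel())

c = -(U1 + U2).ravel()                     # maximize total welfare
res = linprog(c, A_ub=np.array(ineqs), b_ub=np.zeros(4),
              A_eq=np.ones((1, 4)), b_eq=[1.0], bounds=[(0, None)] * 4)
p = res.x.reshape(2, 2)
print(p, -res.fun)   # welfare 10.5: half weight on (Chicken, Chicken)
```

The optimum puts probability 1/2 on mutual Chicken and 1/4 on each asymmetric profile, beating the total payoff of every Nash equilibrium of this game.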

Price of Anarchy

Definition

Price of Anarchy (PoA)

For a game with a social welfare function $W(s) = \sum_i u_i(s)$, the price of anarchy is:

$$\text{PoA} = \frac{\max_{s \in S} W(s)}{\min_{\sigma^* \in \text{NE}} W(\sigma^*)}$$

It measures how much social welfare is lost at the worst Nash equilibrium compared to the social optimum. $\text{PoA} = 1$ means all equilibria are socially optimal; a large PoA means selfish behavior can be very costly.

Example

Braess's Paradox

Consider the classic network: 4000 drivers travel from A to D. Route A-B-D starts with a congestion-dependent leg A-B costing $T/100$ minutes when $T$ drivers use it, followed by a fixed 45-minute leg B-D; route A-C-D is the mirror image (45 minutes, then $T/100$). In equilibrium the drivers split evenly and each trip takes $2000/100 + 45 = 65$ minutes. Adding a shortcut from B to C with zero travel time creates a new equilibrium in which every driver takes A-B-C-D, and each trip now takes $4000/100 + 0 + 4000/100 = 80$ minutes. Total travel time increases, so the price of anarchy exceeds 1.

This paradox shows that adding capacity to a network can make the equilibrium worse. It demonstrates why Nash equilibria can be socially inefficient and why network design must account for selfish routing.
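The arithmetic of the standard instance (4000 drivers, $T/100$-minute congestible legs, 45-minute fixed legs, a free shortcut) can be checked in a few lines:

```python
# Braess's paradox with the classic numbers: 4000 drivers from A to D.
# Edge costs in minutes: A->B and C->D take T/100 for T users; B->D and
# A->C take a flat 45; the added shortcut B->C takes 0.
N = 4000

# Without the shortcut, the equilibrium splits traffic evenly.
split = N / 2
time_without = split / 100 + 45          # 65.0 minutes on either route

# With the shortcut, every driver takes A->B->C->D.
time_with = N / 100 + 0 + N / 100        # 80.0 minutes

# Verify it is an equilibrium: both deviations are worse for a single driver.
deviate_ABD = N / 100 + 45               # 85.0
deviate_ACD = 45 + N / 100               # 85.0
assert deviate_ABD > time_with and deviate_ACD > time_with
print(time_without, time_with)           # 65.0 80.0
```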

Connections to ML

GAN training dynamics. The GAN objective defines a two-player zero-sum game. The Nash equilibrium satisfies $p_G = p_{\text{data}}$. In practice, simultaneous gradient descent often fails to converge to this equilibrium: the dynamics can cycle, diverge, or reach limit cycles. Techniques like spectral normalization, two-time-scale updates, and gradient penalties improve convergence toward the equilibrium.
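The cycling behavior is visible even in the simplest bilinear game $\min_x \max_y \, xy$, whose unique equilibrium is the origin; a few lines of Python (my own toy example, not an actual GAN) show simultaneous gradient steps spiraling away from it:

```python
import math

# Simultaneous gradient descent/ascent on the bilinear game f(x, y) = x * y
# (x minimizes, y maximizes; the unique equilibrium is the origin).
# The iterates spiral OUTWARD instead of converging, a toy version of the
# GAN training instabilities described above.
x, y, lr = 1.0, 1.0, 0.1
for _ in range(100):
    gx, gy = y, x                  # grad_x f = y, grad_y f = x
    x, y = x - lr * gx, y + lr * gy
print(math.hypot(x, y))            # distance from equilibrium has grown
```

Each simultaneous step multiplies the distance to the origin by $\sqrt{1 + \eta^2} > 1$, so divergence here is not a tuning problem but a property of the dynamics; alternating or extragradient updates behave differently.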

Multi-agent RL. In Markov games, agents simultaneously learn policies via self-play and multi-agent RL. Independent Q-learning does not converge to Nash equilibria in general-sum games. Algorithms like Nash-Q (Hu and Wellman, 2003) compute Nash equilibria at each stage but require solving a game at every step, which is expensive. For zero-sum Markov games, minimax-Q learning converges under standard conditions.

Mechanism design. The revelation principle shows that for any Nash equilibrium of any mechanism, there exists a direct mechanism where truth-telling is a Nash equilibrium with the same outcome. This connects Nash equilibria to the design of incentive-compatible mechanisms.

Common Confusions

Watch Out

Nash equilibrium is not a prediction of what will happen

Nash equilibrium is a consistency requirement: if all players believe others will play the equilibrium, no one wants to deviate. It does not explain how players arrive at the equilibrium. In games with multiple equilibria, the theory provides no selection mechanism. Experimental evidence shows that human players often do not play Nash equilibria, especially in complex games.

Watch Out

Computing Nash equilibrium is hard, but special cases are easy

PPAD-completeness applies to general games. Zero-sum games reduce to LP (solvable in polynomial time). Potential games can be solved by convex optimization. Congestion games always have pure Nash equilibria found by best-response dynamics. Before declaring a problem hard, check if it has special structure.
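For instance, best-response dynamics in a tiny congestion game terminates at a pure Nash equilibrium (a sketch with my own toy cost function: each player pays the load on the road they picked):

```python
# Best-response dynamics in a simple congestion game: each of 5 players
# picks road 0 or 1 and pays the number of players on their road.
# Congestion games are potential games, so every improvement step raises
# the potential and the dynamics must terminate at a pure Nash equilibrium.
def best_response_dynamics(n_players=5):
    choice = [0] * n_players                  # everyone starts on road 0
    improved = True
    while improved:
        improved = False
        for p in range(n_players):
            loads = [choice.count(r) for r in (0, 1)]
            cur, other = choice[p], 1 - choice[p]
            # Switching adds this player to the other road's load.
            if loads[other] + 1 < loads[cur]:
                choice[p] = other
                improved = True
    return choice

eq = best_response_dynamics()
print(sorted(eq))   # a 3/2 split: no player can lower their cost by moving
```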

Watch Out

Mixed Nash equilibria are fragile

In a mixed Nash equilibrium, each player is exactly indifferent between all strategies in their support. Any small change in payoffs can change the equilibrium drastically. This makes mixed equilibria sensitive to modeling assumptions. Pure Nash equilibria (when they exist) are more robust to perturbations.
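The sensitivity is easy to demonstrate in a 2x2 game: the row player's equilibrium mixture is pinned down by the column player's indifference condition, so perturbing the column player's payoffs moves the row player's strategy. A sketch (the closed-form formula below assumes a fully mixed equilibrium; the helper name is mine):

```python
def row_mix_2x2(B):
    """Row player's equilibrium probability of the first row in a 2x2 game
    with a fully mixed equilibrium: chosen to make the COLUMN player
    (payoff matrix B) indifferent between her two columns."""
    num = B[1][1] - B[1][0]
    den = B[0][0] - B[0][1] - B[1][0] + B[1][1]
    return num / den

# Matching pennies: column player's payoffs.
B = [[-1.0, 1.0], [1.0, -1.0]]
print(row_mix_2x2(B))                       # 0.5

# Perturb one of the COLUMN player's payoffs slightly...
B_perturbed = [[-1.2, 1.0], [1.0, -1.0]]
print(row_mix_2x2(B_perturbed))             # ~0.476: the ROW player's mix moves
```

Note the counterintuitive cross-dependence: a player's own equilibrium mixture responds to the opponent's payoffs, not their own, which is part of why mixed equilibria are so sensitive to modeling assumptions.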

Exercises

ExerciseCore

Problem

Consider a three-player game where each player chooses an action in $\{0, 1\}$. Player $i$ receives payoff 1 if the majority choice matches their own choice, and 0 otherwise. Find all pure strategy Nash equilibria.

ExerciseCore

Problem

Show that every strict Nash equilibrium is a trembling-hand perfect equilibrium.

ExerciseAdvanced

Problem

Prove that every finite potential game has at least one pure strategy Nash equilibrium.

ExerciseResearch

Problem

The Lemke-Howson algorithm finds a Nash equilibrium of a two-player game by following a path of complementary pivots. Show that the path length can be exponential by describing the Savani-von Stengel construction, which produces $n \times n$ games on which Lemke-Howson takes on the order of $2^{n/2}$ steps.

References

Canonical:

  • Nash, "Non-Cooperative Games" (1951), Annals of Mathematics
  • Osborne & Rubinstein, A Course in Game Theory (1994), Chapters 2-4
  • Fudenberg & Tirole, Game Theory (1991), Chapters 1-2, 11-12

Computational:

  • Daskalakis, Goldberg & Papadimitriou, "The Complexity of Computing a Nash Equilibrium" (2009), SIAM Journal on Computing
  • Chen & Deng, "Settling the Complexity of Two-Player Nash Equilibrium" (2006), FOCS
  • Savani & von Stengel, "Hard-to-Solve Bimatrix Games" (2006), Econometrica

ML Connections:

  • Hu & Wellman, "Nash Q-Learning for General-Sum Stochastic Games" (2003), JMLR
  • Goodfellow et al., "Generative Adversarial Nets" (2014), NeurIPS

Next Topics

Building on Nash equilibrium theory:

  • Mechanism design: designing games with desirable equilibria
  • Minimax theorem: the special structure of equilibria in zero-sum games

Last reviewed: April 2026
