
Decision Theory

Game Theory Foundations

Strategic interaction between rational agents. Normal-form games, dominant strategies, Nash equilibrium existence, mixed strategies, and connections to minimax, mechanism design, and multi-agent RL.


Why This Matters

Any setting where multiple agents optimize simultaneously is a game. GAN training is a two-player game between the generator and discriminator. Multi-agent reinforcement learning agents play games against each other. Auction design, federated learning incentive structures, and adversarial robustness all require game-theoretic reasoning. Without this vocabulary, you cannot state what "equilibrium" means for these systems, let alone analyze convergence or efficiency.

Game theory provides the formal language for strategic interaction: what rational agents should do when their payoffs depend on the actions of others.

[Figure: Prisoner's Dilemma payoff matrix, with (Defect, Defect) marked as the Nash equilibrium and (Cooperate, Cooperate) as Pareto optimal. Dominant-strategy reasoning: for each player, Defect beats Cooperate regardless of the opponent's choice, so both defect and both end up worse off than under mutual cooperation. The Nash equilibrium (Defect, Defect) is not Pareto optimal: both players could do better by cooperating.]

Mental Model

A game specifies (1) who the players are, (2) what actions each player can take, and (3) what payoff each player receives for each combination of actions. The central question: given that every player is rational and knows the game structure, what will they do? The answer, in most cases, is a Nash equilibrium.

Normal-Form Games

Definition

Normal-Form Game

A normal-form (strategic-form) game consists of:

  • A finite set of players N = \{1, 2, \ldots, n\}
  • For each player i, a finite set of pure strategies S_i
  • For each player i, a payoff function u_i: S_1 \times S_2 \times \cdots \times S_n \to \mathbb{R}

A strategy profile is a tuple s = (s_1, \ldots, s_n) \in S = S_1 \times \cdots \times S_n. Player i receives payoff u_i(s) when profile s is played. In two-player games, payoffs are often written as a bimatrix (A, B) where A_{jk} is player 1's payoff and B_{jk} is player 2's payoff when player 1 plays row j and player 2 plays column k.

Definition

Dominant Strategy

A strategy s_i^* is a strictly dominant strategy for player i if for every alternative strategy s_i \neq s_i^* and every opponent profile s_{-i}:

u_i(s_i^*, s_{-i}) > u_i(s_i, s_{-i})

If the inequality is weak (\geq), the strategy is weakly dominant. A strictly dominant strategy is uniquely optimal regardless of what others do.

Definition

Best Response

Strategy s_i^* is a best response to the opponent profile s_{-i} if:

s_i^* \in \arg\max_{s_i \in S_i} u_i(s_i, s_{-i})

The best-response correspondence BR_i: S_{-i} \rightrightarrows S_i maps each opponent profile to the set of best responses. This correspondence is central to the existence proof for Nash equilibria.

Definition

Iterated Elimination of Dominated Strategies (IEDS)

Repeatedly remove strictly dominated strategies from every player's strategy set. The order of elimination does not affect the final result for strict dominance. If IEDS leaves a unique strategy profile, the game is dominance solvable.
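IEDS is mechanical enough to sketch in code. Below is a minimal Python sketch, not a library implementation; the names `ieds` and `profile_with` are our own, and payoffs are stored as dicts from pure strategy profiles to numbers.

```python
import itertools

def profile_with(i, s_i, opponents):
    """Build a full profile by inserting player i's strategy into a
    tuple of the other players' strategies (kept in player order)."""
    profile = list(opponents)
    profile.insert(i, s_i)
    return tuple(profile)

def ieds(payoffs, strategies):
    """Iterated elimination of strictly dominated pure strategies.

    payoffs[i] maps a pure strategy profile (tuple) to player i's
    payoff; strategies[i] is player i's list of pure strategies.
    Returns the surviving strategy sets.
    """
    strategies = [list(s) for s in strategies]
    changed = True
    while changed:
        changed = False
        for i in range(len(strategies)):
            others = [strategies[j] for j in range(len(strategies)) if j != i]
            for s in list(strategies[i]):
                # s is eliminated if some t beats it against every
                # surviving opponent profile (strict dominance)
                for t in strategies[i]:
                    if t != s and all(
                        payoffs[i][profile_with(i, t, opp)]
                        > payoffs[i][profile_with(i, s, opp)]
                        for opp in itertools.product(*others)
                    ):
                        strategies[i].remove(s)
                        changed = True
                        break
    return strategies

# Prisoner's Dilemma: Defect ("D") strictly dominates Cooperate ("C").
u1 = {("C", "C"): -1, ("C", "D"): -3, ("D", "C"): 0, ("D", "D"): -2}
u2 = {("C", "C"): -1, ("C", "D"): 0, ("D", "C"): -3, ("D", "D"): -2}
surviving = ieds([u1, u2], [["C", "D"], ["C", "D"]])
# → [['D'], ['D']]: the game is dominance solvable
```

Because only strictly dominated strategies are removed, the outcome does not depend on the order in which the eliminations happen to fire.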

Canonical Examples

Example

Prisoner's Dilemma

Two suspects choose independently to cooperate (stay silent) or defect (betray). Payoff matrix:

            Cooperate    Defect
Cooperate   (-1, -1)     (-3, 0)
Defect      (0, -3)      (-2, -2)

Defect is strictly dominant for both players: regardless of the other's choice, defecting yields a higher payoff. The unique Nash equilibrium is (Defect, Defect) with payoffs (-2, -2). This is Pareto-dominated by (Cooperate, Cooperate) at (-1, -1). Rational self-interest leads to a collectively worse outcome.

Example

Battle of the Sexes

Two players prefer to coordinate but disagree on which activity. Payoff matrix:

           Opera     Football
Opera      (3, 2)    (0, 0)
Football   (0, 0)    (2, 3)

Two pure Nash equilibria: (Opera, Opera) and (Football, Football). There is also a mixed Nash equilibrium where player 1 plays Opera with probability 3/5 and player 2 plays Opera with probability 2/5. The mixed equilibrium gives expected payoff 6/5 to each player, worse than either pure equilibrium. This illustrates that mixed equilibria can be inefficient.
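The indifference calculation behind the mixed equilibrium can be checked mechanically. Below is a minimal sketch for fully mixed equilibria of 2x2 games; the closed-form expressions are the standard indifference solution, written with exact fractions, and the variable names are ours.

```python
from fractions import Fraction

# Battle of the Sexes: rows/columns ordered (Opera, Football).
# A[j][k] is player 1's payoff, B[j][k] is player 2's.
A = [[3, 0], [0, 2]]
B = [[2, 0], [0, 3]]

# In a fully mixed 2x2 equilibrium, each player's mixing probability
# makes the *opponent* indifferent between their two pure strategies.
# Player 2 indifferent: p*B[0][0] + (1-p)*B[1][0] = p*B[0][1] + (1-p)*B[1][1]
p = Fraction(B[1][1] - B[1][0], B[0][0] - B[0][1] - B[1][0] + B[1][1])
# Player 1 indifferent: q*A[0][0] + (1-q)*A[0][1] = q*A[1][0] + (1-q)*A[1][1]
q = Fraction(A[1][1] - A[0][1], A[0][0] - A[0][1] - A[1][0] + A[1][1])

# Player 1 is indifferent, so either pure row yields the equilibrium payoff.
u1 = q * A[0][0] + (1 - q) * A[0][1]
# p = 3/5, q = 2/5, u1 = 6/5
```

Swapping in the payoff matrices of any 2x2 game with a fully mixed equilibrium gives the corresponding probabilities.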

Example

Matching Pennies

A strictly competitive (zero-sum) game with no pure Nash equilibrium:

         Heads      Tails
Heads    (1, -1)    (-1, 1)
Tails    (-1, 1)    (1, -1)

For any pure strategy, the opponent has a profitable deviation. The unique Nash equilibrium is mixed: both players play Heads with probability 1/2. The expected payoff is 0 for both players. This is the simplest example showing why mixed strategies are necessary.

Mixed Strategies

Definition

Mixed Strategy

A mixed strategy for player i is a probability distribution \sigma_i over the pure strategy set S_i. The set of all mixed strategies is the simplex \Delta(S_i) = \{\sigma_i \in \mathbb{R}^{|S_i|}_{\geq 0} : \sum_{s_i} \sigma_i(s_i) = 1\}.

The expected payoff under mixed strategy profile \sigma = (\sigma_1, \ldots, \sigma_n) is:

u_i(\sigma) = \sum_{s \in S} \left(\prod_{j=1}^n \sigma_j(s_j)\right) u_i(s)

A key property: u_i(\sigma) is multilinear in the mixed strategies. It is linear in \sigma_i for fixed \sigma_{-i}.
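The expected-payoff sum translates directly to code by enumerating pure profiles. A minimal sketch (the function name is ours):

```python
import itertools

def expected_payoff(u, sigmas):
    """u_i(sigma): sum over pure profiles s of u(s) times the product
    of each player's probability of playing their part of s.

    u maps a profile of strategy indices to a payoff; sigmas[j] is
    player j's mixed strategy as a list of probabilities.
    """
    total = 0.0
    for profile in itertools.product(*(range(len(s)) for s in sigmas)):
        prob = 1.0
        for j, s_j in enumerate(profile):
            prob *= sigmas[j][s_j]
        total += prob * u[profile]
    return total

# Matching Pennies for player 1 (0 = Heads, 1 = Tails): the uniform
# mixed profile yields expected payoff 0.
u1 = {(0, 0): 1, (0, 1): -1, (1, 0): -1, (1, 1): 1}
value = expected_payoff(u1, [[0.5, 0.5], [0.5, 0.5]])
# → 0.0
```

The enumeration is exponential in the number of players, which is why large-scale equilibrium computation needs structured representations rather than the raw normal form.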

Definition

Nash Equilibrium (Mixed)

A mixed strategy profile \sigma^* = (\sigma_1^*, \ldots, \sigma_n^*) is a Nash equilibrium if no player can improve their expected payoff by unilateral deviation:

u_i(\sigma_i^*, \sigma_{-i}^*) \geq u_i(\sigma_i, \sigma_{-i}^*) \quad \forall \sigma_i \in \Delta(S_i), \; \forall i \in N

Equivalently, every player's mixed strategy is a best response to the others. A pure strategy Nash equilibrium is the special case where all \sigma_i^* are degenerate (point mass on a single strategy).
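Because u_i is linear in \sigma_i, a profile is a Nash equilibrium if and only if no player gains by deviating to a *pure* strategy, which makes verification a finite check. A minimal sketch (function names are ours):

```python
import itertools

def expected_u(i, payoffs, sigma):
    """Player i's expected payoff under the mixed profile sigma."""
    total = 0.0
    for prof in itertools.product(*(range(len(s)) for s in sigma)):
        p = 1.0
        for j, s_j in enumerate(prof):
            p *= sigma[j][s_j]
        total += p * payoffs[i][prof]
    return total

def is_nash(payoffs, sigma, tol=1e-9):
    """Check whether sigma is a Nash equilibrium.

    Since u_i is linear in sigma_i, no mixed deviation can beat the
    best pure deviation, so checking pure deviations suffices.
    """
    for i in range(len(sigma)):
        current = expected_u(i, payoffs, sigma)
        for k in range(len(sigma[i])):
            pure = [0.0] * len(sigma[i])
            pure[k] = 1.0
            deviated = list(sigma)
            deviated[i] = pure
            if expected_u(i, payoffs, deviated) > current + tol:
                return False
    return True

# Matching Pennies: the 50-50 profile is an equilibrium; the pure
# profile (Heads, Heads) is not.
u1 = {(0, 0): 1, (0, 1): -1, (1, 0): -1, (1, 1): 1}
u2 = {k: -val for k, val in u1.items()}
mixed_ok = is_nash([u1, u2], [[0.5, 0.5], [0.5, 0.5]])   # True
pure_ok = is_nash([u1, u2], [[1.0, 0.0], [1.0, 0.0]])    # False
```

Note that verifying a candidate equilibrium is easy; the hard problem is finding one, as the failure-mode discussion below explains.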

Nash's Existence Theorem

Theorem

Nash Existence Theorem

Statement

Every finite normal-form game (finite players, finite strategy sets) has at least one Nash equilibrium in mixed strategies.

Intuition

Each player's best-response correspondence maps the opponents' mixed strategies (a compact convex set) to a convex subset of the player's own mixed strategy simplex. The combined best-response correspondence maps the product of simplices to itself. Kakutani's fixed-point theorem guarantees a fixed point, which is precisely a Nash equilibrium: a profile where every player is simultaneously best-responding.

Proof Sketch

Define the combined strategy space \Sigma = \Delta(S_1) \times \cdots \times \Delta(S_n). This is a nonempty, compact, convex subset of \mathbb{R}^{\sum_i |S_i|}.

Define the best-response correspondence BR: \Sigma \rightrightarrows \Sigma by BR(\sigma) = BR_1(\sigma_{-1}) \times \cdots \times BR_n(\sigma_{-n}).

Verify the conditions of Kakutani's fixed-point theorem:

  1. \Sigma is nonempty, compact, and convex.
  2. BR(\sigma) is nonempty (finite games always have best responses).
  3. BR(\sigma) is convex (if \sigma_i and \sigma_i' are both best responses, any mixture \alpha \sigma_i + (1-\alpha) \sigma_i' is also a best response because u_i is linear in \sigma_i).
  4. BR has a closed graph (by continuity of u_i in all strategies).

By Kakutani's theorem, there exists \sigma^* \in BR(\sigma^*), which is a Nash equilibrium.

Why It Matters

This theorem guarantees that the equilibrium concept is never vacuous for finite games: every finite game has a well-defined prediction of rational play. In ML, it motivates equilibrium analysis of the GAN game, though GAN strategy spaces are continuous rather than finite, so existence there relies on extensions of Nash's theorem to infinite games. It also justifies the study of multi-agent RL equilibria.

Failure Mode

The theorem guarantees existence but says nothing about uniqueness, computational tractability, or stability. A game may have exponentially many Nash equilibria. Computing a single Nash equilibrium of a general game is PPAD-complete, so no known polynomial-time algorithm exists. The equilibrium may also be unstable: small perturbations can cause the dynamics to diverge. See Nash equilibrium for these complications.

Zero-Sum Games and Minimax

Definition

Zero-Sum Game

A two-player game is zero-sum if u_1(s) + u_2(s) = 0 for every strategy profile s. One player's gain is the other's loss. The payoff is fully determined by a single matrix A: player 1 receives A_{jk} and player 2 receives -A_{jk}.

In zero-sum games, the Nash equilibrium has a special structure. Player 1 maximizes the minimum guaranteed payoff (maximin), and player 2 minimizes the maximum payoff player 1 can achieve (minimax). Von Neumann's minimax theorem shows that these values are equal:

\max_{\sigma_1 \in \Delta(S_1)} \min_{\sigma_2 \in \Delta(S_2)} \sigma_1^\top A \sigma_2 = \min_{\sigma_2 \in \Delta(S_2)} \max_{\sigma_1 \in \Delta(S_1)} \sigma_1^\top A \sigma_2

This common value is the value of the game. The minimax theorem is the foundational result of game theory and connects to LP duality: computing the minimax strategies is a linear program.
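The LP connection can be made concrete. The sketch below, assuming SciPy is available, solves the row player's maximin problem: maximize v subject to (A^T x)_k >= v for every column k, with x a probability vector (the function name is ours).

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Maximin strategy and value for the row player of a zero-sum
    game with payoff matrix A (row player maximizes x^T A y).

    LP: maximize v subject to (A^T x)_k >= v for every column k,
    sum(x) = 1, x >= 0. linprog minimizes, so we minimize -v.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Variables: x (m entries) followed by the scalar v.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                 # minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])    # v - (A^T x)_k <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # sum(x) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]    # v is a free variable
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    return res.x[:m], res.x[-1]

# Matching Pennies: maximin mix (1/2, 1/2), value of the game 0.
x, v = solve_zero_sum([[1, -1], [-1, 1]])
```

By LP duality, the same program run on -A^T recovers the column player's minimax strategy, which is the computational content of the minimax theorem.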

Connections to Machine Learning

GANs. The GAN objective is a two-player zero-sum game. The generator minimizes and the discriminator maximizes \mathbb{E}[\log D(x)] + \mathbb{E}[\log(1 - D(G(z)))]. The optimal discriminator theorem characterizes the Nash equilibrium: p_G = p_{\text{data}}.

Multi-agent RL. When multiple RL agents interact in a shared environment, the joint optimization is a stochastic game (Markov game). Nash equilibria of the stage game determine equilibrium policies. Convergence of independent learners to Nash equilibrium is not guaranteed in general.

Adversarial robustness. Adversarial attacks frame robustness as a game between a classifier and an adversary who perturbs inputs. The robust optimization formulation \min_\theta \max_{\|\delta\| \leq \epsilon} \mathcal{L}(f_\theta(x + \delta), y) is a minimax problem.

Shapley value. The Shapley value from cooperative game theory assigns a fair attribution of value to each player (feature). SHAP values for model interpretability are exactly Shapley values of a cooperative game defined by the model's predictions.
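For small games the Shapley value can be evaluated by direct coalition enumeration. A minimal sketch (the `glove` game below is a standard cooperative-game example, not from the text above); practical SHAP implementations approximate this exponential sum by sampling or by exploiting model structure.

```python
from itertools import combinations
from math import factorial

def shapley(v, players):
    """Shapley values by enumerating all coalitions.

    v maps a frozenset of players to its worth. Exponential in the
    number of players, so only suitable for small games.
    """
    n = len(players)
    phi = {}
    for i in players:
        rest = [p for p in players if p != i]
        total = 0.0
        for r in range(n):
            for S in combinations(rest, r):
                S = frozenset(S)
                weight = (factorial(len(S)) * factorial(n - len(S) - 1)
                          / factorial(n))
                total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi

# Glove game: player 1 holds a left glove, players 2 and 3 right
# gloves; a coalition is worth 1 iff it contains both kinds.
def glove(S):
    return 1.0 if 1 in S and (2 in S or 3 in S) else 0.0

phi = shapley(glove, [1, 2, 3])
# phi[1] = 2/3, phi[2] = phi[3] = 1/6 (up to float rounding)
```

The values sum to v(N) = 1, illustrating the efficiency axiom stated in the exercises below.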

Common Confusions

Watch Out

Nash equilibrium does not mean optimal outcome

The Prisoner's Dilemma demonstrates that Nash equilibria can be Pareto-dominated. Rational play by all agents does not guarantee a collectively good outcome. The Nash equilibrium is a stability concept (no one wants to deviate), not an efficiency concept. The gap between the Nash equilibrium and the social optimum is measured by the price of anarchy.

Watch Out

Mixed strategies are not randomization for its own sake

A mixed strategy equilibrium means that each player is exactly indifferent between all strategies in the support of their mixture. The mixing probabilities are determined by the requirement that the opponent be indifferent. In Matching Pennies, player 1 mixes 50-50 not because randomness helps them directly, but because this is the only way to prevent the opponent from exploiting a predictable pattern.

Watch Out

Dominant strategy equilibrium is much stronger than Nash

A dominant strategy equilibrium means each player has a strategy that is optimal regardless of what others do. Nash equilibrium only requires each strategy to be optimal given what others are doing. Dominant strategy equilibria are rare and do not require players to know or predict opponent strategies. The Prisoner's Dilemma has a dominant strategy equilibrium. Most games do not.

Exercises

ExerciseCore

Problem

In the Battle of the Sexes game above, compute the mixed strategy Nash equilibrium. Let player 1 play Opera with probability p and player 2 play Opera with probability q. Find p and q and the expected payoff to each player.

ExerciseCore

Problem

Consider a two-player zero-sum game with payoff matrix A = \begin{pmatrix} 3 & -1 \\ -2 & 4 \end{pmatrix}. Does either player have a dominant strategy? Find the value of the game and the mixed strategy Nash equilibrium.

ExerciseAdvanced

Problem

Prove that in any two-player zero-sum game with payoff matrix A, the set of Nash equilibrium strategies for player 1 is a convex polytope. Show that if \sigma_1^* and \sigma_1^{**} are both equilibrium strategies for player 1, then any convex combination \alpha \sigma_1^* + (1-\alpha) \sigma_1^{**} for \alpha \in [0, 1] is also an equilibrium strategy for player 1.

ExerciseAdvanced

Problem

The Shapley value \phi_i(v) for player i in a cooperative game with characteristic function v: 2^N \to \mathbb{R} is:

\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (n - |S| - 1)!}{n!} \left[ v(S \cup \{i\}) - v(S) \right]

Prove that the Shapley value satisfies the efficiency axiom: \sum_{i=1}^n \phi_i(v) = v(N) - v(\emptyset).

References

Canonical:

  • Osborne & Rubinstein, A Course in Game Theory (1994), Chapters 1-3
  • Fudenberg & Tirole, Game Theory (1991), Chapters 1-2, 11
  • Myerson, Game Theory: Analysis of Conflict (1991), Chapters 1-3

Current:

  • Shoham & Leyton-Brown, Multiagent Systems (2009), Chapters 3-4, 13
  • Nisan, Roughgarden, Tardos & Vazirani, Algorithmic Game Theory (2007), Chapters 1-2
  • Goodfellow et al., "Generative Adversarial Nets" (2014), Section 4 (game-theoretic analysis)

Next Topics

Natural extensions from game theory foundations:

  • Nash equilibrium: existence proofs, computational complexity, refinements, price of anarchy
  • Mechanism design: inverse game theory, designing rules for self-interested agents
  • Minimax theorem: the fundamental theorem of zero-sum games and its connections to LP duality

Last reviewed: April 2026
