

Von Neumann Minimax Theorem

In finite two-person zero-sum games, max-min equals min-max. The fundamental theorem of game theory, its proof via linear programming duality, and connections to adversarial robustness, GANs, and online learning.


Why This Matters

The minimax theorem (von Neumann, 1928) is the foundational result of game theory. It states that in two-person zero-sum games with mixed strategies, the order of optimization does not matter: the maximizer's guarantee equals the minimizer's guarantee. This is not obvious. In general, $\max\min \leq \min\max$ (weak duality always holds), but equality (strong duality) requires structure.

The theorem appears throughout ML under different names. The GAN objective is a minimax problem. Adversarial robustness is formulated as $\min_\theta \max_\delta \mathcal{L}$. Online learning regret bounds rely on minimax arguments. LP duality, which underlies much of optimization theory, is a special case. Understanding minimax duality is prerequisite to understanding any of these.

Setup: Zero-Sum Games

Definition

Two-Person Zero-Sum Game

A two-person zero-sum game is defined by a payoff matrix $A \in \mathbb{R}^{m \times n}$. Player 1 (the row player) chooses a row $i \in \{1, \ldots, m\}$. Player 2 (the column player) chooses a column $j \in \{1, \ldots, n\}$. Player 1 receives $A_{ij}$ and player 2 receives $-A_{ij}$.

Player 1 wants to maximize $A_{ij}$. Player 2 wants to minimize it.

Definition

Mixed Strategies in Zero-Sum Games

Player 1 chooses a mixed strategy $x \in \Delta_m = \{x \in \mathbb{R}^m_{\geq 0} : \mathbf{1}^\top x = 1\}$. Player 2 chooses $y \in \Delta_n$. The expected payoff to player 1 is:

$$x^\top A y = \sum_{i=1}^m \sum_{j=1}^n x_i A_{ij} y_j$$

Player 1 maximizes this bilinear form. Player 2 minimizes it.
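The bilinear form is a one-line computation; a minimal numpy sketch, using an arbitrary example matrix:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [1.0, 2.0]])    # example payoff matrix (rows: player 1's actions)
x = np.array([0.25, 0.75])    # player 1's mixed strategy
y = np.array([0.5, 0.5])      # player 2's mixed strategy

payoff = x @ A @ y            # x^T A y, player 1's expected payoff
print(payoff)                 # 1.5
```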

Definition

Maximin and Minimax Values

The maximin value is the best guarantee player 1 can secure:

$$\underline{v} = \max_{x \in \Delta_m} \min_{y \in \Delta_n} x^\top A y$$

The minimax value is the least that player 2 can hold player 1 to:

$$\overline{v} = \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^\top A y$$

Weak duality always holds: $\underline{v} \leq \overline{v}$. The minimax theorem states $\underline{v} = \overline{v}$.

The Minimax Theorem

Theorem

Von Neumann Minimax Theorem (1928)

Statement

For any matrix $A \in \mathbb{R}^{m \times n}$:

$$\max_{x \in \Delta_m} \min_{y \in \Delta_n} x^\top A y = \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^\top A y$$

The common value $v^* = \underline{v} = \overline{v}$ is the value of the game. Optimal strategies $(x^*, y^*)$ achieving this value exist and form a Nash equilibrium of the zero-sum game.

Intuition

Consider the function $f(x, y) = x^\top A y$. It is linear, hence concave, in the maximizer's variable $x$ for fixed $y$, and linear, hence convex, in the minimizer's variable $y$ for fixed $x$. The strategy spaces $\Delta_m$ and $\Delta_n$ are compact and convex. These three ingredients (an objective concave in the max variable and convex in the min variable, on compact convex domains) are exactly what is needed for strong minimax duality. The key point: mixing strategies (randomizing) creates convexity, which closes the minimax gap.

Without mixed strategies, the maximin value in pure strategies can be strictly less than the minimax value. Matching Pennies has pure maximin value $-1$ and pure minimax value $1$, but the mixed value is $0$.
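A quick numerical check of this gap, taking Matching Pennies with payoffs $\pm 1$:

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])          # Matching Pennies, row player's payoffs

pure_maximin = A.min(axis=1).max()   # best row against the worst column: -1
pure_minimax = A.max(axis=0).min()   # least the column player concedes: +1

x = y = np.array([0.5, 0.5])         # uniform mixing closes the gap
mixed_value = x @ A @ y              # 0.0
```

The pure-strategy gap ($-1$ vs. $1$) vanishes once both players randomize.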

Proof Sketch

Via LP duality. Player 1's maximin problem is:

$$\max_{x \in \Delta_m,\, v} \; v \quad \text{subject to} \quad x^\top A e_j \geq v \;\; \forall j$$

This is a linear program. Write it in standard form:

$$\max \; v \quad \text{s.t.} \quad A^\top x \geq v \mathbf{1}, \;\; \mathbf{1}^\top x = 1, \;\; x \geq 0$$

The dual LP is player 2's minimax problem:

$$\min \; w \quad \text{s.t.} \quad A y \leq w \mathbf{1}, \;\; \mathbf{1}^\top y = 1, \;\; y \geq 0$$

which is equivalent to $\min_{y \in \Delta_n} \max_{x \in \Delta_m} x^\top A y$.

By LP strong duality (which holds since both primal and dual are feasible, as the simplices are nonempty), the optimal values are equal: $v^* = w^*$. Therefore $\max\min = \min\max$.
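The primal LP above can be handed to an off-the-shelf solver. A minimal sketch, assuming SciPy is available, on Rock-Paper-Scissors (value $0$, uniform optimal strategy):

```python
import numpy as np
from scipy.optimize import linprog

def game_value(A):
    """Player 1's maximin LP: max v  s.t.  A^T x >= v*1, 1^T x = 1, x >= 0."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # linprog minimizes, so use -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])     # v*1 - A^T x <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # 1^T x = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]     # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

# Rock-Paper-Scissors: value 0, optimal strategy uniform over the three actions
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])
v, x = game_value(A)
```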

Why It Matters

The minimax theorem tells you that zero-sum games are "solved" in a strong sense. The value is unique, and optimal strategies can be computed in polynomial time (by solving an LP). This is in sharp contrast to general Nash equilibria, which are PPAD-complete to compute.

In ML, the theorem justifies the GAN objective: the generator can achieve the data distribution at the Nash equilibrium of the minimax game. It also justifies adversarial training: the robust model exists because the minimax equality holds (under appropriate convexity conditions on the loss landscape).

Failure Mode

The theorem requires mixed strategies. In pure strategies, the inequality can be strict, $\max\min < \min\max$ (e.g., Matching Pennies). The theorem also requires finite strategy spaces (or the appropriate generalization to compact convex sets with a convex-concave objective). For general functions on general domains, the minimax equality can fail.

Sion's Generalization

Theorem

Sion Minimax Theorem (1958)

Statement

Let $X$ be a compact convex subset of a topological vector space and $Y$ a convex subset. If $f: X \times Y \to \mathbb{R}$ satisfies:

  • $f(\cdot, y)$ is quasiconvex and lower semicontinuous on $X$ for each $y$
  • $f(x, \cdot)$ is quasiconcave and upper semicontinuous on $Y$ for each $x$

Then:

$$\min_{x \in X} \sup_{y \in Y} f(x, y) = \sup_{y \in Y} \min_{x \in X} f(x, y)$$

Intuition

Sion's theorem extends von Neumann's result from bilinear functions on simplices to convex-concave (or more generally, quasiconvex-quasiconcave) functions on arbitrary compact convex sets. Only one of the two sets needs to be compact. This is the version most useful in continuous optimization and ML, where the strategy spaces are not simplices but arbitrary convex parameter spaces.

Proof Sketch

The proof uses the intersection property of compact convex sets (a topological fixed-point argument). For each $\varepsilon > 0$, the sets $\{x : f(x, y) \leq v^* + \varepsilon\}$ (for varying $y$) have the finite intersection property by the quasiconcavity of $f$ in $y$. Compactness of $X$ gives a point in the intersection, and taking $\varepsilon \to 0$ yields the result.

Why It Matters

Sion's theorem is the version you actually use in ML proofs. The GAN minimax equality (for the population objective with infinite-capacity function classes), adversarial training with p\ell_p perturbation sets, and distributionally robust optimization all invoke Sion's theorem (or the closely related Fan minimax theorem) rather than von Neumann's original result.

Failure Mode

Both compactness and convexity-concavity are required. If the function is not convex-concave (e.g., $f(x,y) = \sin(x+y)$ on $[0, 2\pi]^2$), the minimax equality can fail. If neither set is compact (e.g., $f(x,y) = xy$ on $\mathbb{R}^2$), the sup and inf may not be attained.

Saddle Points

Definition

Saddle Point

A pair $(x^*, y^*)$ is a saddle point of $f$ on $X \times Y$ if:

$$f(x^*, y) \leq f(x^*, y^*) \leq f(x, y^*) \quad \forall x \in X, \;\; \forall y \in Y$$

Equivalently, $x^*$ minimizes $f(\cdot, y^*)$ and $y^*$ maximizes $f(x^*, \cdot)$. A saddle point exists if and only if the minimax equality holds and the optimizers are attained.

The minimax theorem guarantees that the zero-sum game has a saddle point in mixed strategies. At the saddle point, neither player can improve unilaterally: player 1 cannot increase the payoff, and player 2 cannot decrease it. This is exactly the Nash equilibrium condition for zero-sum games.

Computing Minimax Strategies

For a finite zero-sum game with matrix $A$, the minimax strategies are computed by solving the LP:

$$\max_{x,\, v} \; v \quad \text{s.t.} \quad A^\top x \geq v \mathbf{1}, \;\; \mathbf{1}^\top x = 1, \;\; x \geq 0$$

This LP has $m + 1$ variables and $n + 1$ constraints (plus nonnegativity). It can be solved in $O(\text{poly}(m, n))$ time using interior-point methods. For $2 \times 2$ games, the solution can be written in closed form using the indifference principle.
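The $2 \times 2$ closed form comes from making the opponent indifferent between their two actions. A small sketch, assuming the game has no pure-strategy saddle point (so the denominator is nonzero); the example matrix is arbitrary:

```python
def solve_2x2(A):
    """Closed-form mixed equilibrium of a 2x2 zero-sum game with no
    pure-strategy saddle point, via the indifference principle."""
    (a, b), (c, d) = A
    D = a - b - c + d          # assumed nonzero (no pure saddle point)
    x1 = (d - c) / D           # row player's probability on row 1
    y1 = (d - b) / D           # column player's probability on column 1
    v = (a * d - b * c) / D    # value of the game
    return x1, y1, v

x1, y1, v = solve_2x2([[3.0, 0.0], [1.0, 2.0]])
# x1 = 0.25, y1 = 0.5, v = 1.5: against (0.25, 0.75), both columns pay 1.5
```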

For continuous games (Sion's setting), minimax strategies are found by convex optimization methods: projected gradient descent-ascent, extragradient methods, or mirror descent. Convergence rates are $O(1/T)$ for the duality gap.

Applications in Machine Learning

GANs

The GAN objective is:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

For fixed generator and discriminator function classes of sufficient capacity, this is a minimax problem. The minimax theorem (Sion's version) applies when the inner maximization over $D$ is concave (it is, in $D$) and the outer minimization over $G$ is convex in the relevant sense. At the saddle point, $p_G = p_{\text{data}}$.
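One step behind the $p_G = p_{\text{data}}$ claim, from the cited GAN paper (Goodfellow et al., 2014, Proposition 1): for a fixed generator, the inner maximization is solved pointwise by

$$D^*_G(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_G(x)}$$

Substituting $D^*_G$ back reduces the objective to $2\,\mathrm{JSD}(p_{\text{data}} \,\|\, p_G) - \log 4$, which is minimized precisely when $p_G = p_{\text{data}}$.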

Adversarial Robustness

Adversarial training solves:

$$\min_\theta \; \mathbb{E}_{(x,y)} \left[ \max_{\|\delta\| \leq \varepsilon} \mathcal{L}(f_\theta(x + \delta), y) \right]$$

The inner maximization finds the worst-case perturbation. The outer minimization trains the model to be robust. If $\mathcal{L}$ is convex in $\theta$ and concave in $\delta$, and the perturbation set $\{\delta : \|\delta\| \leq \varepsilon\}$ is compact and convex, the minimax theorem guarantees the order of optimization does not matter.
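A minimal numpy sketch of the inner maximization by projected gradient ascent on the $\ell_\infty$ ball. The helper name, toy loss, and step sizes are illustrative assumptions, not a specific paper's recipe:

```python
import numpy as np

def pgd_inner_max(loss_grad, x, eps, steps=20, lr=0.05):
    """Approximate max_{||delta||_inf <= eps} L(x + delta) by projected
    gradient ascent; loss_grad returns the gradient of L at its argument."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        delta += lr * np.sign(loss_grad(x + delta))  # signed ascent step
        delta = np.clip(delta, -eps, eps)            # project onto the ball
    return delta

# Toy loss L(u) = ||u||^2 with gradient 2u: the worst perturbation pushes each
# coordinate to the ball's boundary, in the direction of x's sign.
grad = lambda u: 2.0 * u
x0 = np.array([0.5, -0.5])
d = pgd_inner_max(grad, x0, eps=0.1)   # approximately (0.1, -0.1)
```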

Online Learning and Regret

The minimax regret framework in online learning asks: what is the best worst-case regret an algorithm can achieve against an adversary? For the experts problem, the minimax optimal regret is $\Theta(\sqrt{T \log N})$ for $T$ rounds and $N$ experts. This is proved using a minimax argument: the algorithm and adversary play a zero-sum game where the payoff is regret.
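The algorithmic side of this rate is achieved by multiplicative weights (Hedge). A compact sketch, with losses assumed in $[0,1]$ and the standard $\eta \propto \sqrt{\log N / T}$ tuning; the random loss table is purely illustrative:

```python
import numpy as np

def hedge_regret(losses, eta):
    """Run Hedge (multiplicative weights) on a (T, N) array of per-round
    expert losses in [0, 1]; return regret vs. the best fixed expert."""
    n_experts = losses.shape[1]
    w = np.ones(n_experts)
    alg_loss = 0.0
    for loss_t in losses:
        p = w / w.sum()              # play the normalized weight vector
        alg_loss += p @ loss_t       # expected loss this round
        w *= np.exp(-eta * loss_t)   # exponentially penalize bad experts
    return alg_loss - losses.sum(axis=0).min()

rng = np.random.default_rng(0)
T, N = 1000, 10
losses = rng.random((T, N))
eta = np.sqrt(8 * np.log(N) / T)     # tuning behind the sqrt(T log N / 2) bound
regret = hedge_regret(losses, eta)   # sublinear in T
```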

Common Confusions

Watch Out

Minimax equality requires convexity-concavity, not just any function

For a general function $f(x,y)$, the weak inequality $\max_x \min_y f \leq \min_y \max_x f$ always holds. Equality requires the function to be convex-concave (or quasiconvex-quasiconcave with appropriate compactness). The neural network parameterization in GANs makes the actual optimization landscape non-convex-concave, which is why GAN training can fail to converge despite the population-level minimax theorem applying.

Watch Out

The minimax theorem says nothing about algorithms

The theorem guarantees that the value exists and optimal strategies exist. It does not say that gradient descent-ascent converges to the saddle point. Simultaneous gradient descent on a bilinear function $x^\top A y$ can cycle forever without converging. Extragradient methods, optimistic gradient descent, or two-time-scale updates are needed for convergence.
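A concrete sketch on the scalar bilinear game $f(x, y) = xy$ (descend in $x$, ascend in $y$): simultaneous updates spiral away from the saddle at the origin, while the extragradient look-ahead contracts toward it.

```python
import math

def gda_step(x, y, eta):
    """Simultaneous gradient descent-ascent on f(x, y) = x*y."""
    return x - eta * y, y + eta * x

def extragrad_step(x, y, eta):
    """Extragradient: take a look-ahead half step, then update using the
    gradients evaluated at the look-ahead point."""
    xh, yh = x - eta * y, y + eta * x
    return x - eta * yh, y + eta * xh

x, y = 1.0, 1.0
for _ in range(200):
    x, y = gda_step(x, y, 0.1)
norm_gda = math.hypot(x, y)      # grows by sqrt(1 + eta^2) every step

x, y = 1.0, 1.0
for _ in range(200):
    x, y = extragrad_step(x, y, 0.1)
norm_eg = math.hypot(x, y)       # contracts: factor sqrt(1 - eta^2 + eta^4) < 1
```

The per-step norm factors can be read off directly from the linear update matrices, which is why the bilinear case is the standard demonstration of this failure.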

Watch Out

LP duality and game-theoretic minimax are the same theorem

The proof of the minimax theorem via LP duality and the proof of LP strong duality via Farkas' lemma are logically equivalent: each implies the other. This is not a coincidence: solving a zero-sum game IS solving a linear program, and vice versa. Von Neumann proved the minimax theorem in 1928; Dantzig formulated linear programming in 1947 and recognized the connection.

Exercises

ExerciseCore

Problem

For the zero-sum game with payoff matrix $A = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$, find the value of the game and the optimal mixed strategies for both players.

ExerciseCore

Problem

Prove the weak duality inequality: for any $x \in \Delta_m$ and $y \in \Delta_n$:

$$\min_{y' \in \Delta_n} x^\top A y' \leq x^\top A y \leq \max_{x' \in \Delta_m} (x')^\top A y$$

Conclude that $\max_x \min_y x^\top A y \leq \min_y \max_x x^\top A y$.

ExerciseAdvanced

Problem

Write the LP for computing the maximin strategy of a zero-sum game with payoff matrix $A \in \mathbb{R}^{3 \times 4}$. Write its dual LP. Verify that the dual corresponds to the minimax strategy for player 2. What is the relationship between the number of variables and constraints in the primal and dual?

ExerciseAdvanced

Problem

Show that Sion's minimax theorem implies von Neumann's minimax theorem as a special case. Verify that the bilinear function $f(x,y) = x^\top A y$ on $\Delta_m \times \Delta_n$ satisfies all hypotheses of Sion's theorem.

References

Canonical:

  • Von Neumann, "Zur Theorie der Gesellschaftsspiele" (1928), Mathematische Annalen
  • Von Neumann & Morgenstern, Theory of Games and Economic Behavior (1944), Chapter 3
  • Sion, "On General Minimax Theorems" (1958), Pacific Journal of Mathematics

Optimization:

  • Boyd & Vandenberghe, Convex Optimization (2004), Chapter 5 (LP duality)
  • Bertsimas & Tsitsiklis, Introduction to Linear Optimization (1997), Chapter 4

ML Applications:

  • Goodfellow et al., "Generative Adversarial Nets" (2014), Section 4
  • Madry et al., "Towards Deep Learning Models Resistant to Adversarial Attacks" (2018)
  • Cesa-Bianchi & Lugosi, Prediction, Learning, and Games (2006), Chapters 4, 7


Last reviewed: April 2026
