

Minimax and Saddle Points

Minimax theorems characterize when max-min equals min-max. Saddle points arise in zero-sum games, duality theory, GANs, and robust optimization.


Why This Matters

Minimax problems appear whenever two objectives conflict. In zero-sum games, one player maximizes what the other minimizes. In robust optimization, you minimize loss under the worst-case perturbation. In GANs, the generator minimizes what the discriminator maximizes. The minimax theorem tells you when you can swap the order of max and min without changing the value. When this swap is valid, the problem has a saddle point and is tractable. When it fails, the problem is harder.

Mental Model

Consider a function $f(x, y)$ where one player chooses $x$ to maximize and the other chooses $y$ to minimize. The max-player moves first in $\max_x \min_y f(x, y)$: they choose $x$ knowing the min-player will respond optimally. The min-player moves first in $\min_y \max_x f(x, y)$: they choose $y$ knowing the max-player will respond optimally. Moving second is always at least as good, so $\max_x \min_y f \leq \min_y \max_x f$. The minimax theorem identifies conditions under which equality holds.

Core Definitions

Definition

Saddle Point

A point $(x^*, y^*)$ is a saddle point of $f: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ if:

$$f(x, y^*) \leq f(x^*, y^*) \leq f(x^*, y) \quad \text{for all } x \in \mathcal{X}, \, y \in \mathcal{Y}$$

Equivalently: $x^*$ maximizes $f(\cdot, y^*)$ and $y^*$ minimizes $f(x^*, \cdot)$. Neither player can improve by unilateral deviation.
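A quick numerical sanity check of the definition (an illustrative example, not from the text: $f(x, y) = -x^2 + y^2$, concave in the max-variable $x$, convex in the min-variable $y$, candidate saddle point $(0, 0)$):

```python
# Verify on a grid that x* maximizes f(., y*) and y* minimizes f(x*, .).
def f(x, y):
    return -x**2 + y**2   # illustrative convex-concave example

x_star, y_star = 0.0, 0.0
grid = [i / 10 - 1 for i in range(21)]   # 21 points covering [-1, 1]

x_star_maximizes = all(f(x, y_star) <= f(x_star, y_star) for x in grid)
y_star_minimizes = all(f(x_star, y_star) <= f(x_star, y) for y in grid)
print(x_star_maximizes and y_star_minimizes)  # -> True
```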

Definition

Minimax Value and Maximin Value

The maximin value is $\underline{v} = \max_{x \in \mathcal{X}} \min_{y \in \mathcal{Y}} f(x, y)$.

The minimax value is $\overline{v} = \min_{y \in \mathcal{Y}} \max_{x \in \mathcal{X}} f(x, y)$.

The weak duality inequality always holds: $\underline{v} \leq \overline{v}$. When $\underline{v} = \overline{v}$, we say the minimax theorem holds and the common value is the value of the game.
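To make the two values concrete, here is a minimal sketch (the 2×2 payoff table is made up for illustration; row player maximizes, column player minimizes, pure strategies only):

```python
# A[i][j] = f(x = i, y = j); row player maximizes, column player minimizes.
A = [[3, 1],
     [0, 2]]

maximin = max(min(row) for row in A)                              # max_x min_y
minimax = min(max(A[i][j] for i in range(2)) for j in range(2))   # min_y max_x
print(maximin, minimax)  # -> 1 2: weak duality holds, with a strict gap
```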

Definition

Convex-Concave Function

A function $f: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is convex-concave (in this page's convention: concave in the maximizing variable, convex in the minimizing variable) if $f(\cdot, y)$ is concave for each fixed $y$ and $f(x, \cdot)$ is convex for each fixed $x$. Equivalently, the max-player faces a concave objective (wants to maximize) and the min-player faces a convex objective (wants to minimize).

Main Theorems

Theorem

Von Neumann Minimax Theorem

Statement

Let $\mathcal{X} \subset \mathbb{R}^n$ and $\mathcal{Y} \subset \mathbb{R}^m$ be compact convex sets, and let $f: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ be continuous and convex-concave. Then:

$$\max_{x \in \mathcal{X}} \min_{y \in \mathcal{Y}} f(x, y) = \min_{y \in \mathcal{Y}} \max_{x \in \mathcal{X}} f(x, y)$$

Moreover, a saddle point $(x^*, y^*)$ exists.

Intuition

Convexity-concavity plus compactness ensures that neither player benefits from going first. The convex-concave structure prevents "misaligned" curvature that would let one player exploit knowledge of the other's choice. Compactness ensures the optima are attained (not just approached in the limit).

Proof Sketch

One proof uses the separating hyperplane theorem. Define $S = \{(u, v) : u \leq f(x, y),\, v \geq f(x, y) \text{ for some } (x, y)\}$. The convex-concave structure makes $S$ convex. If maximin $<$ minimax, there is a gap, and a separating hyperplane argument yields a contradiction. An alternative proof uses Brouwer's fixed-point theorem on the best-response correspondence.

Why It Matters

This theorem is the foundation of game theory and connects directly to linear programming duality. In a finite zero-sum game (matrix game), the strategy spaces are simplices (compact, convex) and the payoff is bilinear (convex-concave). Von Neumann's theorem guarantees a value and optimal mixed strategies. Strong duality in LP is a corollary: the primal and dual LP values are equal.

Failure Mode

Without convexity-concavity, the minimax gap can be strict. For example, let $\mathcal{X} = \mathcal{Y} = \{0, 1\}$ (not convex, just two points) and $f(x, y) = \mathbf{1}[x = y]$. Then $\max_x \min_y f = 0$ (whatever $x$ is, the min-player mismatches it) but $\min_y \max_x f = 1$ (whatever $y$ is, the max-player matches it). The gap is 1. Convexifying the strategy sets (allowing mixed strategies) restores equality.
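The mixed extension of this matching game can be checked numerically: with $p$ and $q$ the probabilities of playing 1, the expected payoff is $F(p, q) = pq + (1-p)(1-q)$, and a grid search (an illustrative sketch) shows both values equal $1/2$:

```python
# Expected payoff of the matching game under mixed strategies p, q.
def F(p, q):
    return p * q + (1 - p) * (1 - q)

grid = [i / 100 for i in range(101)]   # probabilities 0.00, 0.01, ..., 1.00
maximin = max(min(F(p, q) for q in grid) for p in grid)
minimax = min(max(F(p, q) for p in grid) for q in grid)
print(round(maximin, 6), round(minimax, 6))  # -> 0.5 0.5: mixing closes the gap
```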

Without compactness, the optima may not be attained even when the function is convex-concave. Consider $f(x, y) = xy$ on $\mathcal{X} = \mathcal{Y} = (0, \infty)$: the inner infimum $\inf_y xy = 0$ is never attained, and the inner supremum $\sup_x xy$ is infinite, so neither $\max_x \min_y$ nor $\min_y \max_x$ is well-defined.

Theorem

Sion Minimax Theorem

Statement

Let $\mathcal{X}$ be a compact convex set and $\mathcal{Y}$ a convex set (not necessarily compact) in linear topological spaces. Suppose $f(\cdot, y)$ is quasiconcave and upper semicontinuous for each fixed $y$, and $f(x, \cdot)$ is quasiconvex and lower semicontinuous for each fixed $x$. Then:

$$\max_{x \in \mathcal{X}} \inf_{y \in \mathcal{Y}} f(x, y) = \inf_{y \in \mathcal{Y}} \max_{x \in \mathcal{X}} f(x, y)$$

Intuition

Sion's theorem relaxes Von Neumann's conditions in two ways: it requires quasiconvexity-quasiconcavity instead of convexity-concavity, and $\mathcal{Y}$ need not be compact (only $\mathcal{X}$ must be compact). This covers cases where the min-player's domain is unbounded.

Proof Sketch

The proof uses a topological intersection argument. For each finite subset $\{y_1, \ldots, y_k\} \subset \mathcal{Y}$, the set $\{x : \min_j f(x, y_j) \geq v\}$ is nonempty and closed by upper semicontinuity and quasiconcavity. The finite intersection property (from compactness of $\mathcal{X}$) yields a point where $\inf_y f(x, y) \geq v$ for the critical value $v$.

Why It Matters

Sion's theorem applies to infinite-dimensional settings (e.g., function spaces in variational problems) and to cases where strict convexity-concavity fails but quasiconvexity-quasiconcavity holds. It is the most general classical minimax result.

Failure Mode

Without compactness of at least one of the two sets, the theorem can fail even with quasiconvexity-quasiconcavity. Both sets being merely convex and unbounded is not enough.

Connection to Duality

Strong duality in linear programming is a special case of the minimax theorem. The LP primal-dual pair:

$$\min_{x \geq 0} c^\top x \quad \text{s.t. } Ax = b \quad \longleftrightarrow \quad \max_{y} b^\top y \quad \text{s.t. } A^\top y \leq c$$

can be written as $\min_{x \geq 0} \max_y L(x, y)$ where $L(x, y) = c^\top x + y^\top(b - Ax)$ is the Lagrangian. This $L$ is linear in each of $x$ and $y$ separately (hence convex-concave), and the constraint sets are convex. Strong duality ($\min = \max$) follows from the minimax theorem.

More generally, for convex programs: strong duality holds when Slater's condition is satisfied, and the saddle point of the Lagrangian corresponds to the primal-dual optimal pair.

Connection to GANs

The GAN objective is:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

For fixed $G$, the optimal discriminator is $D^*(x) = p_{\text{data}}(x)/(p_{\text{data}}(x) + p_G(x))$, and the resulting objective for $G$ is $2\,\mathrm{JSD}(p_{\text{data}} \| p_G) - \log 4$, where JSD is the Jensen-Shannon divergence. The minimax theorem does not directly apply because the generator and discriminator are parameterized by neural networks (non-convex). Training GANs is therefore a non-convex saddle-point problem, and convergence guarantees are limited.
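The optimal-discriminator identity can be verified directly for discrete distributions (the two distributions below are made up for illustration): plugging $D^*$ into the objective reproduces $2\,\mathrm{JSD}(p_{\text{data}} \| p_G) - \log 4$.

```python
import math

# p = p_data, q = p_G: discrete distributions on three points.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

# GAN value at the optimal discriminator D*(x) = p(x) / (p(x) + q(x)).
value = sum(pi * math.log(pi / (pi + qi)) + qi * math.log(qi / (pi + qi))
            for pi, qi in zip(p, q))

# Jensen-Shannon divergence via the mixture m = (p + q) / 2.
m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
kl = lambda a, b: sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))
jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(abs(value - (2 * jsd - math.log(4))) < 1e-9)  # -> True
```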

Connection to Robust Optimization

In distributionally robust optimization (DRO):

$$\min_\theta \max_{P \in \mathcal{U}} \mathbb{E}_P[\ell(\theta, X)]$$

where $\mathcal{U}$ is an uncertainty set of distributions. When $\mathcal{U}$ is convex and compact (in an appropriate topology) and the loss is convex in $\theta$, the minimax theorem applies. This connects to adversarial training: the inner maximization finds the worst-case distribution, and the outer minimization hedges against it.
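A minimal sketch with made-up numbers: when $\mathcal{U}$ is the convex hull of finitely many distributions, the inner expectation is linear in $P$, so the worst case is attained at a vertex of $\mathcal{U}$, and the outer minimization hedges between the vertices.

```python
# Squared loss, three scenarios for X, two extreme distributions in U.
scenarios = [0.0, 1.0, 2.0]
U = [[0.8, 0.1, 0.1],
     [0.1, 0.1, 0.8]]

def worst_case(theta):
    # Inner max is linear in P, so checking the vertices of U suffices.
    return max(sum(p * (theta - x) ** 2 for p, x in zip(P, scenarios))
               for P in U)

thetas = [i / 100 for i in range(201)]          # grid on [0, 2]
theta_star = min(thetas, key=worst_case)
print(theta_star)  # -> 1.0: the robust choice hedges between the extremes
```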

Canonical Examples

Example

Matrix game (Rock-Paper-Scissors)

The payoff matrix for Rock-Paper-Scissors is:

$$A = \begin{bmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{bmatrix}$$

The row player maximizes $x^\top A y$, the column player minimizes. Strategy spaces are the probability simplex in $\mathbb{R}^3$ (compact, convex). The payoff is bilinear (convex-concave in mixed strategies). By Von Neumann's theorem, the game has value 0 and the unique optimal strategy for both players is $(1/3, 1/3, 1/3)$.
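The optimal strategy is easy to verify (a minimal check): the uniform mixture earns expected payoff zero against every pure column, hence against every mixed strategy.

```python
# Rock-Paper-Scissors payoff matrix and the uniform row strategy.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]
x_star = [1 / 3] * 3

# Expected payoff of x* against each pure column strategy e_j.
payoffs = [sum(x_star[i] * A[i][j] for i in range(3)) for j in range(3)]
print(payoffs)  # -> [0.0, 0.0, 0.0]: x* guarantees the game's value 0
```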

Example

Strong duality in LP

Primal: $\min\, 3x_1 + 5x_2$ subject to $x_1 + x_2 \geq 4$, $x_1, x_2 \geq 0$. Dual: $\max\, 4y$ subject to $y \leq 3$, $y \leq 5$, $y \geq 0$. The dual optimum is $y^* = 3$, giving value 12. The primal optimum is $x^* = (4, 0)$, also giving value 12. Strong duality holds: $\min = \max = 12$. The Lagrangian $L(x, y) = 3x_1 + 5x_2 + y(4 - x_1 - x_2)$ has a saddle point at $(x^*, y^*)$.
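The saddle property of this Lagrangian can be checked numerically (a sketch; the grid bounds are chosen for illustration): $L(x^*, y) \leq L(x^*, y^*) \leq L(x, y^*)$ for $x \geq 0$, $y \geq 0$.

```python
# Lagrangian of the primal LP, with multiplier y for x1 + x2 >= 4.
def L(x1, x2, y):
    return 3 * x1 + 5 * x2 + y * (4 - x1 - x2)

x_star, y_star = (4.0, 0.0), 3.0
print(L(*x_star, y_star))  # -> 12.0, the common optimal value

xs = [(a / 2, b / 2) for a in range(11) for b in range(11)]   # grid on [0,5]^2
assert all(L(x1, x2, y_star) >= 12.0 for x1, x2 in xs)        # x* minimizes
assert all(L(*x_star, y) <= 12.0 for y in [0, 1, 2, 3, 4, 5]) # y* maximizes
```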

Common Confusions

Watch Out

The minimax problem (sequential) is not the same as simultaneous gradient play

In $\min_y \max_x f(x, y)$, the min-player commits to $y$ first, then the max-player responds optimally. This is different from simultaneous gradient descent-ascent, where both players update at the same time. Simultaneous updates can cycle or diverge even for bilinear convex-concave problems, let alone non-convex-concave ones. The minimax theorem is about the existence of a saddle point, not about an algorithm to find it.
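A minimal sketch of this failure: simultaneous gradient steps on the bilinear function $f(x, y) = xy$ (max over $x$, min over $y$, saddle point at the origin) spiral outward, since each step multiplies the distance to the origin by $\sqrt{1 + \eta^2}$.

```python
# Simultaneous gradient ascent on x, descent on y, for f(x, y) = x * y,
# both players using the gradients at the current (old) iterate.
eta = 0.1
x, y = 1.0, 0.0
for _ in range(100):
    x, y = x + eta * y, y - eta * x   # simultaneous update

norm = (x * x + y * y) ** 0.5
print(norm)  # grows past the starting distance 1.0 instead of converging
```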

Watch Out

GANs do not satisfy the minimax theorem assumptions

The GAN objective is non-convex in the generator parameters. Von Neumann's theorem does not apply. The "theoretical optimum" analysis (showing the global optimum is $p_G = p_{\text{data}}$) assumes the discriminator is optimized exactly for each $G$, which never happens in practice. GAN training is a non-convex saddle-point problem without convergence guarantees from classical minimax theory.

Watch Out

Weak duality always holds, strong duality requires conditions

$\max_x \min_y f \leq \min_y \max_x f$ is true for any $f$ (the "max-min inequality"). This is a one-line proof: $\min_y f(x, y) \leq f(x, y) \leq \max_x f(x, y)$, then take $\max_x$ on the left and $\min_y$ on the right. Equality (strong duality) requires structure: convexity-concavity plus compactness, or Slater's condition, or specific problem structure like LP.
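Since the inequality holds for arbitrary $f$, it can be spot-checked on a random payoff table (an illustrative sketch over finite strategy sets):

```python
import random

random.seed(0)
n = 5
F = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]  # arbitrary f

maxmin = max(min(F[i][j] for j in range(n)) for i in range(n))  # max_x min_y
minmax = min(max(F[i][j] for i in range(n)) for j in range(n))  # min_y max_x
print(maxmin <= minmax)  # -> True, whatever the table
```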

Summary

  • Weak duality: $\max_x \min_y f \leq \min_y \max_x f$ always holds
  • Von Neumann: equality holds for continuous convex-concave $f$ on compact convex sets
  • Saddle point $(x^*, y^*)$: $f(x, y^*) \leq f(x^*, y^*) \leq f(x^*, y)$
  • Strong LP duality is a corollary of the minimax theorem
  • GANs are minimax problems but violate the convexity assumption
  • Robust optimization: minimax over uncertainty sets

Exercises

ExerciseCore

Problem

Consider the function $f(x, y) = x^2 - y^2$ on $\mathcal{X} = [-1, 1]$, $\mathcal{Y} = [-1, 1]$. Find $\max_x \min_y f(x, y)$ and $\min_y \max_x f(x, y)$. Does a saddle point exist?

ExerciseAdvanced

Problem

Prove the weak duality inequality $\max_x \min_y f(x, y) \leq \min_y \max_x f(x, y)$ for any function $f: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$, assuming the maxima and minima exist.

References

Canonical:

  • Von Neumann, "Zur Theorie der Gesellschaftsspiele" (1928)
  • Sion, "On General Minimax Theorems" (1958)

Current:

  • Boyd & Vandenberghe, Convex Optimization (2004), Chapter 5 (duality)
  • Goodfellow et al., "Generative Adversarial Nets" (2014)
  • Ben-Tal, El Ghaoui, Nemirovski, Robust Optimization (2009), Chapters 1-2


Last reviewed: April 2026
