Weak vs. Strong Duality: When the Duality Gap Is Zero

What Each States

Both weak and strong duality relate the primal optimization problem to its Lagrangian dual. Consider the primal problem:

$\min_{x} f_0(x) \quad \text{s.t.} \quad f_i(x) \leq 0, \; i = 1, \ldots, m; \quad h_j(x) = 0, \; j = 1, \ldots, p$

and its Lagrangian dual:

$\max_{\lambda \geq 0, \nu} g(\lambda, \nu) = \max_{\lambda \geq 0, \nu} \inf_{x} \left[ f_0(x) + \sum_i \lambda_i f_i(x) + \sum_j \nu_j h_j(x) \right]$

Weak duality says the dual optimal value never exceeds the primal optimal value.

Strong duality says they are equal.

Side-by-Side Statement

Definition

Weak Duality

Let $p^*$ be the primal optimal value and $d^*$ the dual optimal value. Then:

$d^* \leq p^*$

This holds for any optimization problem (convex or not), with no additional assumptions. The quantity $p^* - d^* \geq 0$ is called the duality gap.

Definition

Strong Duality

Strong duality holds when:

$d^* = p^*$

The duality gap is zero. This does not hold in general: even for convex problems, an additional condition (a constraint qualification) is needed to guarantee it.

Why Weak Duality Always Holds

Theorem

Weak Duality

Statement

For any optimization problem with Lagrangian $L(x, \lambda, \nu)$ and dual function $g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$, we have $d^* \leq p^*$.

Intuition

For any feasible primal point $x$ and any dual-feasible $(\lambda \geq 0, \nu)$:

$g(\lambda, \nu) = \inf_{\tilde{x}} L(\tilde{x}, \lambda, \nu) \leq L(x, \lambda, \nu) \leq f_0(x)$

The first inequality holds because an infimum is at most any particular value. The second holds because at a feasible $x$ we have $f_i(x) \leq 0$ and $\lambda_i \geq 0$, so $\lambda_i f_i(x) \leq 0$; and $h_j(x) = 0$, so $\nu_j h_j(x) = 0$. Since the chain holds for every feasible $x$ and every dual-feasible $(\lambda, \nu)$, taking the supremum over $(\lambda, \nu)$ on the left and the infimum over feasible $x$ on the right gives $d^* \leq p^*$.
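
As a numerical sanity check of this chain, the following Python sketch uses a hypothetical one-dimensional problem (not taken from the text above): minimize $x^2 + 1$ subject to $(x - 2)(x - 4) \leq 0$. The infimum over $x$ is approximated by a minimum over a finite grid, which is an assumption of the sketch; because the sampled feasible points lie on that grid, both inequalities can still be checked exactly.

```python
# Hypothetical toy problem: minimize x^2 + 1  subject to  (x - 2)(x - 4) <= 0.

def f0(x):
    return x * x + 1.0

def f1(x):
    return (x - 2.0) * (x - 4.0)

def lagrangian(x, lam):
    return f0(x) + lam * f1(x)

# Finite grid standing in for "all x"; the grid minimum approximates the infimum.
GRID = [i / 100.0 for i in range(-1000, 1001)]

def dual_function(lam):
    return min(lagrangian(x, lam) for x in GRID)

def chain_holds(feasible_points, multipliers):
    """Check g(lam) <= L(x, lam) <= f0(x) for feasible x and lam >= 0."""
    for x in feasible_points:
        if f1(x) > 0:
            raise ValueError("point is not primal feasible")
        for lam in multipliers:
            if lam < 0:
                raise ValueError("multiplier is not dual feasible")
            g = dual_function(lam)
            L = lagrangian(x, lam)
            if not (g <= L <= f0(x)):
                return False
    return True

print(chain_holds([2.0, 3.0, 4.0], [0.0, 0.5, 2.0, 10.0]))
```

Every value of the dual function printed this way is a lower bound on the objective at every feasible point, which is exactly the content of weak duality.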

Failure Mode

Weak duality never fails. It is a universal property of the Lagrangian framework. The only question is whether the gap is zero (strong duality) or positive.

When Strong Duality Holds

Theorem

Strong Duality via Slater's Condition

Statement

If the primal problem is convex ($f_0, f_1, \ldots, f_m$ convex, $h_j$ affine) and there exists a point $\tilde{x}$ in the relative interior of the domain with $f_i(\tilde{x}) < 0$ for all $i$ (strict inequality) and $h_j(\tilde{x}) = 0$ for all $j$, then strong duality holds: $d^* = p^*$.

Intuition

Slater's condition requires a strictly feasible point: one that satisfies all inequality constraints with slack. In the absence of equality constraints, this means the feasible set has nonempty interior relative to the domain, which prevents pathological geometric configurations where the constraint surface is "tangent" to the objective level set in a way that creates a gap.
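
A minimal sketch of the theorem on a hypothetical convex problem (chosen for illustration, not taken from the text): minimize $x^2$ subject to $1 - x \leq 0$. The point $\tilde{x} = 2$ is strictly feasible, so Slater's condition holds. For this problem the dual function has the closed form $g(\lambda) = \lambda - \lambda^2/4$, obtained by minimizing $x^2 + \lambda(1 - x)$ at $x = \lambda/2$. The grids below are an assumption of the sketch.

```python
# Hypothetical convex problem: minimize x^2  subject to  1 - x <= 0.

def constraint(x):
    return 1.0 - x

def dual_function(lam):
    # inf over x of x^2 + lam * (1 - x), attained at x = lam / 2
    return lam - lam * lam / 4.0

def primal_value():
    # Grid over the feasible set [1, 6]; the minimum is attained at x = 1.
    feasible = [1.0 + i / 100.0 for i in range(0, 501)]
    return min(x * x for x in feasible)

def dual_value():
    # Grid over lam in [0, 5]; the maximum is attained at lam = 2.
    multipliers = [i / 100.0 for i in range(0, 501)]
    return max(dual_function(lam) for lam in multipliers)

slater_point = 2.0
print("strictly feasible:", constraint(slater_point) < 0)
print("p* =", primal_value(), " d* =", dual_value())
```

Both values come out as 1, so the gap is zero for this instance, as the theorem predicts.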

Failure Mode

Slater's condition is sufficient, not necessary. Strong duality can hold without it (e.g., for linear programs, strong duality holds whenever the primal or dual is feasible and bounded). When no strictly feasible point exists, as in some degenerate convex problems, the guarantee is lost and a gap can appear. For non-convex problems, Slater's condition gives no guarantee at all.

When Strong Duality Fails

Strong duality can fail even for convex problems when constraint qualifications are violated.

Example

A convex problem with a duality gap

Consider:

$\min_{x_1, x_2} x_1 \quad \text{s.t.} \quad x_1^2/x_2 \leq 0, \quad \text{with domain } x_2 > 0$

where the constraint function $f_1(x) = x_1^2/x_2$ is convex on the domain $x_2 > 0$. Since $x_1^2/x_2 \leq 0$ forces $x_1 = 0$, the feasible set is $\{(0, x_2) : x_2 > 0\}$, so $p^* = 0$. The Lagrangian is $L(x, \lambda) = x_1 + \lambda x_1^2 / x_2$, and $\inf_{x_1,\, x_2 > 0} L = -\infty$ for every $\lambda \geq 0$ (take $x_1 = -t$, $x_2 = t^2$, so that $L = -t + \lambda \to -\infty$ as $t \to \infty$), giving $d^* = -\infty$. Slater's condition fails because $f_1(x) \geq 0$ throughout the domain, so there is no point with $f_1(x) < 0$. See Boyd and Vandenberghe (2004), Section 5.2.4.
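
The divergence of the Lagrangian in this example can be seen numerically. The sketch below evaluates $L(x, \lambda) = x_1 + \lambda x_1^2/x_2$ along the path $x_1 = -t$, $x_2 = t^2$; the choice of path is an assumption made for this illustration, and any path with $x_2$ growing fast enough would do.

```python
def lagrangian(x1, x2, lam):
    """L(x, lam) = x1 + lam * x1^2 / x2, defined for x2 > 0."""
    if x2 <= 0:
        raise ValueError("x2 must be positive")
    return x1 + lam * x1 * x1 / x2

def along_path(lam, ts):
    # On the path x1 = -t, x2 = t^2 the Lagrangian equals -t + lam.
    return [lagrangian(-t, t * t, lam) for t in ts]

for lam in (0.0, 1.0, 5.0):
    print(lam, along_path(lam, [10.0, 100.0, 1000.0]))
```

For each fixed multiplier the values decrease without bound as $t$ grows, so the dual function is $-\infty$ everywhere, while the primal optimum stays at 0.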

For non-convex problems, duality gaps are the norm, not the exception. Integer programming problems routinely have large duality gaps. For a minimization problem, the LP relaxation (equivalently, its dual) gives a lower bound on the integer optimum, and the gap measures how far the relaxation is from exact.
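
A tiny integer program makes the gap concrete. The instance below is hypothetical: minimize $x$ subject to $1 - 2x \leq 0$ with $x \in \{0, 1, \ldots, 10\}$. The sketch computes the integer optimum by enumeration and the best Lagrangian dual bound over a grid of multipliers (the grid is an assumption of the sketch).

```python
# Hypothetical integer program: minimize x  s.t.  1 - 2x <= 0,  x in {0, ..., 10}.
DOMAIN = range(0, 11)

def integer_optimum():
    return min(x for x in DOMAIN if 1 - 2 * x <= 0)

def dual_function(lam):
    # Minimize the Lagrangian x + lam * (1 - 2x) over the integer domain.
    return min(x + lam * (1 - 2 * x) for x in DOMAIN)

def best_dual_bound():
    multipliers = [i / 100.0 for i in range(0, 201)]  # lam in [0, 2]
    return max(dual_function(lam) for lam in multipliers)

p_star = integer_optimum()
d_star = best_dual_bound()
print("p* =", p_star, " d* =", d_star, " gap =", p_star - d_star)
```

Here the integer optimum is 1 and the best dual bound is 0.5, which for this instance coincides with the value of the LP relaxation (minimize $x$ over $0.5 \leq x \leq 10$).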

KKT Conditions Under Strong Duality

When strong duality holds, the primal and dual optima are attained, and the functions are differentiable, the KKT conditions are necessary for optimality (for convex problems, they are also sufficient):

Stationarity: $\nabla f_0(x^*) + \sum_i \lambda_i^* \nabla f_i(x^*) + \sum_j \nu_j^* \nabla h_j(x^*) = 0$
Primal feasibility: $f_i(x^*) \leq 0$, $h_j(x^*) = 0$
Dual feasibility: $\lambda_i^* \geq 0$
Complementary slackness: $\lambda_i^* f_i(x^*) = 0$ for all $i$
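
The four conditions can be checked mechanically for a candidate pair. The sketch below does so for a hypothetical convex problem, minimize $x^2$ subject to $1 - x \leq 0$, whose primal and dual optima are $x^* = 1$ and $\lambda^* = 2$. The function names and the tolerance are assumptions of this illustration.

```python
# Hypothetical problem: minimize x^2  subject to  f1(x) = 1 - x <= 0.

def kkt_residuals(x, lam):
    """Return how far (x, lam) is from satisfying each KKT condition."""
    return {
        # d/dx [x^2 + lam * (1 - x)] = 2x - lam
        "stationarity": abs(2.0 * x - lam),
        # positive part of the constraint value
        "primal_feasibility": max(0.0, 1.0 - x),
        # negative part of the multiplier
        "dual_feasibility": max(0.0, -lam),
        # lam * f1(x) must vanish
        "complementary_slackness": abs(lam * (1.0 - x)),
    }

def satisfies_kkt(x, lam, tol=1e-9):
    return all(r <= tol for r in kkt_residuals(x, lam).values())

print(satisfies_kkt(1.0, 2.0))  # candidate optimum
print(satisfies_kkt(2.0, 0.0))  # feasible, but stationarity fails
```

Because the problem is convex, the pair that passes all four checks is optimal; the second pair is feasible and complementary but not stationary, so it is rejected.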

Without strong duality, a primal optimum may not correspond to any dual optimum, and the KKT conditions may have no solution.

Key Assumptions That Differ

	Weak Duality	Strong Duality
Statement	$d^* \leq p^*$	$d^* = p^*$
Assumptions	None (always holds)	Constraint qualification (e.g., Slater)
Convexity required?	No	Typically yes (for Slater)
Practical use	Lower bounds on optimal value	Solve dual instead of primal
KKT	Not guaranteed	Necessary; also sufficient for convex problems

Practical Consequences

When strong duality holds, you can solve the dual instead of the primal. This is useful when:

The dual is easier. The SVM dual has $n$ variables (one per training point) with a simple constraint set, while the primal has $d$ weight variables (one per feature). When $n < d$, the dual can be cheaper.
The dual reveals structure. The SVM dual shows that only support vectors matter ($\lambda_i > 0$ only for points on or inside the margin).
You need a lower bound. Even if you cannot solve the primal exactly, any dual-feasible point gives a guaranteed lower bound on the primal optimum. This use needs only weak duality.
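
The lower-bound use can be sketched for a linear program in the standard form $\min c^\top x$ s.t. $Ax \geq b$, $x \geq 0$, whose dual is $\max b^\top y$ s.t. $A^\top y \leq c$, $y \geq 0$. The data below are hypothetical and the helper names are assumptions of this illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_dual_feasible(A, c, y):
    """Check y >= 0 and A^T y <= c."""
    if any(yi < 0 for yi in y):
        return False
    columns = list(zip(*A))
    return all(dot(col, y) <= cj for col, cj in zip(columns, c))

def dual_lower_bound(A, b, c, y):
    """Return b^T y, a lower bound on the primal optimum, for dual-feasible y."""
    if not is_dual_feasible(A, c, y):
        raise ValueError("y is not dual feasible")
    return dot(b, y)

# Hypothetical data: minimize 2*x1 + 3*x2  s.t.  x1 + x2 >= 3,  x1 + 2*x2 >= 4,  x >= 0.
A = [[1, 1], [1, 2]]
b = [3, 4]
c = [2, 3]

print(dual_lower_bound(A, b, c, [1, 0]))  # a weak but valid bound
print(dual_lower_bound(A, b, c, [1, 1]))  # a tighter bound
print(dot(c, [2, 1]))                     # cost of the feasible point x = (2, 1)
```

The second bound matches the cost of the feasible point $x = (2, 1)$, which certifies that this point is optimal without running a solver.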

Common Confusions

Watch Out

Slater is sufficient, not necessary

Many textbooks present Slater's condition as the criterion for strong duality. But strong duality can hold without Slater. Linear programs always satisfy strong duality (when feasible and bounded) without needing a strictly feasible point. Slater is the most commonly used sufficient condition, not the only path to strong duality.

Watch Out

Strong duality does not mean the dual is easy to solve

Having $d^* = p^*$ means the dual has the same optimal value. It does not mean the dual problem is computationally easier. For some problems the dual is harder. The advantage of duality is structural, not automatically computational.

Watch Out

Complementary slackness requires strong duality

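Complementary slackness ($\lambda_i^* f_i(x^*) = 0$) for a primal optimal $x^*$ and dual optimal $(\lambda^*, \nu^*)$ is a consequence of zero duality gap. For a convex problem, any point satisfying the KKT conditions certifies a zero gap, so if the gap is positive, no triple $(x^*, \lambda^*, \nu^*)$ satisfies all KKT conditions simultaneously. Non-convex problems can have KKT points even when the gap is positive.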

References

Canonical:

Boyd & Vandenberghe, Convex Optimization (2004), Chapter 5
Bertsekas, Nonlinear Programming (1999), Chapter 5

Current:

Bubeck, "Convex Optimization: Algorithms and Complexity" (2015), Section 4