
Numerical Optimization

Submodular Optimization

Submodular functions exhibit diminishing returns. The greedy algorithm achieves a (1-1/e) approximation for monotone submodular maximization under cardinality constraints, with applications in feature selection, sensor placement, and data summarization.

Advanced · Tier 3 · Stable · ~50 min

Prerequisites

None.

Why This Matters

Many ML problems require selecting a small subset from a large ground set: which features to include, which data points to label, where to place sensors, which documents to include in a summary. When the objective has the "diminishing returns" property (adding an element helps less when you already have a large set), it is submodular. Submodularity provides the theoretical guarantee that a simple greedy algorithm is near-optimal.

The $(1 - 1/e) \approx 0.632$ approximation ratio is tight: no polynomial-time algorithm can do better unless P = NP.

Definitions

Definition

Submodular Function

A set function $f: 2^V \to \mathbb{R}$ on a finite ground set $V$ is submodular if for all $A \subseteq B \subseteq V$ and $e \in V \setminus B$:

$$f(A \cup \{e\}) - f(A) \geq f(B \cup \{e\}) - f(B)$$

The marginal gain of adding element $e$ decreases as the set grows. This is the diminishing returns property.

Equivalent characterization: $f$ is submodular if and only if for all $A, B \subseteq V$:

$$f(A) + f(B) \geq f(A \cup B) + f(A \cap B)$$

Definition

Monotone Submodular Function

A submodular function $f$ is monotone if $f(A) \leq f(B)$ whenever $A \subseteq B$. Adding elements never decreases the objective. We also assume $f(\emptyset) = 0$ (normalized).

Examples of submodular functions:

  • Coverage: $f(S) = |\bigcup_{i \in S} C_i|$ where the $C_i$ are sets. Monotone, submodular.
  • Entropy: $f(S) = H(X_S)$ where $X_S$ is a subset of random variables. Monotone, submodular.
  • Mutual information: $f(S) = I(X_S; Y)$ for a prediction target $Y$. Monotone, submodular under jointly Gaussian assumptions.
  • Graph cut: $f(S) = |\{(u, v) \in E : u \in S, v \notin S\}|$. Submodular but not monotone.
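On small ground sets, the diminishing-returns inequality can be checked exhaustively. A minimal sketch, using a hypothetical three-set coverage instance (the sets `C` are illustrative):

```python
from itertools import combinations

# Hypothetical coverage instance: element i of the ground set covers C[i].
C = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}}

def coverage(S):
    """f(S) = size of the union of the chosen sets."""
    covered = set()
    for i in S:
        covered |= C[i]
    return len(covered)

def is_submodular(f, V):
    """Brute-force check of f(A + e) - f(A) >= f(B + e) - f(B)
    for all A ⊆ B ⊆ V and e ∉ B."""
    subsets = [set(c) for r in range(len(V) + 1) for c in combinations(V, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for e in V - B:
                if f(A | {e}) - f(A) < f(B | {e}) - f(B):
                    return False
    return True

print(is_submodular(coverage, set(C)))  # True
```

The check is exponential in $|V|$, so it is only a sanity-check tool, not something to run on real ground sets.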

The Greedy Algorithm

The greedy algorithm for maximizing $f$ subject to $|S| \leq k$:

  1. Start with $S_0 = \emptyset$.
  2. For $i = 1, \ldots, k$: add the element with the largest marginal gain, $e_i = \arg\max_{e \in V \setminus S_{i-1}} [f(S_{i-1} \cup \{e\}) - f(S_{i-1})]$.
  3. Return $S_k$.

Each step makes $O(|V|)$ function evaluations, for a total of $O(k \cdot |V|)$ function evaluations.
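The three steps above translate almost directly into code. A minimal sketch, run on a hypothetical coverage instance (the sets `C` are illustrative):

```python
def greedy_max(f, V, k):
    """Greedy maximization of a monotone submodular f under |S| <= k.
    Makes O(k * |V|) evaluations of f."""
    S = set()
    for _ in range(k):
        gains = {e: f(S | {e}) - f(S) for e in V - S}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:  # monotone f: gains are >= 0; stop at 0
            break
        S.add(best)
    return S

# Hypothetical instance: choose k = 2 of these covering sets.
C = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {1, 6}}
f = lambda S: len(set().union(*(C[i] for i in S)))
print(f(greedy_max(f, set(C), 2)))  # 6: greedy covers all six elements here
```

A standard refinement is lazy evaluation (caching upper bounds on marginal gains), which often avoids most of the $O(|V|)$ evaluations per step in practice.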

Theorem

Greedy Approximation for Monotone Submodular Maximization

Statement

Let $S^* = \arg\max_{|S| \leq k} f(S)$ be the optimal solution and $S_k$ be the greedy solution. Then:

$$f(S_k) \geq \left(1 - \frac{1}{e}\right) f(S^*) \approx 0.632 \cdot f(S^*)$$

Intuition

At each greedy step, the marginal gain is at least $\frac{1}{k}\bigl(f(S^*) - f(S_{i-1})\bigr)$: by monotonicity, adding all of $S^*$ would close the remaining gap $f(S^*) - f(S_{i-1})$ entirely, and by submodularity that improvement is spread across at most $k$ elements, so the best single element closes at least a $1/k$ fraction of the gap. The resulting recurrence $f(S^*) - f(S_i) \leq (1 - 1/k)\bigl(f(S^*) - f(S_{i-1})\bigr)$ gives $(1 - 1/k)^k \leq 1/e$ after $k$ steps.

Proof Sketch

Let $\Delta_i = f(S^*) - f(S_{i-1})$ be the remaining gap, and write $S^* = \{e_1^*, \ldots, e_k^*\}$. By submodularity and monotonicity, $\sum_{j=1}^k [f(S_{i-1} \cup \{e_j^*\}) - f(S_{i-1})] \geq f(S_{i-1} \cup S^*) - f(S_{i-1}) \geq f(S^*) - f(S_{i-1}) = \Delta_i$. So some element has marginal gain at least $\Delta_i / k$, and the greedy choice is at least this good: $f(S_i) - f(S_{i-1}) \geq \Delta_i / k$. Thus $\Delta_{i+1} \leq (1 - 1/k) \Delta_i$. After $k$ steps: $\Delta_k \leq (1 - 1/k)^k \Delta_0 \leq f(S^*)/e$.
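The final step of the sketch, $(1 - 1/k)^k \leq 1/e$, is easy to sanity-check numerically: the sequence increases toward $1/e$ from below.

```python
import math

# Unrolling Δ_{i+1} <= (1 - 1/k) Δ_i gives Δ_k <= (1 - 1/k)^k Δ_0,
# and (1 - 1/k)^k increases toward 1/e ≈ 0.3679 from below:
for k in (1, 2, 5, 10, 100, 10_000):
    print(k, (1 - 1 / k) ** k)
print("1/e =", 1 / math.e)
```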

Why It Matters

A 63.2% approximation from a simple greedy algorithm is remarkable. The algorithm is easy to implement, runs in polynomial time, and works for any monotone submodular objective. No problem-specific structure is needed beyond submodularity.

Failure Mode

The $(1 - 1/e)$ bound is tight for the greedy algorithm. For non-monotone submodular functions, the greedy algorithm can perform arbitrarily badly; different algorithms (randomized greedy, double greedy) are needed. For constraints beyond cardinality (matroid constraints, knapsack constraints), the approximation ratio changes.

Theorem

Hardness of Submodular Maximization

Statement

No polynomial-time algorithm can achieve an approximation ratio better than $(1 - 1/e)$ for maximizing a monotone submodular function subject to a cardinality constraint, unless P = NP.

Intuition

The max-$k$-cover problem (a special case of monotone submodular maximization) is NP-hard, and the $(1 - 1/e)$ inapproximability for max-$k$-cover carries over to the general submodular case.

Proof Sketch

Feige (1998) showed that max-$k$-cover cannot be approximated within a factor of $(1 - 1/e + \epsilon)$ for any $\epsilon > 0$ unless P = NP. Since max-$k$-cover is a special case of monotone submodular maximization, the hardness applies to the general problem.

Why It Matters

This means the greedy algorithm's approximation ratio is the best achievable in polynomial time. There is no room for improvement unless you exploit additional structure beyond submodularity.

Failure Mode

The hardness result is for the worst case. For specific submodular functions with additional structure (e.g., graph-based, modular components), better approximation ratios or exact algorithms may exist.

Common Confusions

Watch Out

Submodularity is not convexity

Submodularity is sometimes called "discrete convexity," but this is misleading. A submodular function over sets is the discrete analogue of a concave function (not convex) when it comes to maximization: the greedy approach works for maximizing concave-like objectives. Submodular minimization is the analogue of convex minimization: it can be solved exactly in polynomial time via the Lovász extension.
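The Lovász extension mentioned above has a short constructive definition: sort the coordinates of $x \in [0,1]^V$ in decreasing order and accumulate the marginal gains of $f$ along the resulting chain of sets. A minimal sketch (the function `f` below is an illustrative monotone submodular function, and $f(\emptyset) = 0$ is assumed, per the normalization above):

```python
def lovasz_extension(f, x):
    """Lovász extension of a set function f at x (dict: element -> value).

    Sorts coordinates in decreasing order and takes a weighted sum of the
    marginal gains of f along the resulting chain of sets. The extension
    is convex on [0, 1]^V exactly when f is submodular.
    """
    order = sorted(x, key=x.get, reverse=True)
    S, prev, total = set(), f(set()), 0.0  # assumes f(∅) = 0 (normalized)
    for e in order:
        S = S | {e}
        cur = f(S)
        total += x[e] * (cur - prev)
        prev = cur
    return total

# Illustrative monotone submodular function: "min(|S|, 2)".
f = lambda S: min(len(S), 2)
# On 0/1 indicator vectors the extension agrees with f itself:
print(lovasz_extension(f, {"a": 1.0, "b": 1.0, "c": 0.0}))  # 2.0
```

Minimizing this convex extension over $[0,1]^V$ (e.g., by subgradient methods) and rounding is one route to exact polynomial-time submodular minimization.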

Watch Out

The greedy algorithm does not work for non-monotone objectives

For non-monotone submodular maximization (e.g., graph cut), the standard greedy algorithm can fail completely. It may keep adding elements that individually look good but collectively produce a poor solution. Algorithms like randomized double greedy (Buchbinder et al. 2015) achieve a $1/2$ approximation for unconstrained non-monotone submodular maximization.
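A sketch of the deterministic variant of double greedy, which guarantees a $1/3$ approximation (the randomized version, which flips a biased coin between the two updates, improves this to $1/2$). The path-graph cut instance is illustrative:

```python
def double_greedy(f, V):
    """Deterministic double greedy for unconstrained (possibly non-monotone)
    submodular maximization. X grows from ∅, Y shrinks from V; each element
    is either committed to X or discarded from Y, whichever gains more."""
    X, Y = set(), set(V)
    for e in V:
        a = f(X | {e}) - f(X)   # gain of adding e to the growing set
        b = f(Y - {e}) - f(Y)   # gain of dropping e from the shrinking set
        if a >= b:
            X = X | {e}
        else:
            Y = Y - {e}
    return X  # X == Y after all elements are decided

# Hypothetical non-monotone instance: cut function of the path 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)
print(double_greedy(cut, [0, 1, 2, 3]))  # {0, 2}: the maximum cut here
```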

Canonical Examples

Example

Feature selection as submodular maximization

Given $n$ candidate features and labels $Y$, define $f(S) = I(X_S; Y)$, the mutual information between the selected features and the target. Under a jointly Gaussian model, $f$ is monotone submodular. Greedy feature selection: at each step, add the feature with the highest conditional mutual information $I(X_e; Y \mid X_S)$. This achieves at least 63.2% of the optimal $k$-feature mutual information.
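Under the jointly Gaussian assumption, $I(X_S; Y)$ has a closed form in log-determinants of the joint covariance, so the greedy rule can be sketched directly. A sketch on synthetic data (the variables, their roles, and the sample covariance estimate are all illustrative):

```python
import numpy as np

# Hypothetical data: x2 is a near-duplicate of x1; y depends on x1 and x3.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.standard_normal(n)
x2 = x1 + 0.1 * rng.standard_normal(n)
x3 = rng.standard_normal(n)
y = 2 * x1 + x3 + 0.1 * rng.standard_normal(n)
Sigma = np.cov(np.stack([x1, x2, x3, y]))  # joint covariance; y is index 3

def mi(S, target=3):
    """Gaussian MI: I(X_S; Y) = (log det Σ_SS + log det Σ_YY - log det Σ_joint)/2."""
    if not S:
        return 0.0
    idx = sorted(S)
    ld = lambda rows: np.linalg.slogdet(Sigma[np.ix_(rows, rows)])[1]
    return 0.5 * (ld(idx) + ld([target]) - ld(idx + [target]))

# Greedy: each step adds the feature with the largest MI gain, i.e. the
# largest conditional mutual information I(X_e; Y | X_S).
S = set()
for _ in range(2):
    e = max(set(range(3)) - S, key=lambda c: mi(S | {c}) - mi(S))
    S.add(e)
print(sorted(S))
```

With $k = 2$, greedy picks one of the near-duplicate pair $\{x_1, x_2\}$ and then $x_3$, since the duplicate's conditional mutual information given the first pick is nearly zero.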

Exercises

ExerciseCore

Problem

Prove that the coverage function $f(S) = |\bigcup_{i \in S} C_i|$ is submodular by verifying the diminishing returns property directly.

ExerciseAdvanced

Problem

Construct a monotone submodular function and a cardinality constraint $k$ where the greedy algorithm achieves exactly the $(1 - 1/e)$ ratio (or comes arbitrarily close to it).

References

Canonical:

  • Nemhauser, Wolsey, Fisher, "An analysis of approximations for maximizing submodular set functions" (1978)
  • Feige, "A threshold of ln n for approximating set cover" (1998)

Current:

  • Krause & Golovin, "Submodular Function Maximization" in Tractability (2014)
  • Buchbinder et al., "A Tight Linear Time (1/2)-Approximation for Unconstrained Submodular Maximization" (2015)

Next Topics

Last reviewed: April 2026
