
Numerical Optimization

Submodular Optimization

Submodular functions exhibit diminishing returns. The greedy algorithm achieves a (1-1/e) approximation for monotone submodular maximization under cardinality constraints, with applications in feature selection, sensor placement, and data summarization.

Advanced · Tier 3 · Stable · ~50 min

Prerequisites

None.

Why This Matters

Many ML problems require selecting a small subset from a large ground set: which features to include, which data points to label, where to place sensors, which documents to include in a summary. When the objective has the "diminishing returns" property (adding an element helps less when you already have a large set), it is submodular. Submodularity provides the theoretical guarantee that a simple greedy algorithm is near-optimal.

The $(1 - 1/e) \approx 0.632$ approximation ratio is tight: no polynomial-time algorithm can do better unless P = NP.

Definitions

Definition

Submodular Function

A set function $f: 2^V \to \mathbb{R}$ on a finite ground set $V$ is submodular if for all $A \subseteq B \subseteq V$ and $e \in V \setminus B$:

$$f(A \cup \{e\}) - f(A) \geq f(B \cup \{e\}) - f(B)$$

The marginal gain of adding element $e$ decreases as the set grows. This is the diminishing returns property.

Equivalent characterization: $f$ is submodular if and only if for all $A, B \subseteq V$:

$$f(A) + f(B) \geq f(A \cup B) + f(A \cap B)$$

Definition

Monotone Submodular Function

A submodular function $f$ is monotone if $f(A) \leq f(B)$ whenever $A \subseteq B$. Adding elements never decreases the objective. We also assume $f(\emptyset) = 0$ (normalized).

Examples of submodular functions:

  • Coverage: $f(S) = |\bigcup_{i \in S} C_i|$ where the $C_i$ are sets. Monotone, submodular.
  • Entropy: $f(S) = H(X_S)$ where $X_S$ is a subset of random variables. Monotone, submodular.
  • Mutual information: $f(S) = I(X_S; Y)$ for a prediction target $Y$. Monotone, submodular under jointly Gaussian assumptions.
  • Graph cut: $f(S) = |\{(u, v) \in E : u \in S, v \notin S\}|$. Submodular but not monotone.
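On small ground sets, the diminishing-returns inequality can be checked exhaustively. A minimal sketch, using a hypothetical three-set coverage instance (the sets `C` are illustrative):

```python
from itertools import combinations

# Hypothetical coverage instance: element i of the ground set covers C[i].
C = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}}

def coverage(S):
    """f(S) = size of the union of the chosen sets."""
    covered = set()
    for i in S:
        covered |= C[i]
    return len(covered)

def is_submodular(f, V):
    """Brute-force check of f(A + e) - f(A) >= f(B + e) - f(B)
    for all A ⊆ B ⊆ V and e ∉ B."""
    subsets = [set(c) for r in range(len(V) + 1) for c in combinations(V, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for e in V - B:
                if f(A | {e}) - f(A) < f(B | {e}) - f(B):
                    return False
    return True

print(is_submodular(coverage, set(C)))  # True
```

The check is exponential in $|V|$, so it is only a sanity-check tool, not something to run on real ground sets.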

The Greedy Algorithm

The greedy algorithm for maximizing $f$ subject to $|S| \leq k$:

  1. Start with $S_0 = \emptyset$.
  2. For $i = 1, \ldots, k$: add the element with the largest marginal gain, $e_i = \arg\max_{e \in V \setminus S_{i-1}} [f(S_{i-1} \cup \{e\}) - f(S_{i-1})]$.
  3. Return $S_k$.

Each step makes $O(|V|)$ function evaluations, for a total of $O(k \cdot |V|)$ function evaluations.
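The three steps above translate almost directly into code. A minimal sketch, run on a hypothetical coverage instance (the sets `C` are illustrative):

```python
def greedy_max(f, V, k):
    """Greedy maximization of a monotone submodular f under |S| <= k.
    Makes O(k * |V|) evaluations of f."""
    S = set()
    for _ in range(k):
        gains = {e: f(S | {e}) - f(S) for e in V - S}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:  # monotone f: gains are >= 0; stop at 0
            break
        S.add(best)
    return S

# Hypothetical instance: choose k = 2 of these covering sets.
C = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {1, 6}}
f = lambda S: len(set().union(*(C[i] for i in S)))
print(f(greedy_max(f, set(C), 2)))  # 6: greedy covers all six elements here
```

A standard refinement is lazy evaluation (caching upper bounds on marginal gains), which often avoids most of the $O(|V|)$ evaluations per step in practice.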

Theorem

Greedy Approximation for Monotone Submodular Maximization

Statement

Let $S^* = \arg\max_{|S| \leq k} f(S)$ be the optimal solution and $S_k$ be the greedy solution. Then:

$$f(S_k) \geq \left(1 - \frac{1}{e}\right) f(S^*) \approx 0.632 \cdot f(S^*)$$

Intuition

At each greedy step, the marginal gain is at least $\frac{1}{k}\bigl(f(S^*) - f(S_{i-1})\bigr)$: by monotonicity, adding all of $S^*$ would close the remaining gap $f(S^*) - f(S_{i-1})$ entirely, and by submodularity that improvement is spread across at most $k$ elements, so the best single element closes at least a $1/k$ fraction of the gap. The resulting recurrence $f(S^*) - f(S_i) \leq (1 - 1/k)\bigl(f(S^*) - f(S_{i-1})\bigr)$ gives $(1 - 1/k)^k \leq 1/e$ after $k$ steps.

Proof Sketch

Let $\Delta_i = f(S^*) - f(S_{i-1})$ be the remaining gap, and write $S^* = \{e_1^*, \ldots, e_k^*\}$. By submodularity and monotonicity, $\sum_{j=1}^k [f(S_{i-1} \cup \{e_j^*\}) - f(S_{i-1})] \geq f(S_{i-1} \cup S^*) - f(S_{i-1}) \geq f(S^*) - f(S_{i-1}) = \Delta_i$. So some element has marginal gain at least $\Delta_i / k$, and the greedy choice is at least this good: $f(S_i) - f(S_{i-1}) \geq \Delta_i / k$. Thus $\Delta_{i+1} \leq (1 - 1/k) \Delta_i$. After $k$ steps: $\Delta_k \leq (1 - 1/k)^k \Delta_0 \leq f(S^*)/e$.
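The final step of the sketch, $(1 - 1/k)^k \leq 1/e$, is easy to sanity-check numerically: the sequence increases toward $1/e$ from below.

```python
import math

# Unrolling Δ_{i+1} <= (1 - 1/k) Δ_i gives Δ_k <= (1 - 1/k)^k Δ_0,
# and (1 - 1/k)^k increases toward 1/e ≈ 0.3679 from below:
for k in (1, 2, 5, 10, 100, 10_000):
    print(k, (1 - 1 / k) ** k)
print("1/e =", 1 / math.e)
```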

Why It Matters

A 63.2% approximation from a simple greedy algorithm is remarkable. The algorithm is easy to implement, runs in polynomial time, and works for any monotone submodular objective. No problem-specific structure is needed beyond submodularity.

Failure Mode

The $(1 - 1/e)$ bound is tight for the greedy algorithm. For non-monotone submodular functions, the greedy algorithm can perform arbitrarily badly; different algorithms (randomized greedy, double greedy) are needed. For constraints beyond cardinality (matroid constraints, knapsack constraints), the approximation ratio changes.

Theorem

Hardness of Submodular Maximization

Statement

No polynomial-time algorithm can achieve an approximation ratio better than $(1 - 1/e)$ for maximizing a monotone submodular function subject to a cardinality constraint, unless P = NP.

Intuition

The max-$k$-cover problem (a special case of monotone submodular maximization) is NP-hard, and the $(1 - 1/e)$ inapproximability for max-$k$-cover carries over to the general submodular case.

Proof Sketch

Feige (1998) showed that max-$k$-cover cannot be approximated within a factor of $(1 - 1/e + \epsilon)$ for any $\epsilon > 0$ unless P = NP. Since max-$k$-cover is a special case of monotone submodular maximization, the hardness applies to the general problem.

Why It Matters

This means the greedy algorithm's approximation ratio is the best achievable in polynomial time. There is no room for improvement unless you exploit additional structure beyond submodularity.

Failure Mode

The hardness result is for the worst case. For specific submodular functions with additional structure (e.g., graph-based, modular components), better approximation ratios or exact algorithms may exist.

Common Confusions

Watch Out

Submodularity is not convexity

Submodularity is sometimes called "discrete convexity," but this is misleading. A submodular function over sets is the discrete analogue of a concave function (not convex) when it comes to maximization: the greedy approach works for maximizing concave-like objectives. Submodular minimization is the analogue of convex minimization: it can be solved exactly in polynomial time via the Lovász extension.
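The Lovász extension mentioned above has a short constructive definition: sort the coordinates of $x \in [0,1]^V$ in decreasing order and accumulate the marginal gains of $f$ along the resulting chain of sets. A minimal sketch (the function `f` below is an illustrative monotone submodular function, and $f(\emptyset) = 0$ is assumed, per the normalization above):

```python
def lovasz_extension(f, x):
    """Lovász extension of a set function f at x (dict: element -> value).

    Sorts coordinates in decreasing order and takes a weighted sum of the
    marginal gains of f along the resulting chain of sets. The extension
    is convex on [0, 1]^V exactly when f is submodular.
    """
    order = sorted(x, key=x.get, reverse=True)
    S, prev, total = set(), f(set()), 0.0  # assumes f(∅) = 0 (normalized)
    for e in order:
        S = S | {e}
        cur = f(S)
        total += x[e] * (cur - prev)
        prev = cur
    return total

# Illustrative monotone submodular function: "min(|S|, 2)".
f = lambda S: min(len(S), 2)
# On 0/1 indicator vectors the extension agrees with f itself:
print(lovasz_extension(f, {"a": 1.0, "b": 1.0, "c": 0.0}))  # 2.0
```

Minimizing this convex extension over $[0,1]^V$ (e.g., by subgradient methods) and rounding is one route to exact polynomial-time submodular minimization.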

Watch Out

The greedy algorithm does not work for non-monotone objectives

For non-monotone submodular maximization (e.g., graph cut), the standard greedy algorithm can fail completely. It may keep adding elements that individually look good but collectively produce a poor solution. Algorithms like randomized double greedy (Buchbinder et al. 2015) achieve a $1/2$ approximation for unconstrained non-monotone submodular maximization.
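A sketch of the deterministic variant of double greedy, which guarantees a $1/3$ approximation (the randomized version, which flips a biased coin between the two updates, improves this to $1/2$). The path-graph cut instance is illustrative:

```python
def double_greedy(f, V):
    """Deterministic double greedy for unconstrained (possibly non-monotone)
    submodular maximization. X grows from ∅, Y shrinks from V; each element
    is either committed to X or discarded from Y, whichever gains more."""
    X, Y = set(), set(V)
    for e in V:
        a = f(X | {e}) - f(X)   # gain of adding e to the growing set
        b = f(Y - {e}) - f(Y)   # gain of dropping e from the shrinking set
        if a >= b:
            X = X | {e}
        else:
            Y = Y - {e}
    return X  # X == Y after all elements are decided

# Hypothetical non-monotone instance: cut function of the path 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)
print(double_greedy(cut, [0, 1, 2, 3]))  # {0, 2}: the maximum cut here
```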

Canonical Examples

Example

Feature selection as submodular maximization

Given $n$ candidate features and labels $Y$, define $f(S) = I(X_S; Y)$, the mutual information between the selected features and the target. Under a jointly Gaussian model, $f$ is monotone submodular. Greedy feature selection: at each step, add the feature with the highest conditional mutual information $I(X_e; Y \mid X_S)$. This achieves at least 63.2% of the optimal $k$-feature mutual information.
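Under the jointly Gaussian assumption, $I(X_S; Y)$ has a closed form in log-determinants of the joint covariance, so the greedy rule can be sketched directly. A sketch on synthetic data (the variables, their roles, and the sample covariance estimate are all illustrative):

```python
import numpy as np

# Hypothetical data: x2 is a near-duplicate of x1; y depends on x1 and x3.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.standard_normal(n)
x2 = x1 + 0.1 * rng.standard_normal(n)
x3 = rng.standard_normal(n)
y = 2 * x1 + x3 + 0.1 * rng.standard_normal(n)
Sigma = np.cov(np.stack([x1, x2, x3, y]))  # joint covariance; y is index 3

def mi(S, target=3):
    """Gaussian MI: I(X_S; Y) = (log det Σ_SS + log det Σ_YY - log det Σ_joint)/2."""
    if not S:
        return 0.0
    idx = sorted(S)
    ld = lambda rows: np.linalg.slogdet(Sigma[np.ix_(rows, rows)])[1]
    return 0.5 * (ld(idx) + ld([target]) - ld(idx + [target]))

# Greedy: each step adds the feature with the largest MI gain, i.e. the
# largest conditional mutual information I(X_e; Y | X_S).
S = set()
for _ in range(2):
    e = max(set(range(3)) - S, key=lambda c: mi(S | {c}) - mi(S))
    S.add(e)
print(sorted(S))
```

With $k = 2$, greedy picks one of the near-duplicate pair $\{x_1, x_2\}$ and then $x_3$, since the duplicate's conditional mutual information given the first pick is nearly zero.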

Exercises

ExerciseCore

Problem

Prove that the coverage function $f(S) = |\bigcup_{i \in S} C_i|$ is submodular by verifying the diminishing returns property directly.

ExerciseAdvanced

Problem

Construct a monotone submodular function and a cardinality constraint $k$ where the greedy algorithm achieves exactly the $(1 - 1/e)$ ratio (or comes arbitrarily close to it).

References

Canonical:

  • Nemhauser, Wolsey, Fisher, "An analysis of approximations for maximizing submodular set functions" (1978)
  • Feige, "A threshold of ln n for approximating set cover" (1998)

Current:

  • Krause & Golovin, "Submodular Function Maximization" in Tractability (2014)
  • Buchbinder et al., "A Tight Linear Time (1/2)-Approximation for Unconstrained Submodular Maximization" (2015)

Next Topics

Last reviewed: April 2026
