Numerical Optimization
Submodular Optimization
Submodular functions exhibit diminishing returns. The greedy algorithm achieves a (1-1/e) approximation for monotone submodular maximization under cardinality constraints, with applications in feature selection, sensor placement, and data summarization.
Why This Matters
Many ML problems require selecting a small subset from a large ground set: which features to include, which data points to label, where to place sensors, which documents to include in a summary. When the objective has the "diminishing returns" property (adding an element helps less when you already have a large set), it is submodular. Submodularity provides the theoretical guarantee that a simple greedy algorithm is near-optimal.
The $1 - 1/e$ approximation ratio is tight: no polynomial-time algorithm can do better unless P = NP.
Definitions
Submodular Function
A set function $f : 2^V \to \mathbb{R}$ on a finite ground set $V$ is submodular if for all $A \subseteq B \subseteq V$ and all $v \in V \setminus B$:

$$f(A \cup \{v\}) - f(A) \ge f(B \cup \{v\}) - f(B)$$

The marginal gain of adding element $v$ decreases as the set grows. This is the diminishing returns property.

Equivalent characterization: $f$ is submodular if and only if for all $A, B \subseteq V$:

$$f(A) + f(B) \ge f(A \cup B) + f(A \cap B)$$
Monotone Submodular Function
A submodular function $f$ is monotone if $f(A) \le f(B)$ whenever $A \subseteq B$. Adding elements never decreases the objective. We also assume $f(\emptyset) = 0$ (normalized).
Examples of submodular functions:
- Coverage: $f(S) = \left|\bigcup_{i \in S} A_i\right|$ where the $A_i$ are sets. Monotone, submodular.
- Entropy: $f(S) = H(X_S)$ where $X_S$ is a subset of random variables. Monotone, submodular.
- Mutual information: $f(S) = I(X_S; Y)$ for prediction target $Y$. Monotone, submodular under jointly Gaussian assumptions.
- Graph cut function: $f(S) = \sum_{u \in S,\, v \notin S} w_{uv}$. Submodular but not monotone.
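Submodularity of the coverage function can be checked exhaustively on a small instance. A minimal sketch; the `cover_sets` data is hypothetical:

```python
from itertools import combinations

# Hypothetical instance: element i covers the set cover_sets[i].
cover_sets = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}, 3: {1, 5}}

def coverage(S):
    """f(S) = size of the union of the covered sets."""
    return len(set().union(*(cover_sets[i] for i in S))) if S else 0

def marginal(S, v):
    """Marginal gain of adding v to S."""
    return coverage(S | {v}) - coverage(S)

# Diminishing returns: for all A ⊆ B and v ∉ B, the gain at A is
# at least the gain at B.
ground = set(cover_sets)
ok = True
for r in range(len(ground) + 1):
    for B in map(set, combinations(sorted(ground), r)):
        for s in range(len(B) + 1):
            for A in map(set, combinations(sorted(B), s)):
                for v in ground - B:
                    ok = ok and marginal(A, v) >= marginal(B, v)
print(ok)  # True: coverage is submodular
```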
The Greedy Algorithm
The greedy algorithm for maximizing $f(S)$ subject to $|S| \le k$:
- Start with $S_0 = \emptyset$.
- For $i = 1, \dots, k$: add the element with the largest marginal gain, $v_i = \arg\max_{v \in V \setminus S_{i-1}} \big( f(S_{i-1} \cup \{v\}) - f(S_{i-1}) \big)$, and set $S_i = S_{i-1} \cup \{v_i\}$.
- Return $S_k$.
Each step makes $O(n)$ function evaluations, where $n = |V|$. Total cost: $O(nk)$ function evaluations.
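The steps above take only a few lines of code. A minimal sketch, assuming the objective is given as a callable on sets; the coverage instance is hypothetical:

```python
def greedy_max(f, ground, k):
    """Greedy maximization of f under the cardinality constraint |S| <= k.

    Each of the k steps scans all remaining elements, so the total cost
    is O(nk) evaluations of f.
    """
    S = set()
    for _ in range(k):
        candidates = ground - S
        if not candidates:
            break
        S.add(max(candidates, key=lambda v: f(S | {v}) - f(S)))
    return S

# Hypothetical coverage instance: element i covers cover_sets[i].
cover_sets = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}, 3: {1, 5}}
f = lambda S: len(set().union(*(cover_sets[i] for i in S))) if S else 0
print(sorted(greedy_max(f, set(cover_sets), 2)))  # [0, 2]: covers all of {1,...,5}
```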
Greedy Approximation for Monotone Submodular Maximization
Statement
Let $S^*$ be the optimal solution with $|S^*| \le k$ and let $S_k$ be the greedy solution. Then:

$$f(S_k) \ge \left(1 - \frac{1}{e}\right) f(S^*) \approx 0.632\, f(S^*)$$
Intuition
At each greedy step, the marginal gain is at least $\frac{1}{k}\big(f(S^*) - f(S_i)\big)$ because the optimal set can improve the current solution by at most $f(S^*) - f(S_i)$ in total, and this improvement can be distributed across at most $k$ elements (by submodularity, the best single-element gain is at least $1/k$ of the remaining gap). The resulting recurrence gives $f(S_k) \ge \big(1 - (1 - 1/k)^k\big) f(S^*) \ge (1 - 1/e) f(S^*)$ after $k$ steps.
Proof Sketch
Let $\delta_i = f(S^*) - f(S_i)$ be the remaining gap. Write $S^* \setminus S_i = \{u_1, \dots, u_m\}$ with $m \le k$. By monotonicity and submodularity, $f(S^*) \le f(S_i \cup S^*) \le f(S_i) + \sum_{j=1}^{m} \big( f(S_i \cup \{u_j\}) - f(S_i) \big)$. So some element $u_j$ has marginal gain at least $\delta_i / k$. The greedy choice is at least this good: $f(S_{i+1}) - f(S_i) \ge \delta_i / k$. Thus $\delta_{i+1} \le (1 - 1/k)\, \delta_i$. After $k$ steps: $\delta_k \le (1 - 1/k)^k\, \delta_0 \le e^{-1} f(S^*)$, so $f(S_k) \ge (1 - 1/e) f(S^*)$.
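The guarantee can also be checked numerically: on small instances, brute-force search over all $k$-subsets gives the true optimum to compare against. A sketch on a hypothetical coverage instance:

```python
from itertools import combinations
from math import e

# Hypothetical coverage instance, small enough for brute force.
cover_sets = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}, 3: {1, 5}, 4: {4, 6}}

def f(S):
    return len(set().union(*(cover_sets[i] for i in S))) if S else 0

def greedy(k):
    S = set()
    for _ in range(k):
        S.add(max(set(cover_sets) - S, key=lambda v: f(S | {v}) - f(S)))
    return S

# Compare greedy against the true optimum over all k-subsets.
for k in range(1, 5):
    opt = max(f(set(C)) for C in combinations(cover_sets, k))
    assert f(greedy(k)) >= (1 - 1 / e) * opt
print("greedy meets the (1 - 1/e) bound for k = 1..4")
```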
Why It Matters
A 63.2% approximation from a simple greedy algorithm is remarkable. The algorithm is easy to implement, runs in polynomial time, and works for any monotone submodular objective. No problem-specific structure is needed beyond submodularity.
Failure Mode
The bound is tight for the greedy algorithm. For non-monotone submodular functions, the greedy algorithm can perform arbitrarily badly; different algorithms (randomized greedy, double-greedy) are needed. For constraints beyond cardinality (matroid constraints, knapsack constraints), the approximation ratio changes.
Hardness of Submodular Maximization
Statement
No polynomial-time algorithm can achieve an approximation ratio better than $1 - 1/e$ for maximizing a monotone submodular function subject to a cardinality constraint, unless P = NP.
Intuition
The max-$k$-cover problem (a special case of monotone submodular maximization) is NP-hard, and the $1 - 1/e$ inapproximability for max-$k$-cover carries over to the general submodular case.
Proof Sketch
Feige (1998) showed that max-$k$-cover cannot be approximated beyond $1 - 1/e + \epsilon$ for any $\epsilon > 0$ under the assumption P $\neq$ NP. Since max-$k$-cover is a special case of monotone submodular maximization, the hardness applies to the general problem.
Why It Matters
This means the greedy algorithm is worst-case optimal: no polynomial-time algorithm achieves a better ratio. There is no room for improvement unless you exploit additional structure beyond submodularity.
Failure Mode
The hardness result is for the worst case. For specific submodular functions with additional structure (e.g., graph-based, modular components), better approximation ratios or exact algorithms may exist.
Common Confusions
Submodularity is not convexity
Submodularity is sometimes called "discrete convexity," but this is misleading. A submodular function over sets is the discrete analogue of a concave function (not convex) when it comes to maximization: the greedy approach works for maximizing concave-like objectives. Submodular minimization is the analogue of convex minimization: it can be solved exactly in polynomial time via the Lovász extension.
The greedy algorithm does not work for non-monotone objectives
For non-monotone submodular maximization (e.g., graph cut), the standard greedy algorithm can fail completely. It may keep adding elements that individually look good but collectively produce a poor solution. Algorithms like randomized double greedy (Buchbinder et al. 2015) achieve a $1/2$ approximation for unconstrained non-monotone submodular maximization.
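The deterministic variant of double greedy (which guarantees $1/3$ rather than the randomized variant's $1/2$) is simple enough to sketch. It scans the ground set once, growing a set $X$ from $\emptyset$ and shrinking a set $Y$ from the full ground set until they meet. The graph and weights below are hypothetical:

```python
def cut_value(S, edges):
    """Weighted graph cut: sum of edge weights crossing S. Submodular, non-monotone."""
    return sum(w for (u, v), w in edges.items() if (u in S) != (v in S))

def double_greedy(f, ground):
    """Deterministic double greedy for unconstrained maximization of a
    nonnegative submodular f; guarantees f(result) >= OPT / 3."""
    X, Y = set(), set(ground)
    for u in sorted(ground):
        a = f(X | {u}) - f(X)   # gain from adding u to the growing set
        b = f(Y - {u}) - f(Y)   # gain from dropping u from the shrinking set
        if a >= b:
            X.add(u)
        else:
            Y.discard(u)
    return X  # X == Y at this point

# Hypothetical instance: a unit-weight triangle, whose maximum cut is 2.
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
S = double_greedy(lambda T: cut_value(T, edges), {0, 1, 2})
print(sorted(S), cut_value(S, edges))  # [0, 2] 2.0
```

The randomized version makes the `a >= b` choice probabilistically, with probability proportional to the (clipped) gains, which is what yields the tight $1/2$ guarantee.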
Canonical Examples
Feature selection as submodular maximization
Given candidate features $X_1, \dots, X_n$ and labels $Y$, define $f(S) = I(X_S; Y)$, the mutual information between the selected features and the target. Under a jointly Gaussian model, $f$ is monotone submodular. Greedy feature selection: at each step, add the feature with the highest conditional mutual information $I(X_v; Y \mid X_S)$. This achieves at least 63.2% of the optimal $k$-feature mutual information.
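Under the Gaussian model, $I(X_S; Y)$ reduces to log-determinants of the joint covariance, so the greedy selection can be sketched directly. The data-generating process below is hypothetical (features 0 and 1 carry the signal):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical jointly Gaussian data: five features, target depends on 0 and 1.
n = 500
X = rng.standard_normal((n, 5))
y_col = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(n)
Sigma = np.cov(np.column_stack([X, y_col]), rowvar=False)
Y = 5  # index of the target within the joint covariance

def mi(S):
    """I(X_S; Y) = (1/2)(logdet Σ_SS + log σ_Y² − logdet Σ_{S∪{Y}}) for Gaussians."""
    if not S:
        return 0.0
    idx = sorted(S)
    _, ld_S = np.linalg.slogdet(Sigma[np.ix_(idx, idx)])
    _, ld_SY = np.linalg.slogdet(Sigma[np.ix_(idx + [Y], idx + [Y])])
    return 0.5 * (ld_S + np.log(Sigma[Y, Y]) - ld_SY)

# Greedy: add the feature with the largest conditional MI gain at each step.
S, features = set(), set(range(5))
for _ in range(2):
    S.add(max(features - S, key=lambda v: mi(S | {v}) - mi(S)))
print(sorted(S))  # the two informative features
```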
Exercises
Problem
Prove that the coverage function $f(S) = \left|\bigcup_{i \in S} A_i\right|$ is submodular by verifying the diminishing returns property directly.
Problem
Construct a monotone submodular function and a cardinality constraint where the greedy algorithm achieves exactly the $1 - 1/e$ ratio (or arbitrarily close to it).
References
Canonical:
- Nemhauser, Wolsey, Fisher, "An analysis of approximations for maximizing submodular set functions" (1978)
- Feige, "A threshold of ln n for approximating set cover" (1998)
Current:
- Krause & Golovin, "Submodular Function Maximization" in Tractability (2014)
- Buchbinder et al., "A Tight Linear Time (1/2)-Approximation for Unconstrained Submodular Maximization" (2015)
Next Topics
- Convex optimization basics: the continuous analogue of submodular minimization
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Greedy Algorithms (Layer 0A)