ML Methods
NMF (Nonnegative Matrix Factorization)
Factor $V$ into $WH$ with all entries nonnegative: parts-based additive representation, multiplicative update rules, and applications to topic modeling and image decomposition.
Why This Matters
Given a data matrix $V$ where all entries are nonnegative (e.g., word counts, pixel intensities, gene expression levels), NMF finds a low-rank factorization $V \approx WH$ where both factors are also nonnegative: $W \geq 0$ and $H \geq 0$.
The nonnegativity constraint changes the nature of the factorization. PCA allows negative values, so principal components can cancel each other: a face might be represented as "average face minus nose plus eyes." NMF forces an additive, parts-based representation: a face is represented as "some amount of nose + some amount of eyes + some amount of mouth." Each component adds to the reconstruction; nothing subtracts.
This parts-based interpretation makes NMF particularly useful for topic modeling (each topic is a nonnegative combination of words), image analysis (each component is a recognizable part), and genomics (each factor represents a biological process with nonnegative expression).
Mental Model
Think of NMF as decomposing a recipe into ingredients. A dish is a nonnegative combination of ingredients (you cannot add negative flour). Each row of $H$ specifies an "ingredient" (a basis pattern), and each row of $W$ specifies how much of each ingredient goes into each observation. Every observation is built by adding ingredients together, never subtracting.
Formal Setup
Nonnegative Matrix Factorization
Given $V \in \mathbb{R}^{m \times n}$ with $V \geq 0$ and a target rank $k \ll \min(m, n)$, NMF finds:

$$\min_{W \geq 0,\ H \geq 0} \|V - WH\|_F^2$$

where $W \in \mathbb{R}^{m \times k}$, $H \in \mathbb{R}^{k \times n}$, and $\|\cdot\|_F$ is the Frobenius norm. The constraint $W, H \geq 0$ means all entries are nonnegative.
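As a quick concreteness check, a minimal numpy sketch of the objective and the factor shapes (all matrices and dimensions here are made-up illustrative values):

```python
import numpy as np

# Illustrative dimensions: V is m x n, factors W (m x k) and H (k x n)
rng = np.random.default_rng(0)
m, n, k = 6, 5, 2
V = rng.random((m, n))   # nonnegative data matrix
W = rng.random((m, k))   # nonnegative factor
H = rng.random((k, n))   # nonnegative factor

# Frobenius-norm objective: ||V - WH||_F^2
objective = np.linalg.norm(V - W @ H, "fro") ** 2
print((W @ H).shape)     # reconstruction has the same shape as V
print(objective >= 0)    # squared norm is always nonnegative
```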
Divergence-Based NMF
An alternative objective uses the generalized KL divergence instead of squared error:

$$D(V \,\|\, WH) = \sum_{i,j} \left( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \right)$$
This objective is appropriate when the data has Poisson-like statistics (e.g., count data). The choice of objective affects the update rules but not the structural properties of the factorization.
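The generalized KL divergence above can be computed directly; a small numpy sketch (the function name, data, and the `eps` guard against log-of-zero are illustrative assumptions, not part of the source):

```python
import numpy as np

# Generalized KL divergence D(V || WH), suited to count-like data.
def generalized_kl(V, WH, eps=1e-12):
    R = WH + eps  # guard against division by zero / log(0)
    return np.sum(V * np.log((V + eps) / R) - V + R)

rng = np.random.default_rng(1)
V = rng.poisson(3.0, size=(4, 5)).astype(float)  # Poisson-like counts
W = rng.random((4, 2))
H = rng.random((2, 5))

d = generalized_kl(V, W @ H)
print(d >= 0)  # the divergence is nonnegative, zero iff V == WH
```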
Multiplicative Update Rules
Lee and Seung (1999, 2001) introduced multiplicative update rules that maintain nonnegativity automatically. For the Frobenius norm objective:

$$H_{aj} \leftarrow H_{aj} \, \frac{(W^\top V)_{aj}}{(W^\top W H)_{aj}}, \qquad W_{ia} \leftarrow W_{ia} \, \frac{(V H^\top)_{ia}}{(W H H^\top)_{ia}}$$
These updates preserve nonnegativity automatically: since $V$, $W$, and $H$ are all nonnegative, the numerators and denominators are nonnegative. Multiplying a nonnegative value by a nonnegative ratio keeps it nonnegative, so no projection step is needed.
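The update rules translate into a few lines of numpy. This is a minimal sketch (random illustrative data, a small `eps` added to denominators to avoid division by zero, strictly positive initialization to avoid locked zeros), not a production implementation:

```python
import numpy as np

# Lee-Seung multiplicative updates for the Frobenius objective.
rng = np.random.default_rng(2)
m, n, k = 20, 15, 3
V = rng.random((m, n))
W = rng.random((m, k)) + 0.1   # strictly positive initialization
H = rng.random((k, n)) + 0.1
eps = 1e-12                    # numerical guard in denominators

errors = []
for _ in range(100):
    H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H with W fixed
    W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W with new H fixed
    errors.append(np.linalg.norm(V - W @ H, "fro") ** 2)

# Monotone decrease: no iteration increases the objective
print(all(e2 <= e1 + 1e-9 for e1, e2 in zip(errors, errors[1:])))
```

Note that no step size appears anywhere; the ratio itself plays that role.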
Main Theorems
Monotonic Convergence of Multiplicative Updates
Statement
The Frobenius norm objective is non-increasing under the multiplicative updates for $W$ and $H$:

$$\|V - W^{(t+1)} H^{(t+1)}\|_F^2 \;\leq\; \|V - W^{(t)} H^{(t)}\|_F^2$$

for every iteration $t$. The objective values converge, and the iterates approach fixed points of the update equations. A fixed point with strictly positive entries is a stationary point of the constrained optimization problem; limit points with zero entries need not be stationary.
Intuition
Each multiplicative update can be derived as a majorization-minimization (MM) step. The update constructs an auxiliary function that upper-bounds the objective and touches it at the current point. Minimizing the auxiliary function (which has a closed-form solution) is guaranteed to decrease the original objective. The multiplicative form arises from the specific choice of auxiliary function.
Proof Sketch
Fix $W$ and write the objective as a function $F(H)$. Define an auxiliary function $G(H, H')$ that satisfies $G(H, H') \geq F(H)$ for all $H$, with equality when $H = H'$. For the Frobenius objective, $G$ is obtained by upper-bounding the quadratic term with a diagonal majorizer; for the KL objective, the bound follows from Jensen's inequality applied to the concave log. Minimizing $G(H, H^{(t)})$ over $H$ has a closed form and yields the multiplicative update rule. Monotonic decrease then follows from the MM framework: $F(H^{(t+1)}) \leq G(H^{(t+1)}, H^{(t)}) \leq G(H^{(t)}, H^{(t)}) = F(H^{(t)})$.
Why It Matters
Monotonic convergence guarantees that the algorithm does not oscillate or diverge. Combined with the automatic maintenance of nonnegativity, the multiplicative updates make NMF simple to implement and reliable in practice. No step-size tuning is required (unlike gradient descent).
Failure Mode
Convergence is to a stationary point, not a global minimum. The NMF objective is nonconvex (bilinear in and ), so different initializations lead to different solutions. Multiple restarts with random initialization are standard practice. Additionally, if any entry of or reaches exactly zero, it stays at zero forever (the multiplicative update cannot escape zero). This can cause premature convergence to suboptimal solutions.
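The zero-locking behavior is easy to see numerically: the update multiplies the current entry by a ratio, so an exact zero can never become positive again. A small illustrative sketch (all data made up):

```python
import numpy as np

# Demonstrate the zero-locking failure mode of multiplicative updates.
rng = np.random.default_rng(3)
V = rng.random((5, 4))
W = rng.random((5, 2))
H = rng.random((2, 4))
H[0, 0] = 0.0   # force one entry of H to exactly zero

for _ in range(50):
    # H[0, 0] *= (ratio) stays 0 * ratio = 0 forever
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)

print(H[0, 0] == 0.0)   # the zeroed entry never escapes
```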
NMF vs. PCA
| Property | PCA | NMF |
|---|---|---|
| Sign constraint | None (components can be negative) | All nonnegative |
| Interpretation | Orthogonal directions of variance | Additive parts |
| Uniqueness | Unique (up to sign/rotation) | Not unique in general |
| Objective | Maximize variance (eigenvalue problem) | Minimize reconstruction error (iterative) |
| Components | Orthogonal | Not necessarily orthogonal |
| Cancellation | Components can cancel each other | Components only add |
The key difference: PCA finds directions of maximum variance and produces components that are globally ordered by explained variance. NMF finds nonnegative parts that are locally interpretable. PCA components often mix positive and negative contributions, making individual components hard to interpret. NMF components are always additive and typically correspond to recognizable parts or patterns.
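The sign contrast in the table can be checked on the same nonnegative data: PCA directions (computed here via SVD of the centered data) typically mix signs, while NMF factors stay nonnegative by construction. A numpy sketch with illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.random((50, 10))   # nonnegative data matrix

# PCA directions: right singular vectors of the centered data
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
print((Vt[:3] < 0).any())  # PCA components contain negative entries

# NMF factors stay nonnegative under multiplicative updates
W = rng.random((50, 3))
H = rng.random((3, 10))
for _ in range(50):
    H *= (W.T @ X) / (W.T @ W @ H + 1e-12)
    W *= (X @ H.T) / (W @ H @ H.T + 1e-12)
print((W >= 0).all() and (H >= 0).all())
```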
Applications
Topic modeling: $V$ is a document-term matrix (documents $\times$ words). $W$ is a document-topic matrix. $H$ is a topic-word matrix. Each topic (row of $H$) is a nonnegative distribution over words. Each document (row of $W$) is a nonnegative mixture of topics. NMF discovers topics without negative word weights, producing directly interpretable results.
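A toy topic-modeling sketch makes this concrete. The vocabulary, counts, and two-topic structure below are entirely made up for illustration; $H$'s rows come out as recognizable topics:

```python
import numpy as np

# Toy document-term count matrix: 4 documents, 6 words, 2 latent topics
vocab = ["goal", "match", "team", "stock", "market", "price"]
V = np.array([
    [5, 3, 4, 0, 0, 0],   # sports document
    [4, 5, 2, 0, 1, 0],   # sports document
    [0, 0, 0, 6, 4, 5],   # finance document
    [0, 1, 0, 3, 5, 4],   # finance document
], dtype=float)

rng = np.random.default_rng(0)
W = rng.random((4, 2)) + 0.1   # document-topic weights
H = rng.random((2, 6)) + 0.1   # topic-word weights
for _ in range(300):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-12)

# Each row of H is a topic: report its top-weighted words
for topic in H:
    print([vocab[i] for i in np.argsort(topic)[::-1][:3]])
```

Running this, one topic concentrates its weight on the sports words and the other on the finance words.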
Image decomposition: $V$ is a matrix of face images (each row is a flattened image). NMF learns parts: noses, eyes, mouths, and other facial features as nonnegative basis images. Reconstruction is additive overlay of these parts.
Gene expression: $V$ is a genes-by-samples matrix. NMF identifies metagenes (groups of co-expressed genes) and their activation levels across samples. The nonnegativity constraint is natural because gene expression levels are nonnegative.
NMF does not always produce parts-based representations
The parts-based property depends on the data and the rank $k$. If $k$ is too large, the factors may not be interpretable. If the data does not have a natural parts-based structure, NMF will still factorize it, but the components may not correspond to meaningful parts. The parts-based interpretation is an empirical observation, not a guarantee.
NMF solutions are not unique
Unlike PCA (where eigenvectors are unique up to sign), NMF solutions depend on initialization. For any solution $(W, H)$, the pair $(WD, D^{-1}H)$ is also a solution for any diagonal matrix $D$ with strictly positive diagonal entries. Additional constraints (sparsity, orthogonality) can improve uniqueness, but the basic NMF problem has multiple global optima.
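The diagonal-rescaling ambiguity is a one-liner to verify: scaling the columns of $W$ and inversely scaling the rows of $H$ leaves the product unchanged. Illustrative sketch:

```python
import numpy as np

# Scaling non-uniqueness: if V ~ WH, then (W D, D^{-1} H) fits equally well
rng = np.random.default_rng(5)
W = rng.random((6, 3))
H = rng.random((3, 8))
D = np.diag([2.0, 0.5, 3.0])   # any positive diagonal scaling

W2 = W @ D
H2 = np.linalg.inv(D) @ H

print(np.allclose(W @ H, W2 @ H2))            # identical reconstruction
print((W2 >= 0).all() and (H2 >= 0).all())    # nonnegativity preserved
```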
Summary
- NMF factors $V \approx WH$ with all entries of $W$ and $H$ nonnegative, producing additive parts-based representations
- Multiplicative update rules maintain nonnegativity automatically and converge monotonically
- The objective is nonconvex; different initializations give different solutions
- NMF components only add, never subtract; PCA components can cancel
- Applications: topic modeling, image parts decomposition, gene expression analysis
- If an entry of $W$ or $H$ reaches zero, it stays at zero permanently
Exercises
Problem
A document-term matrix $V$ has 1000 documents and 5000 words. You run NMF with rank $k$. What are the dimensions of $W$ and $H$ (in terms of $k$), and what do the rows of $W$ and the rows of $H$ represent?
Problem
The multiplicative update for $H$ with $W$ held fixed is $H_{aj} \leftarrow H_{aj} \, (W^\top V)_{aj} / (W^\top W H)_{aj}$. Show that this update is equivalent to a scaled gradient descent step with a specific elementwise step size. What is the step size?
Problem
Standard NMF is nonconvex and has multiple local minima. Under what conditions on $W$ and $H$ is the NMF factorization unique (up to scaling and permutation)? State the separability condition and explain its geometric meaning.
References
Canonical:
- Lee & Seung, "Learning the parts of objects by non-negative matrix factorization" (Nature, 1999)
- Lee & Seung, "Algorithms for Non-negative Matrix Factorization" (NeurIPS, 2001)
Current:
- Gillis, Nonnegative Matrix Factorization (SIAM, 2020)
- Arora et al., "Computing a Nonnegative Matrix Factorization -- Provably" (STOC, 2012)
- Hastie, Tibshirani, Friedman, The Elements of Statistical Learning (2009), Chapters 3-15
- Bishop, Pattern Recognition and Machine Learning (2006), Chapters 1-14
Next Topics
The natural next steps from NMF:
- Principal component analysis: the unconstrained analog of NMF, allowing negative values and providing orthogonal components
- K-means clustering: NMF with additional constraints (e.g., requiring $W$ to be a binary assignment matrix) reduces to k-means
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Eigenvalues and Eigenvectors (Layer 0A)
- Matrix Operations and Properties (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)