Modern Generalization
Sparse Recovery and Compressed Sensing
Recover a sparse signal from far fewer measurements than its ambient dimension: the restricted isometry property, basis pursuit via L1 minimization, random measurement matrices, and applications from MRI to single-pixel cameras.
Why This Matters
Classical signal processing says you need at least $n$ measurements to recover a signal in $\mathbb{R}^n$ (this is the Nyquist-Shannon sampling theorem). But most real-world signals are sparse: they have only $k \ll n$ nonzero components in some basis. Compressed sensing shows that you can recover such signals from only $m = O(k \log(n/k))$ random measurements, far fewer than $n$.
This is not just a theoretical curiosity. Compressed sensing has transformed MRI (faster scans by acquiring fewer measurements), radar (better resolution with less bandwidth), and single-pixel cameras (imaging with one detector). The core mathematical idea, that sparsity plus random measurements enables recovery, is one of the most influential results in applied mathematics of the 21st century.
The connection to the Lasso is deep: compressed sensing and Lasso regression both exploit the same geometric principle, that L1 minimization promotes sparsity.
Mental Model
You have a high-dimensional signal $x \in \mathbb{R}^n$ that is sparse (only $k \ll n$ entries are nonzero). You cannot observe $x$ directly. Instead, you get linear measurements $b = Ax$, where $A$ is an $m \times n$ measurement matrix with $m \ll n$. The system is massively underdetermined: infinitely many signals are consistent with $b$. But if you ask for the sparsest solution (or its convex relaxation, the L1-minimizing solution), you recover the true $x$ exactly, provided $A$ satisfies the right geometric property.
Formal Setup and Notation
Let $x \in \mathbb{R}^n$ be an unknown signal with at most $k$ nonzero entries ($x$ is $k$-sparse). We observe:

$$b = Ax \in \mathbb{R}^m$$

where $A \in \mathbb{R}^{m \times n}$ is the measurement matrix and $m \ll n$.
Sparsity
A vector $x \in \mathbb{R}^n$ is $k$-sparse if $\|x\|_0 \le k$, where $\|x\|_0$ counts the number of nonzero entries.
Basis Pursuit
Basis pursuit recovers $x$ by solving the convex optimization problem:

$$\hat{x} = \arg\min_{z \in \mathbb{R}^n} \|z\|_1 \quad \text{subject to} \quad Az = b$$

This is L1 minimization subject to the measurement constraint. It is a linear program and can be solved efficiently.
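Concretely, the L1 objective can be rewritten as a standard-form linear program by splitting out the absolute values with auxiliary variables:

$$\min_{z, u \in \mathbb{R}^n} \; \sum_{i=1}^{n} u_i \quad \text{subject to} \quad Az = b, \quad -u_i \le z_i \le u_i \;\; (i = 1, \dots, n)$$

At the optimum $u_i = |z_i|$, so the objective equals $\|z\|_1$, and any off-the-shelf LP solver applies.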
In the noisy case ($b = Ax + e$ with $\|e\|_2 \le \varepsilon$), the relaxed version is basis pursuit denoising (BPDN) or the Lasso:

$$\hat{x} = \arg\min_{z \in \mathbb{R}^n} \|z\|_1 \quad \text{subject to} \quad \|Az - b\|_2 \le \varepsilon$$
Core Definitions
Restricted Isometry Property (RIP)
A matrix $A \in \mathbb{R}^{m \times n}$ satisfies the restricted isometry property of order $k$ with constant $\delta_k \in (0, 1)$ if for every $k$-sparse vector $x$:

$$(1 - \delta_k)\,\|x\|_2^2 \;\le\; \|Ax\|_2^2 \;\le\; (1 + \delta_k)\,\|x\|_2^2$$

In words: $A$ approximately preserves the norm of every $k$-sparse vector. The constant $\delta_k$ measures how far $A$ is from being an isometry on the set of $k$-sparse vectors.
The RIP is the key geometric property. If $A$ satisfies RIP of order $2k$ with $\delta_{2k} < 1$, then no two distinct $k$-sparse signals can produce the same measurements (because $A$ preserves distances between sparse vectors). This makes recovery possible.
RIP Constant
The RIP constant $\delta_k$ is the smallest value such that the RIP inequality holds for all $k$-sparse vectors simultaneously. A smaller $\delta_k$ means $A$ is a better measurement matrix for sparse recovery.
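Computing the RIP constant exactly requires examining every support, but sampling supports gives a useful lower bound on it. A minimal numpy sketch (the function name, matrix sizes, and sample counts are illustrative): on a fixed support $S$ of size $k$, the restricted constant equals the largest deviation of an eigenvalue of $A_S^\top A_S$ from 1.

```python
import numpy as np

def rip_constant_lower_bound(A, k, n_supports=500, rng=None):
    """Monte Carlo lower bound on the RIP constant delta_k.

    On a fixed support S of size k, the restricted isometry constant is
    the largest |eigenvalue - 1| of the Gram matrix A_S^T A_S. The true
    delta_k maximizes this over all C(n, k) supports, so maximizing over
    a random sample of supports can only underestimate it.
    """
    rng = rng or np.random.default_rng()
    n = A.shape[1]
    delta = 0.0
    for _ in range(n_supports):
        S = rng.choice(n, size=k, replace=False)
        eigs = np.linalg.eigvalsh(A[:, S].T @ A[:, S])
        delta = max(delta, float(np.max(np.abs(eigs - 1.0))))
    return delta

rng = np.random.default_rng(1)
m, n = 80, 200
A = rng.standard_normal((m, n)) / np.sqrt(m)      # i.i.d. N(0, 1/m) entries
print(round(rip_constant_lower_bound(A, k=4, rng=rng), 3))
```

Because only a subset of supports is examined, the returned value never overestimates the true constant; certifying an upper bound is the part that is computationally intractable.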
Main Theorems
Exact Sparse Recovery via RIP and Basis Pursuit
Statement
If the measurement matrix $A$ satisfies the restricted isometry property of order $2k$ with constant $\delta_{2k} < \sqrt{2} - 1$, then basis pursuit recovers every $k$-sparse signal exactly:

$$\hat{x} = x, \quad \text{where } \hat{x} = \arg\min_{z} \|z\|_1 \text{ subject to } Az = b.$$
Intuition
The RIP condition ensures that $A$ does not collapse the difference between any two $k$-sparse signals to zero. The L1 minimization then picks out the sparsest solution, which must be the true signal. The condition on $\delta_{2k}$ (rather than $\delta_k$) is needed because the difference of two $k$-sparse signals is $2k$-sparse.
Proof Sketch
Let $h = \hat{x} - x$ be the recovery error. Since $Ah = 0$ (both $x$ and $\hat{x}$ satisfy the measurement constraint), RIP implies $h$ cannot be too concentrated on any small set of coordinates. Meanwhile, L1 optimality of $\hat{x}$ implies that the L1 norm of $h$ on the support of $x$ dominates the rest. These two facts together force $h = 0$. The detailed argument uses the "tube constraint" $Ah = 0$ and the cone constraint from L1 optimality.
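The cone constraint follows from a short chain of triangle inequalities. Writing $h = \hat{x} - x$ for the error, $S$ for the support of $x$, and $h_S$, $h_{S^c}$ for the restrictions of $h$ to $S$ and its complement:

$$\|x\|_1 \;\ge\; \|\hat{x}\|_1 \;=\; \|x + h\|_1 \;=\; \|x + h_S\|_1 + \|h_{S^c}\|_1 \;\ge\; \|x\|_1 - \|h_S\|_1 + \|h_{S^c}\|_1,$$

where the first inequality holds because $x$ is feasible for the L1 minimization and the last uses the reverse triangle inequality. Hence $\|h_{S^c}\|_1 \le \|h_S\|_1$: the error is concentrated on the support of $x$, and RIP rules out any nonzero vector in this cone satisfying $Ah = 0$.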
Why It Matters
This theorem says that L1 minimization (a computationally tractable convex program) can solve what seems like a combinatorial problem (finding the sparsest vector consistent with measurements). The key is RIP, which turns the intractable L0 minimization into the tractable L1 minimization.
Failure Mode
If $\delta_{2k}$ is too large, basis pursuit can fail. Also, the RIP condition is a property of the matrix and is extremely expensive to verify for a given matrix (it requires checking all $\binom{n}{k}$ subsets of $k$ columns). In practice, we use random matrices for which RIP holds with high probability.
Random Gaussian Matrices Satisfy RIP
Statement
Let $A \in \mathbb{R}^{m \times n}$ have i.i.d. entries drawn from $\mathcal{N}(0, 1/m)$. If:

$$m \;\ge\; C\,\delta^{-2}\,k \log(n/k)$$

for a universal constant $C$, then $A$ satisfies the restricted isometry property of order $k$ with constant $\delta_k \le \delta$ with probability at least $1 - 2e^{-cm}$ for some constant $c > 0$.
Intuition
Random matrices are good measurement matrices because they are "incoherent" with every sparse basis. A Gaussian random projection approximately preserves the geometry of any low-dimensional structure (Johnson-Lindenstrauss-like behavior). The key factor $k \log(n/k)$ says you need measurements proportional to the sparsity $k$ times a logarithmic factor in the ambient dimension $n$.
Proof Sketch
For a fixed $k$-sparse vector $x$, the random variable $\|Ax\|_2^2$ is concentrated around $\|x\|_2^2$ by standard concentration for chi-squared random variables (or subgaussian concentration). The challenge is uniformity over all $k$-sparse vectors. This is handled by an epsilon-net argument: cover the set of unit-norm $k$-sparse vectors by a finite net, apply concentration to each net point, then extend to the full set via a Lipschitz argument. The net has size roughly $\binom{n}{k}(3/\epsilon)^k$, and taking the logarithm gives the $k \log(n/k)$ measurement bound.
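The fixed-vector concentration step is easy to check numerically. When $\|x\|_2 = 1$ and $A$ has i.i.d. $\mathcal{N}(0, 1/m)$ entries, the vector $Ax$ itself has i.i.d. $\mathcal{N}(0, 1/m)$ entries, so a sketch can sample $Ax$ directly (the trial count below is illustrative):

```python
import numpy as np

# For unit-norm x and A with i.i.d. N(0, 1/m) entries, Ax has i.i.d.
# N(0, 1/m) entries, so m * ||Ax||_2^2 is chi-squared with m degrees of
# freedom. The mean absolute deviation of ||Ax||_2^2 from 1 should
# shrink like 1/sqrt(m) as the number of measurements m grows.
rng = np.random.default_rng(0)
trials = 5000
mean_dev = {}
for m in (25, 100, 400):
    y = rng.standard_normal((trials, m)) / np.sqrt(m)   # rows distributed as Ax
    mean_dev[m] = float(np.mean(np.abs(np.sum(y * y, axis=1) - 1.0)))
    print(m, round(mean_dev[m], 4))
```

The printed deviations fall roughly by half each time $m$ quadruples, matching the $1/\sqrt{m}$ rate; the epsilon-net argument is what upgrades this single-vector statement to all sparse vectors at once.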
Why It Matters
This theorem makes compressed sensing practical. You do not need to design the measurement matrix carefully; a random Gaussian matrix works with high probability. The measurement complexity $m = O(k \log(n/k))$ is nearly optimal (an information-theoretic lower bound shows $m \ge 2k$ measurements are necessary).
Failure Mode
The constant $C$ in the bound can be large, so for small $n$ the bound may not kick in. Also, Gaussian random matrices are dense ($O(mn)$ memory and per-product computation), which can be expensive. Structured random matrices (partial Fourier, random sparse matrices) provide similar guarantees with faster matrix-vector products.
Stable Recovery in the Noisy Case
Statement
Under the same RIP condition, the basis pursuit denoising solution $\hat{x} = \arg\min_{z} \|z\|_1$ subject to $\|Az - b\|_2 \le \varepsilon$ satisfies:

$$\|\hat{x} - x\|_2 \;\le\; C\,\varepsilon$$

for a constant $C$ depending only on $\delta_{2k}$. Recovery is stable: small noise produces small error.
Intuition
The RIP ensures that the measurement matrix is well-conditioned on sparse vectors. This means noise in the measurements translates to proportional (not amplified) error in the recovery. Without RIP, the underdetermined system would amplify noise catastrophically.
Proof Sketch
The proof extends the noiseless argument. The tube constraint from the measurement noise gives $\|A(\hat{x} - x)\|_2 \le 2\varepsilon$. Combined with the cone constraint from L1 optimality and the RIP, this bounds $\|\hat{x} - x\|_2$ by a constant times $\varepsilon$.
Why It Matters
Real measurements always have noise. This theorem guarantees that compressed sensing is robust: the recovery error degrades gracefully with noise level. This is essential for practical applications like MRI.
Failure Mode
The bound is worst-case over all $k$-sparse signals. For particular signals, the actual error may be much smaller. Also, if $x$ is only approximately sparse (compressible but not exactly $k$-sparse), the error includes an additional term proportional to the "tail" $\|x - x_k\|_1 / \sqrt{k}$ of the sorted coefficients, where $x_k$ is the best $k$-term approximation of $x$.
Connection to the Lasso
Compressed sensing and the Lasso solve closely related problems. The Lasso minimizes $\frac{1}{2}\|Az - b\|_2^2 + \lambda \|z\|_1$, which is the Lagrangian form of basis pursuit denoising. Under RIP-like conditions on the design matrix, the Lasso also recovers sparse signals.
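The Lagrangian form can be minimized with proximal gradient descent (ISTA), whose proximal step for the L1 norm is elementwise soft-thresholding. A minimal sketch, with illustrative problem sizes and a hypothetical choice of $\lambda$:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, n_iters=500):
    """Minimize 0.5 * ||Az - b||_2^2 + lam * ||z||_1 by proximal gradient."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1/L, L = Lipschitz const. of gradient
    z = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ z - b)               # gradient of the quadratic term
        z = soft_threshold(z - step * grad, step * lam)
    return z

rng = np.random.default_rng(0)
m, n, k = 60, 200, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x + 0.01 * rng.standard_normal(m)
x_hat = ista(A, b, lam=0.02)
print(np.count_nonzero(x_hat), "nonzero entries out of", n)
```

The soft-thresholding step is what zeroes out coordinates exactly, which is the algorithmic face of the same L1-promotes-sparsity geometry.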
The key difference is the context: in compressed sensing, you design the measurement matrix (often randomly). In the Lasso, the design matrix is given by the data. But the geometric principle is the same: L1 minimization promotes sparsity because the L1 ball has corners at the coordinate axes.
Why This Works: Geometry of L1
The L0 problem ($\min \|z\|_0$ subject to $Az = b$) is NP-hard in general. The L1 problem is its convex relaxation. Geometrically, the L1 ball is a cross-polytope with corners aligned with the coordinate axes. When you inflate the L1 ball until it touches the affine subspace $\{z : Az = b\}$, the contact point tends to be at a corner (a sparse vector) rather than on a face (a dense vector). This is the geometric reason L1 promotes sparsity.
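The contrast with L2 minimization is easy to see numerically: the minimum-L2-norm solution of an underdetermined system (the pseudoinverse solution) smears energy across essentially all coordinates, because the L2 ball is round and has no corners. A small sketch (sizes are illustrative):

```python
import numpy as np

# Compare the minimum-L2-norm solution of Az = b against a sparse ground
# truth. The pseudoinverse gives argmin ||z||_2 subject to Az = b; it is
# feasible but dense, so L2 minimization does not recover the sparse x.
rng = np.random.default_rng(0)
m, n, k = 40, 100, 3
A = rng.standard_normal((m, n))
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = 1.0
b = A @ x

x_l2 = np.linalg.pinv(A) @ b                      # min-L2-norm feasible point
print(np.count_nonzero(np.abs(x_l2) > 1e-8))      # dense: close to n entries
print(np.linalg.norm(x_l2 - x))                   # far from the sparse truth
```

The L2 solution typically also has larger L1 norm than the sparse $x$, which is exactly why the inflated L1 ball stops at the sparse corner instead.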
Applications
- MRI: acquire fewer Fourier measurements than the Nyquist rate, reconstruct using compressed sensing. This reduces scan time by 2-8x.
- Radar: recover sparse target scenes from limited bandwidth measurements.
- Single-pixel cameras: use a single photodetector with random modulation patterns to reconstruct an image.
- Genomics: identify a few causal genetic variants from high-dimensional genotype data.
Canonical Examples
Recovering a sparse signal from random measurements
Let $x \in \mathbb{R}^n$ be a signal with $k$ nonzero entries at random positions. Generate $A$ with i.i.d. Gaussian entries and compute $b = Ax$. With a number of measurements $m$ amounting to a small fraction of the ambient dimension $n$, basis pursuit recovers $x$ exactly with high probability. A budget of $m = O(k \log(n/k))$ measurements is theoretically sufficient, but the constant factor means that in practice $m$ must be somewhat larger.
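This experiment is short to run end to end. The sketch below (problem sizes are illustrative) solves basis pursuit via the linear-program reformulation using scipy's solver; with high probability the printed recovery error is at solver-precision level.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||z||_1 s.t. Az = b as an LP over (z, u) with |z| <= u."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])     # minimize sum(u)
    A_eq = np.hstack([A, np.zeros((m, n))])           # Az = b
    A_ub = np.block([[ np.eye(n), -np.eye(n)],        #  z - u <= 0
                     [-np.eye(n), -np.eye(n)]])       # -z - u <= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n), A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * n + [(0, None)] * n, method="highs")
    return res.x[:n]

rng = np.random.default_rng(0)
m, n, k = 40, 120, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)          # Gaussian measurement matrix
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_hat = basis_pursuit(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true))                 # near zero when recovery succeeds
```

Shrinking $m$ toward $k$ (or growing $k$ toward $m$) in this script is a quick way to see the empirical phase transition where recovery starts to fail.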
Common Confusions
RIP is about the matrix, not the signal
The restricted isometry property is a condition on the measurement matrix $A$, not on the signal $x$. If $A$ satisfies RIP of sufficient order, then every sufficiently sparse signal can be recovered. You do not need to know which entries of $x$ are nonzero in advance.
Compressed sensing does not violate Nyquist-Shannon
Nyquist-Shannon applies to bandlimited signals with no further structure. Compressed sensing exploits sparsity as additional structure. It is not that Nyquist is wrong; it is that sparsity provides prior information that allows recovery from fewer samples.
Checking RIP is hard, but you do not need to check it
Verifying whether a given matrix satisfies RIP is computationally intractable (it involves checking exponentially many subsets of columns). But you never need to verify RIP for a specific matrix. Instead, you use random matrices that satisfy RIP with high probability. The probabilistic guarantee suffices.
Summary
- Compressed sensing recovers $k$-sparse signals in $\mathbb{R}^n$ from $m = O(k \log(n/k))$ random measurements
- RIP: the measurement matrix approximately preserves norms of all sparse vectors
- If $A$ satisfies RIP with $\delta_{2k} < \sqrt{2} - 1$, basis pursuit (L1 minimization) recovers $x$ exactly
- Random Gaussian matrices with $m \ge C\,k \log(n/k)$ rows satisfy RIP with high probability
- Recovery is stable under noise: $\|\hat{x} - x\|_2 \le C\varepsilon$
- L1 promotes sparsity because the L1 ball has corners at coordinate axes
Exercises
Problem
A signal $x \in \mathbb{R}^n$ is 50-sparse. Approximately how many Gaussian random measurements do you need for recovery via basis pursuit? Express your answer in terms of the theoretical bound $m \ge C\,k \log(n/k)$.
Problem
Explain why L2 minimization ($\min \|z\|_2$ subject to $Az = b$) does not recover sparse signals, even when L1 minimization does. Give a geometric argument.
Problem
Prove that $m \ge 2k$ measurements are necessary for exact recovery of all $k$-sparse signals in $\mathbb{R}^n$, regardless of the measurement matrix and the recovery algorithm. This shows the $O(k \log(n/k))$ upper bound is tight up to the logarithmic factor.
References
Canonical:
- Candès, Romberg, Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information" (2006)
- Donoho, "Compressed sensing" (IEEE Trans. Info. Theory, 2006)
- Candès & Tao, "Decoding by linear programming" (2005)
Current:
- Foucart & Rauhut, A Mathematical Introduction to Compressive Sensing (2013)
- Eldar & Kutyniok, Compressed Sensing: Theory and Applications (2012)
Next Topics
The natural next step from compressed sensing:
- Minimax lower bounds: information-theoretic limits on sparse estimation rates
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Lasso Regression (Layer 2)
- Linear Regression (Layer 1)
- Matrix Operations and Properties (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Maximum Likelihood Estimation (Layer 0B)
- Common Probability Distributions (Layer 0A)
- Differentiation in R^n (Layer 0A)
- Convex Optimization Basics (Layer 1)
- Sub-Gaussian Random Variables (Layer 2)
- Concentration Inequalities (Layer 1)
- Expectation, Variance, Covariance, and Moments (Layer 0A)
Builds on This
- Restricted Isometry Property (Layer 3)