
Modern Generalization

Sparse Recovery and Compressed Sensing

Recover a sparse signal from far fewer measurements than its ambient dimension: the restricted isometry property, basis pursuit via L1 minimization, random measurement matrices, and applications from MRI to single-pixel cameras.


Why This Matters

Classical signal processing says you need at least n measurements to recover a signal in \mathbb{R}^n (this is the Nyquist-Shannon sampling theorem). But most real-world signals are sparse: they have only s \ll n nonzero components in some basis. Compressed sensing shows that you can recover such signals from only m = O(s \log(n/s)) random measurements, far fewer than n.

This is not just a theoretical curiosity. Compressed sensing has transformed MRI (faster scans by acquiring fewer measurements), radar (better resolution with less bandwidth), and single-pixel cameras (imaging with one detector). The core mathematical idea, that sparsity plus random measurements enables recovery, is one of the most influential results in applied mathematics of the 21st century.

The connection to the Lasso is deep: compressed sensing and Lasso regression both exploit the same geometric principle, that L1 minimization promotes sparsity.

Mental Model

You have a high-dimensional signal x \in \mathbb{R}^n that is sparse (only s entries are nonzero). You cannot observe x directly. Instead, you get m linear measurements y = Ax, where A is an m \times n measurement matrix with m \ll n. The system is massively underdetermined: infinitely many signals x are consistent with y. But if you ask for the sparsest solution (or its convex relaxation, the L1-minimizing solution), you recover the true x exactly, provided A satisfies the right geometric property.

Formal Setup and Notation

Let x \in \mathbb{R}^n be an unknown signal with at most s nonzero entries (x is s-sparse). We observe:

y = Ax

where A \in \mathbb{R}^{m \times n} is the measurement matrix and m \ll n.

Definition

Sparsity

A vector x \in \mathbb{R}^n is s-sparse if \|x\|_0 \leq s, where \|x\|_0 = |\{i : x_i \neq 0\}| counts the number of nonzero entries.

Definition

Basis Pursuit

Basis pursuit recovers x by solving the convex optimization problem:

\hat{x} = \arg\min_{z \in \mathbb{R}^n} \|z\|_1 \quad \text{subject to} \quad Az = y

This is L1 minimization subject to the measurement constraint. It can be recast as a linear program and solved efficiently.

In the noisy case (y = Ax + e with \|e\|_2 \leq \epsilon), the relaxed version is basis pursuit denoising (BPDN), closely related to the Lasso:

\hat{x} = \arg\min_{z} \|z\|_1 \quad \text{subject to} \quad \|Az - y\|_2 \leq \epsilon

Core Definitions

Definition

Restricted Isometry Property (RIP)

A matrix A \in \mathbb{R}^{m \times n} satisfies the restricted isometry property of order s with constant \delta_s \in (0,1) if for every s-sparse vector x:

(1 - \delta_s)\|x\|_2^2 \leq \|Ax\|_2^2 \leq (1 + \delta_s)\|x\|_2^2

In words: A approximately preserves the norm of every s-sparse vector. The constant \delta_s measures how far A is from being an isometry on the set of s-sparse vectors.

The RIP is the key geometric property. If A satisfies RIP of order 2s, then no two distinct s-sparse signals can produce the same measurements (because A preserves distances between sparse vectors, and the difference of two s-sparse vectors is 2s-sparse). This makes recovery possible.

Definition

RIP Constant

The RIP constant \delta_s is the smallest value such that the RIP inequality holds for all s-sparse vectors simultaneously. Smaller \delta_s means A is a better measurement matrix for sparse recovery.
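A quick Monte Carlo sanity check illustrates how tightly a Gaussian matrix concentrates on sparse vectors. This is a sketch, not a verification: sampling random sparse vectors only gives a lower bound on \delta_s, and the sizes and trial count below are arbitrary illustrative choices.

```python
# Monte Carlo probe of RIP-style concentration. NOTE: this only LOWER-bounds
# delta_s (true verification would require checking all supports).
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 100, 400, 5

# Gaussian measurement matrix with entry variance 1/m, as in the RIP theorem.
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

worst = 0.0
for _ in range(200):
    x = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)
    x[support] = rng.normal(size=s)
    x /= np.linalg.norm(x)  # unit-norm s-sparse test vector
    worst = max(worst, abs(np.linalg.norm(A @ x) ** 2 - 1.0))

print(f"max deviation of ||Ax||^2 from 1 over 200 trials: {worst:.3f}")
```

With m = 100 the observed deviations stay well below 1, consistent with a small RIP constant at this sparsity level.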

Main Theorems

Theorem

Exact Sparse Recovery via RIP and Basis Pursuit

Statement

If the measurement matrix A satisfies the restricted isometry property of order 2s with constant \delta_{2s} < \sqrt{2} - 1 \approx 0.4142, then basis pursuit recovers every s-sparse signal exactly:

\hat{x} = x

where \hat{x} = \arg\min_z \|z\|_1 subject to Az = y.

Intuition

The RIP condition ensures that A does not collapse the difference between any two sparse signals to zero. The L1 minimization then picks out the sparsest solution, which must be the true signal. The condition is imposed on \delta_{2s} because the difference of two s-sparse signals is 2s-sparse.

Proof Sketch

Let h = \hat{x} - x be the recovery error. Since Ah = 0 (both \hat{x} and x satisfy the measurement constraint), RIP implies h cannot be too concentrated on any set of 2s coordinates. Meanwhile, L1 optimality of \hat{x} implies that the L1 norm of h on the support of x dominates the rest. These two facts together force h = 0. The detailed argument uses the "tube constraint" and the cone constraint from L1 optimality.

Why It Matters

This theorem says that L1 minimization (a computationally tractable convex program) can solve what seems like a combinatorial problem (finding the sparsest vector consistent with measurements). The key is RIP, which turns the intractable L0 minimization into the tractable L1 minimization.

Failure Mode

If \delta_{2s} is too large, basis pursuit can fail. Also, the RIP condition is a property of the matrix A and is extremely expensive to verify for a given matrix (it requires checking all \binom{n}{2s} subsets of columns). In practice, we use random matrices for which RIP holds with high probability.

Theorem

Random Gaussian Matrices Satisfy RIP

Statement

Let A \in \mathbb{R}^{m \times n} have i.i.d. entries drawn from \mathcal{N}(0, 1/m). If:

m \geq C \cdot s \cdot \log(n/s)

for a universal constant C > 0, then A satisfies the restricted isometry property of order s with constant \delta_s \leq \delta with probability at least 1 - 2e^{-c \cdot m} for some constant c > 0.

Intuition

Random matrices are good measurement matrices because they are "incoherent" with every sparse basis. A Gaussian random projection approximately preserves the geometry of any low-dimensional structure (Johnson-Lindenstrauss-like behavior). The key factor s \log(n/s) says you need measurements proportional to the sparsity, times a logarithmic factor in the ambient dimension.

Proof Sketch

For a fixed s-sparse vector x, the random variable \|Ax\|_2^2 is concentrated around \|x\|_2^2 by standard concentration for chi-squared random variables (or subgaussian concentration). The challenge is uniformity over all s-sparse vectors. This is handled by an epsilon-net argument: cover the set of unit-norm s-sparse vectors by a finite net, apply concentration to each net point, then extend to the full set via a Lipschitz argument. The net has size \binom{n}{s} \cdot (C/\epsilon)^s, and taking the logarithm gives the s \log(n/s) measurement bound.

Why It Matters

This theorem makes compressed sensing practical. You do not need to design the measurement matrix carefully; a random Gaussian matrix works with high probability. The measurement complexity O(s \log(n/s)) is nearly optimal (an information-theoretic lower bound shows m \geq 2s measurements are necessary).

Failure Mode

The constant C in the bound can be large, so for small m the guarantee may not kick in. Also, Gaussian random matrices are dense (O(mn) memory and computation), which can be expensive. Structured random matrices (partial Fourier, sparse random matrices) provide similar guarantees with faster matrix-vector products.
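To sketch the fast-multiply point: a row-subsampled DFT can be applied with an FFT instead of storing an m-by-n matrix. The 1/\sqrt{m} normalization and the uniformly random row choice below are illustrative conventions, not the only ones used in the literature.

```python
# Structured measurement operator: a partial (row-subsampled) DFT applied in
# O(n log n) time via the FFT, with no explicit m-by-n matrix stored.
import numpy as np

rng = np.random.default_rng(1)
n, m = 256, 64
rows = rng.choice(n, size=m, replace=False)  # random subset of frequencies

def partial_fourier(x):
    """Apply the m-by-n partial DFT to x without forming the matrix."""
    return np.fft.fft(x)[rows] / np.sqrt(m)

# Cross-check against the explicit DFT matrix on this small instance.
F = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)
A = F[rows] / np.sqrt(m)

x = rng.normal(size=n)
err = np.linalg.norm(partial_fourier(x) - A @ x)
print(f"operator vs explicit matrix discrepancy: {err:.2e}")
```

The discrepancy is at floating-point level, confirming the implicit operator matches the dense matrix while costing O(n \log n) per application instead of O(mn).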

Theorem

Stable Recovery in the Noisy Case

Statement

Under the same RIP condition, basis pursuit denoising \hat{x} = \arg\min_z \|z\|_1 subject to \|Az - y\|_2 \leq \epsilon satisfies:

\|\hat{x} - x\|_2 \leq C' \cdot \epsilon

for a constant C' depending only on \delta_{2s}. Recovery is stable: small noise produces small error.

Intuition

The RIP ensures that the measurement matrix is well-conditioned on sparse vectors. This means noise in the measurements translates to proportional (not amplified) error in the recovery. Without RIP, the underdetermined system would amplify noise catastrophically.

Proof Sketch

The proof extends the noiseless argument. The tube constraint from the measurement noise gives \|Ah\|_2 \leq 2\epsilon. Combined with the cone constraint from L1 optimality and the RIP, this bounds \|h\|_2 by a constant times \epsilon.

Why It Matters

Real measurements always have noise. This theorem guarantees that compressed sensing is robust: the recovery error degrades gracefully with noise level. This is essential for practical applications like MRI.

Failure Mode

The bound is worst-case over all s-sparse signals. For particular signals, the actual error may be much smaller. Also, if x is only approximately sparse (compressible but not exactly s-sparse), the error includes an additional term proportional to the "tail" of the sorted coefficients.

Connection to the Lasso

Compressed sensing and the Lasso solve closely related problems. The Lasso minimizes \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1, which is the Lagrangian form of basis pursuit denoising. Under RIP-like conditions on the design matrix, the Lasso also recovers sparse signals.

The key difference is the context: in compressed sensing, you design the measurement matrix A (often randomly). In the Lasso, the design matrix is given by the data. But the geometric principle is the same: L1 minimization promotes sparsity because the L1 ball has corners at the coordinate axes.
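The Lagrangian form can be minimized with ISTA (iterative soft-thresholding, i.e. proximal gradient descent on the Lasso objective). The sketch below uses ad hoc choices of problem size, \lambda, and iteration count for illustration.

```python
# Minimal ISTA sketch for the Lasso objective (1/2)||Az - y||^2 + lam*||z||_1.
# Sizes, lam, and n_iter are illustrative choices, not tuned recommendations.
import numpy as np

def ista(A, y, lam, n_iter=5000):
    """Proximal gradient descent (ISTA) on the Lasso objective."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = z - (A.T @ (A @ z - y)) / L      # gradient step on the smooth part
        z = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return z

rng = np.random.default_rng(2)
m, n, s = 80, 200, 5
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)
y = A @ x

z = ista(A, y, lam=0.05)
print(f"relative error: {np.linalg.norm(z - x) / np.linalg.norm(x):.3f}")
```

Note the shrinkage bias: the Lasso solution underestimates the nonzero entries by roughly \lambda-sized amounts, so the recovery is close but not exact even without noise.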

Why This Works: Geometry of L1

The L0 problem (\min \|x\|_0 subject to Ax = y) is NP-hard in general. The L1 problem is its convex relaxation. Geometrically, the L1 ball is a cross-polytope with corners aligned with the coordinate axes. When you inflate the L1 ball until it touches the affine subspace \{z : Az = y\}, the contact point tends to be at a corner (a sparse vector) rather than on a face (a dense vector). This is the geometric reason L1 promotes sparsity.
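The contrast with L2 can be seen numerically: the minimum-L2-norm solution to Az = y has a closed form via the pseudoinverse, and for a sparse x it is dense and inaccurate (the L2 ball is round, so it touches the affine subspace at a generic, non-sparse point). The sizes below are arbitrary illustrative choices.

```python
# Why L2 minimization fails for sparse recovery: pinv(A) @ y is the projection
# of x onto the row space of A -- typically dense and far from the sparse x.
import numpy as np

rng = np.random.default_rng(3)
m, n, s = 60, 200, 5
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)
y = A @ x

z_l2 = np.linalg.pinv(A) @ y          # argmin ||z||_2 subject to Az = y
rel_err = np.linalg.norm(z_l2 - x) / np.linalg.norm(x)
density = np.count_nonzero(np.abs(z_l2) > 1e-6)

print(f"L2 solution: {density}/{n} entries nonzero, relative error {rel_err:.2f}")
```

Essentially every entry of the L2 solution is nonzero and the error is large, while L1 minimization on the same data recovers x exactly.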

Applications

  • MRI: acquire fewer Fourier measurements than the Nyquist rate, reconstruct using compressed sensing. This reduces scan time by 2-8x.
  • Radar: recover sparse target scenes from limited bandwidth measurements.
  • Single-pixel cameras: use a single photodetector with random modulation patterns to reconstruct an image.
  • Genomics: identify a few causal genetic variants from high-dimensional genotype data.

Canonical Examples

Example

Recovering a sparse signal from random measurements

Let n = 1000 and let x have s = 10 nonzero entries at random positions. Generate A \in \mathbb{R}^{m \times 1000} with i.i.d. Gaussian entries and compute y = Ax. With m = 100 measurements (10\% of the ambient dimension), basis pursuit recovers x exactly with high probability. About m \approx s \log(n/s) = 10 \cdot \log(100) \approx 46 measurements are theoretically sufficient, but the constant factor means that in practice m must be somewhat larger.
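This experiment can be run directly. The sketch below uses scipy's HiGHS LP solver, with basis pursuit rewritten as a linear program over the positive and negative parts of z (a standard reformulation; the seed and random positions are arbitrary).

```python
# Basis pursuit as a linear program: min sum(z+) + sum(z-)
# subject to A(z+ - z-) = y, z+ >= 0, z- >= 0, where z = z+ - z-.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n, s, m = 1000, 10, 100
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)
y = A @ x

res = linprog(c=np.ones(2 * n),
              A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=(0, None), method="highs")
x_hat = res.x[:n] - res.x[n:]

print(f"recovery error: {np.linalg.norm(x_hat - x):.2e}")
```

With these sizes the LP returns the true signal to numerical precision, matching the exact-recovery claim in the example.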

Common Confusions

Watch Out

RIP is about the matrix, not the signal

The restricted isometry property is a condition on the measurement matrix A, not on the signal x. If A satisfies RIP of sufficient order, then every sufficiently sparse signal can be recovered. You do not need to know which entries are nonzero in advance.

Watch Out

Compressed sensing does not violate Nyquist-Shannon

Nyquist-Shannon applies to bandlimited signals with no further structure. Compressed sensing exploits sparsity as additional structure. It is not that Nyquist is wrong; it is that sparsity provides prior information that allows recovery from fewer samples.

Watch Out

Checking RIP is hard, but you do not need to check it

Verifying whether a given matrix satisfies RIP is computationally intractable (it involves checking exponentially many subsets of columns). But you never need to verify RIP for a specific matrix. Instead, you use random matrices that satisfy RIP with high probability. The probabilistic guarantee suffices.

Summary

  • Compressed sensing recovers s-sparse signals in \mathbb{R}^n from m = O(s \log(n/s)) random measurements
  • RIP: the measurement matrix approximately preserves norms of all sparse vectors
  • If A satisfies RIP, basis pursuit (L1 minimization) recovers x exactly
  • Random Gaussian matrices satisfy RIP with high probability
  • Recovery is stable under noise: \|\hat{x} - x\|_2 \leq C \epsilon
  • L1 promotes sparsity because the L1 ball has corners at coordinate axes

Exercises

ExerciseCore

Problem

A signal x \in \mathbb{R}^{10000} is 50-sparse. Approximately how many Gaussian random measurements do you need for recovery via basis pursuit? Express your answer in terms of the theoretical bound m = O(s \log(n/s)).

ExerciseAdvanced

Problem

Explain why L2 minimization (\min \|z\|_2 subject to Az = y) does not recover sparse signals, even when L1 minimization does. Give a geometric argument.

ExerciseResearch

Problem

Prove that m \geq 2s measurements are necessary for exact recovery of all s-sparse signals in \mathbb{R}^n, regardless of the measurement matrix A and the recovery algorithm. This shows the O(s \log(n/s)) upper bound is tight up to the logarithmic factor.

References

Canonical:

  • Candes, Romberg, Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information" (2006)
  • Donoho, "Compressed sensing" (IEEE Trans. Info. Theory, 2006)
  • Candes & Tao, "Decoding by linear programming" (2005)

Current:

  • Foucart & Rauhut, A Mathematical Introduction to Compressive Sensing (2013)
  • Eldar & Kutyniok, Compressed Sensing: Theory and Applications (2012)

Last reviewed: April 2026
