
Modern Generalization

Sparse Recovery and Compressed Sensing

Recover a sparse signal from far fewer measurements than its ambient dimension: the restricted isometry property, basis pursuit via L1 minimization, random measurement matrices, and applications from MRI to single-pixel cameras.


Why This Matters

Classical signal processing says you need at least n measurements to recover a signal in \mathbb{R}^n (this is the Nyquist-Shannon sampling theorem). But most real-world signals are sparse: they have only s \ll n nonzero components in some basis. Compressed sensing shows that you can recover such signals from only m = O(s \log(n/s)) random measurements, far fewer than n.

This is not just a theoretical curiosity. Compressed sensing has transformed MRI (faster scans by acquiring fewer measurements), radar (better resolution with less bandwidth), and single-pixel cameras (imaging with one detector). The core mathematical idea, that sparsity plus random measurements enables recovery, is one of the most influential results in applied mathematics of the 21st century.

The connection to the Lasso is deep: compressed sensing and Lasso regression both exploit the same geometric principle, that L1 minimization promotes sparsity.

Mental Model

You have a high-dimensional signal x \in \mathbb{R}^n that is sparse (only s entries are nonzero). You cannot observe x directly. Instead, you get m linear measurements y = Ax, where A is an m \times n measurement matrix with m \ll n. The system is massively underdetermined: infinitely many signals x are consistent with y. But if you ask for the sparsest solution (or its convex relaxation, the L1-minimizing solution), you recover the true x exactly, provided A satisfies the right geometric property.

Formal Setup and Notation

Let x \in \mathbb{R}^n be an unknown signal with at most s nonzero entries (x is s-sparse). We observe:

y = Ax

where A \in \mathbb{R}^{m \times n} is the measurement matrix and m \ll n.

Definition

Sparsity

A vector x \in \mathbb{R}^n is s-sparse if \|x\|_0 \leq s, where \|x\|_0 = |\{i : x_i \neq 0\}| counts the number of nonzero entries.

Definition

Basis Pursuit

Basis pursuit recovers x by solving the convex optimization problem:

\hat{x} = \arg\min_{z \in \mathbb{R}^n} \|z\|_1 \quad \text{subject to} \quad Az = y

This is L1 minimization subject to the measurement constraint. It can be recast as a linear program and solved efficiently.

In the noisy case (y = Ax + e with \|e\|_2 \leq \epsilon), the relaxed version is basis pursuit denoising (BPDN), closely related to the Lasso:

\hat{x} = \arg\min_{z} \|z\|_1 \quad \text{subject to} \quad \|Az - y\|_2 \leq \epsilon

Core Definitions

Definition

Restricted Isometry Property (RIP)

A matrix A \in \mathbb{R}^{m \times n} satisfies the restricted isometry property of order s with constant \delta_s \in (0,1) if for every s-sparse vector x:

(1 - \delta_s)\|x\|_2^2 \leq \|Ax\|_2^2 \leq (1 + \delta_s)\|x\|_2^2

In words: A approximately preserves the norm of every s-sparse vector. The constant \delta_s measures how far A is from being an isometry on the set of s-sparse vectors.

The RIP is the key geometric property. If A satisfies RIP of order 2s, then no two distinct s-sparse signals can produce the same measurements (because A preserves distances between sparse vectors, and the difference of two s-sparse vectors is 2s-sparse). This makes recovery possible.

Definition

RIP Constant

The RIP constant \delta_s is the smallest value such that the RIP inequality holds for all s-sparse vectors simultaneously. Smaller \delta_s means A is a better measurement matrix for sparse recovery.
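A quick Monte Carlo sanity check illustrates how tightly a Gaussian matrix concentrates on sparse vectors. This is a sketch, not a verification: sampling random sparse vectors only gives a lower bound on \delta_s, and the sizes and trial count below are arbitrary illustrative choices.

```python
# Monte Carlo probe of RIP-style concentration. NOTE: this only LOWER-bounds
# delta_s (true verification would require checking all supports).
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 100, 400, 5

# Gaussian measurement matrix with entry variance 1/m, as in the RIP theorem.
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

worst = 0.0
for _ in range(200):
    x = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)
    x[support] = rng.normal(size=s)
    x /= np.linalg.norm(x)  # unit-norm s-sparse test vector
    worst = max(worst, abs(np.linalg.norm(A @ x) ** 2 - 1.0))

print(f"max deviation of ||Ax||^2 from 1 over 200 trials: {worst:.3f}")
```

With m = 100 the observed deviations stay well below 1, consistent with a small RIP constant at this sparsity level.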

Main Theorems

Theorem

Exact Sparse Recovery via RIP and Basis Pursuit

Statement

If the measurement matrix A satisfies the restricted isometry property of order 2s with constant \delta_{2s} < \sqrt{2} - 1 \approx 0.4142, then basis pursuit recovers every s-sparse signal exactly:

\hat{x} = x

where \hat{x} = \arg\min_z \|z\|_1 subject to Az = y.

Intuition

The RIP condition ensures that A does not collapse the difference between any two sparse signals to zero. The L1 minimization then picks out the sparsest solution, which must be the true signal. The condition is imposed on \delta_{2s} because the difference of two s-sparse signals is 2s-sparse.

Proof Sketch

Let h = \hat{x} - x be the recovery error. Since Ah = 0 (both \hat{x} and x satisfy the measurement constraint), RIP implies h cannot be too concentrated on any set of 2s coordinates. Meanwhile, L1 optimality of \hat{x} implies that the L1 norm of h on the support of x dominates the rest. These two facts together force h = 0. The detailed argument uses the "tube constraint" and the cone constraint from L1 optimality.

Why It Matters

This theorem says that L1 minimization (a computationally tractable convex program) can solve what seems like a combinatorial problem (finding the sparsest vector consistent with measurements). The key is RIP, which turns the intractable L0 minimization into the tractable L1 minimization.

Failure Mode

If \delta_{2s} is too large, basis pursuit can fail. Also, the RIP condition is a property of the matrix A and is extremely expensive to verify for a given matrix (it requires checking all \binom{n}{2s} subsets of columns). In practice, we use random matrices for which RIP holds with high probability.

Theorem

Random Gaussian Matrices Satisfy RIP

Statement

Let A \in \mathbb{R}^{m \times n} have i.i.d. entries drawn from \mathcal{N}(0, 1/m). If:

m \geq C \cdot s \cdot \log(n/s)

for a universal constant C > 0, then A satisfies the restricted isometry property of order s with constant \delta_s \leq \delta with probability at least 1 - 2e^{-c \cdot m} for some constant c > 0.

Intuition

Random matrices are good measurement matrices because they are "incoherent" with every sparse basis. A Gaussian random projection approximately preserves the geometry of any low-dimensional structure (Johnson-Lindenstrauss-like behavior). The key factor s \log(n/s) says you need measurements proportional to the sparsity, times a logarithmic factor in the ambient dimension.

Proof Sketch

For a fixed s-sparse vector x, the random variable \|Ax\|_2^2 is concentrated around \|x\|_2^2 by standard concentration for chi-squared random variables (or subgaussian concentration). The challenge is uniformity over all s-sparse vectors. This is handled by an epsilon-net argument: cover the set of unit-norm s-sparse vectors by a finite net, apply concentration to each net point, then extend to the full set via a Lipschitz argument. The net has size \binom{n}{s} \cdot (C/\epsilon)^s, and taking the logarithm gives the s \log(n/s) measurement bound.

Why It Matters

This theorem makes compressed sensing practical. You do not need to design the measurement matrix carefully; a random Gaussian matrix works with high probability. The measurement complexity O(s \log(n/s)) is nearly optimal (an information-theoretic lower bound shows m \geq 2s measurements are necessary).

Failure Mode

The constant C in the bound can be large, so for small m the guarantee may not kick in. Also, Gaussian random matrices are dense (O(mn) memory and computation), which can be expensive. Structured random matrices (partial Fourier, sparse random matrices) provide similar guarantees with faster matrix-vector products.
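To sketch the fast-multiply point: a row-subsampled DFT can be applied with an FFT instead of storing an m-by-n matrix. The 1/\sqrt{m} normalization and the uniformly random row choice below are illustrative conventions, not the only ones used in the literature.

```python
# Structured measurement operator: a partial (row-subsampled) DFT applied in
# O(n log n) time via the FFT, with no explicit m-by-n matrix stored.
import numpy as np

rng = np.random.default_rng(1)
n, m = 256, 64
rows = rng.choice(n, size=m, replace=False)  # random subset of frequencies

def partial_fourier(x):
    """Apply the m-by-n partial DFT to x without forming the matrix."""
    return np.fft.fft(x)[rows] / np.sqrt(m)

# Cross-check against the explicit DFT matrix on this small instance.
F = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)
A = F[rows] / np.sqrt(m)

x = rng.normal(size=n)
err = np.linalg.norm(partial_fourier(x) - A @ x)
print(f"operator vs explicit matrix discrepancy: {err:.2e}")
```

The discrepancy is at floating-point level, confirming the implicit operator matches the dense matrix while costing O(n \log n) per application instead of O(mn).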

Theorem

Stable Recovery in the Noisy Case

Statement

Under the same RIP condition, basis pursuit denoising \hat{x} = \arg\min_z \|z\|_1 subject to \|Az - y\|_2 \leq \epsilon satisfies:

\|\hat{x} - x\|_2 \leq C' \cdot \epsilon

for a constant C' depending only on \delta_{2s}. Recovery is stable: small noise produces small error.

Intuition

The RIP ensures that the measurement matrix is well-conditioned on sparse vectors. This means noise in the measurements translates to proportional (not amplified) error in the recovery. Without RIP, the underdetermined system would amplify noise catastrophically.

Proof Sketch

The proof extends the noiseless argument. The tube constraint from the measurement noise gives \|Ah\|_2 \leq 2\epsilon. Combined with the cone constraint from L1 optimality and the RIP, this bounds \|h\|_2 by a constant times \epsilon.

Why It Matters

Real measurements always have noise. This theorem guarantees that compressed sensing is robust: the recovery error degrades gracefully with noise level. This is essential for practical applications like MRI.

Failure Mode

The bound is worst-case over all s-sparse signals. For particular signals, the actual error may be much smaller. Also, if x is only approximately sparse (compressible but not exactly s-sparse), the error includes an additional term proportional to the "tail" of the sorted coefficients.

Connection to the Lasso

Compressed sensing and the Lasso solve closely related problems. The Lasso minimizes \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1, which is the Lagrangian form of basis pursuit denoising. Under RIP-like conditions on the design matrix, the Lasso also recovers sparse signals.

The key difference is the context: in compressed sensing, you design the measurement matrix A (often randomly). In the Lasso, the design matrix is given by the data. But the geometric principle is the same: L1 minimization promotes sparsity because the L1 ball has corners at the coordinate axes.
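The Lagrangian form can be minimized with ISTA (iterative soft-thresholding, i.e. proximal gradient descent on the Lasso objective). The sketch below uses ad hoc choices of problem size, \lambda, and iteration count for illustration.

```python
# Minimal ISTA sketch for the Lasso objective (1/2)||Az - y||^2 + lam*||z||_1.
# Sizes, lam, and n_iter are illustrative choices, not tuned recommendations.
import numpy as np

def ista(A, y, lam, n_iter=5000):
    """Proximal gradient descent (ISTA) on the Lasso objective."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = z - (A.T @ (A @ z - y)) / L      # gradient step on the smooth part
        z = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return z

rng = np.random.default_rng(2)
m, n, s = 80, 200, 5
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)
y = A @ x

z = ista(A, y, lam=0.05)
print(f"relative error: {np.linalg.norm(z - x) / np.linalg.norm(x):.3f}")
```

Note the shrinkage bias: the Lasso solution underestimates the nonzero entries by roughly \lambda-sized amounts, so the recovery is close but not exact even without noise.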

Why This Works: Geometry of L1

The L0 problem (\min \|x\|_0 subject to Ax = y) is NP-hard in general. The L1 problem is its convex relaxation. Geometrically, the L1 ball is a cross-polytope with corners aligned with the coordinate axes. When you inflate the L1 ball until it touches the affine subspace \{z : Az = y\}, the contact point tends to be at a corner (a sparse vector) rather than on a face (a dense vector). This is the geometric reason L1 promotes sparsity.
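The contrast with L2 can be seen numerically: the minimum-L2-norm solution to Az = y has a closed form via the pseudoinverse, and for a sparse x it is dense and inaccurate (the L2 ball is round, so it touches the affine subspace at a generic, non-sparse point). The sizes below are arbitrary illustrative choices.

```python
# Why L2 minimization fails for sparse recovery: pinv(A) @ y is the projection
# of x onto the row space of A -- typically dense and far from the sparse x.
import numpy as np

rng = np.random.default_rng(3)
m, n, s = 60, 200, 5
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)
y = A @ x

z_l2 = np.linalg.pinv(A) @ y          # argmin ||z||_2 subject to Az = y
rel_err = np.linalg.norm(z_l2 - x) / np.linalg.norm(x)
density = np.count_nonzero(np.abs(z_l2) > 1e-6)

print(f"L2 solution: {density}/{n} entries nonzero, relative error {rel_err:.2f}")
```

Essentially every entry of the L2 solution is nonzero and the error is large, while L1 minimization on the same data recovers x exactly.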

Applications

  • MRI: acquire fewer Fourier measurements than the Nyquist rate, reconstruct using compressed sensing. This reduces scan time by 2-8x.
  • Radar: recover sparse target scenes from limited bandwidth measurements.
  • Single-pixel cameras: use a single photodetector with random modulation patterns to reconstruct an image.
  • Genomics: identify a few causal genetic variants from high-dimensional genotype data.

Canonical Examples

Example

Recovering a sparse signal from random measurements

Let n = 1000 and let x have s = 10 nonzero entries at random positions. Generate A \in \mathbb{R}^{m \times 1000} with i.i.d. Gaussian entries and compute y = Ax. With m = 100 measurements (10\% of the ambient dimension), basis pursuit recovers x exactly with high probability. About m \approx s \log(n/s) = 10 \cdot \log(100) \approx 46 measurements are theoretically sufficient, but the constant factor means that in practice m must be somewhat larger.
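This experiment can be run directly. The sketch below uses scipy's HiGHS LP solver, with basis pursuit rewritten as a linear program over the positive and negative parts of z (a standard reformulation; the seed and random positions are arbitrary).

```python
# Basis pursuit as a linear program: min sum(z+) + sum(z-)
# subject to A(z+ - z-) = y, z+ >= 0, z- >= 0, where z = z+ - z-.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
n, s, m = 1000, 10, 100
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)
y = A @ x

res = linprog(c=np.ones(2 * n),
              A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=(0, None), method="highs")
x_hat = res.x[:n] - res.x[n:]

print(f"recovery error: {np.linalg.norm(x_hat - x):.2e}")
```

With these sizes the LP returns the true signal to numerical precision, matching the exact-recovery claim in the example.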

Common Confusions

Watch Out

RIP is about the matrix, not the signal

The restricted isometry property is a condition on the measurement matrix A, not on the signal x. If A satisfies RIP of sufficient order, then every sufficiently sparse signal can be recovered. You do not need to know which entries are nonzero in advance.

Watch Out

Compressed sensing does not violate Nyquist-Shannon

Nyquist-Shannon applies to bandlimited signals with no further structure. Compressed sensing exploits sparsity as additional structure. It is not that Nyquist is wrong; it is that sparsity provides prior information that allows recovery from fewer samples.

Watch Out

Checking RIP is hard, but you do not need to check it

Verifying whether a given matrix satisfies RIP is computationally intractable (it involves checking exponentially many subsets of columns). But you never need to verify RIP for a specific matrix. Instead, you use random matrices that satisfy RIP with high probability. The probabilistic guarantee suffices.

Summary

  • Compressed sensing recovers s-sparse signals in \mathbb{R}^n from m = O(s \log(n/s)) random measurements
  • RIP: the measurement matrix approximately preserves norms of all sparse vectors
  • If A satisfies RIP, basis pursuit (L1 minimization) recovers x exactly
  • Random Gaussian matrices satisfy RIP with high probability
  • Recovery is stable under noise: \|\hat{x} - x\|_2 \leq C \epsilon
  • L1 promotes sparsity because the L1 ball has corners at coordinate axes

Exercises

ExerciseCore

Problem

A signal x \in \mathbb{R}^{10000} is 50-sparse. Approximately how many Gaussian random measurements do you need for recovery via basis pursuit? Express your answer in terms of the theoretical bound m = O(s \log(n/s)).

ExerciseAdvanced

Problem

Explain why L2 minimization (\min \|z\|_2 subject to Az = y) does not recover sparse signals, even when L1 minimization does. Give a geometric argument.

ExerciseResearch

Problem

Prove that m \geq 2s measurements are necessary for exact recovery of all s-sparse signals in \mathbb{R}^n, regardless of the measurement matrix A and the recovery algorithm. This shows the O(s \log(n/s)) upper bound is tight up to the logarithmic factor.

References

Canonical:

  • Candes, Romberg, Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information" (2006)
  • Donoho, "Compressed sensing" (IEEE Trans. Info. Theory, 2006)
  • Candes & Tao, "Decoding by linear programming" (2005)

Current:

  • Foucart & Rauhut, A Mathematical Introduction to Compressive Sensing (2013)
  • Eldar & Kutyniok, Compressed Sensing: Theory and Applications (2012)

Last reviewed: April 2026
