
Comparison

Fano's Method vs. Le Cam's Method

Two techniques for proving minimax lower bounds: Fano's method reduces estimation to many-hypothesis testing via mutual information; Le Cam's method reduces it to binary hypothesis testing via total variation distance.

What Each Measures

Both Fano's method and Le Cam's method prove minimax lower bounds: they show that no estimator can achieve risk below a certain threshold over a given parameter class. They do this by reducing the estimation problem to a hypothesis testing problem, then bounding the testing error.

Fano's method constructs many hypotheses and bounds the probability of correctly identifying the true one using mutual information.

Le Cam's method constructs exactly two hypotheses (or two mixtures) and bounds the probability of distinguishing them using total variation distance.

Side-by-Side Statement

Definition

Fano's Method

Choose $M$ parameter values $\theta_1, \ldots, \theta_M$ in the parameter space such that:

  1. They are well-separated: $d(\theta_i, \theta_j) \geq 2s$ for all $i \neq j$
  2. The mutual information between the index $V$ (uniform on $\{1, \ldots, M\}$) and the data $X$ is bounded: $I(V; X) \leq \beta$

Then by Fano's inequality:

$$\inf_{\hat{\theta}} \max_{i} \Pr[d(\hat{\theta}, \theta_i) \geq s] \geq 1 - \frac{\beta + \log 2}{\log M}$$

This gives a minimax lower bound of order $s$ whenever $\beta \ll \log M$.
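As a quick numerical illustration, the Fano bound can be evaluated directly. A minimal sketch — the helper name and the values of $M$ and $\beta$ are made up for the example:

```python
import math

def fano_lower_bound(M, beta):
    """Worst-case testing error from Fano: 1 - (beta + log 2) / log M,
    clipped below at 0 (the bound is vacuous when beta >= log M - log 2)."""
    return max(0.0, 1.0 - (beta + math.log(2)) / math.log(M))

# With M = 1024 hypotheses and mutual information beta = 2 nats,
# every estimator errs with probability greater than 0.61:
print(fano_lower_bound(1024, 2.0))

# With only M = 2 hypotheses and large beta, Fano gives nothing:
print(fano_lower_bound(2, 10.0))
```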

Definition

Le Cam's Two-Point Method

Choose two parameter values $\theta_0, \theta_1$ with $d(\theta_0, \theta_1) \geq 2s$. Let $P_0, P_1$ be the corresponding data distributions. Then:

$$\inf_{\hat{\theta}} \max_{i \in \{0,1\}} \Pr[d(\hat{\theta}, \theta_i) \geq s] \geq \frac{1}{2}\bigl(1 - \mathrm{TV}(P_0, P_1)\bigr)$$

where $\mathrm{TV}$ is the total variation distance. If $P_0$ and $P_1$ are hard to distinguish ($\mathrm{TV}$ is small), no estimator can reliably determine which parameter generated the data.
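For discrete distributions, both quantities in the bound are easy to compute. A minimal sketch — the function names are ours, and the Bernoulli pair is chosen only for illustration:

```python
import numpy as np

def total_variation(p, q):
    """TV distance between two discrete distributions: half the L1 distance."""
    return 0.5 * float(np.abs(np.asarray(p, float) - np.asarray(q, float)).sum())

def le_cam_lower_bound(p0, p1):
    """Le Cam two-point bound on the worst-case error: (1 - TV(P0, P1)) / 2."""
    return 0.5 * (1.0 - total_variation(p0, p1))

# Two nearby Bernoullis written as probability vectors over {0, 1}:
p0, p1 = [0.5, 0.5], [0.45, 0.55]
print(le_cam_lower_bound(p0, p1))  # TV = 0.05, so the bound is 0.475
```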

Where Each Is Stronger

Le Cam wins on simplicity

Le Cam's method requires choosing just two hypotheses and computing a single total variation distance. The proof is short: any estimator that is good at estimation must also be good at testing between $P_0$ and $P_1$, and the best test has error probability at least $(1 - \mathrm{TV}(P_0, P_1))/2$.

For many problems, a two-point argument is sufficient to get the correct minimax rate. For example, the $n^{-1}$ rate (in squared error) for estimating a bounded mean follows immediately from Le Cam with two point masses separated by $O(1/\sqrt{n})$.

Fano wins when many hypotheses are needed

Some problems require packing many well-separated hypotheses to get tight lower bounds. This happens when the parameter space is high-dimensional. For nonparametric density estimation over a Sobolev ball in $d$ dimensions, you need $M = \exp(c \cdot n^{d/(2s+d)})$ hypotheses to get the optimal rate $n^{-2s/(2s+d)}$. Le Cam's two-point method gives a suboptimal rate in such settings.

The key quantity is the metric entropy: how many well-separated points can you pack into the parameter space? When this number is large, Fano extracts a tighter bound because the mutual information constraint involves $\log M$.

Where Each Fails

Le Cam fails in high-dimensional problems

With only two hypotheses, Le Cam cannot capture the geometric complexity of high-dimensional parameter spaces. The two-point lower bound is often loose by polynomial factors in the dimension. You can partially fix this using Le Cam's mixture method (comparing mixtures of distributions rather than individual ones), but Fano is usually more natural for high-dimensional problems.

Fano fails when constructing many hypotheses is hard

Fano requires a large packing set of well-separated parameters whose corresponding distributions are mutually close in KL divergence. For some problems, constructing such a packing set is difficult or impossible. The Gilbert-Varshamov lemma helps for problems with Hamming-type structure, but not all problems have this structure.
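A Gilbert-Varshamov-style packing can be produced greedily. A sketch for small dimension — it scans all $2^d$ binary strings exhaustively, so it is illustrative only:

```python
from itertools import product

def hamming(u, v):
    """Hamming distance between two equal-length sequences."""
    return sum(a != b for a, b in zip(u, v))

def greedy_packing(d, min_dist):
    """Greedily collect binary strings of length d that are pairwise at
    Hamming distance >= min_dist. Any maximal such set has size at least
    2^d / V(d, min_dist - 1), where V is the Hamming-ball volume
    (the Gilbert-Varshamov guarantee)."""
    packing = []
    for point in product([0, 1], repeat=d):
        if all(hamming(point, q) >= min_dist for q in packing):
            packing.append(point)
    return packing

# Pack {0,1}^8 at minimum distance 3; GV guarantees at least
# 256 / (1 + 8 + 28) > 6.9, i.e. at least 7 points.
print(len(greedy_packing(8, 3)))
```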

Both require creativity in hypothesis construction

Neither method is automatic. The art of proving lower bounds lies in choosing the right set of hypotheses. A poor choice gives a loose bound; a good choice gives a tight one. The method itself just converts the hypothesis construction into a bound.

Key Assumptions That Differ

|  | Fano | Le Cam |
| --- | --- | --- |
| Number of hypotheses | $M \geq 2$ (often exponentially many) | Exactly 2 (or 2 mixtures) |
| Distance measure | KL divergence / mutual information | Total variation distance |
| Key quantity | $I(V; X) / \log M$ | $\mathrm{TV}(P_0, P_1)$ |
| Tightest when | Parameter space has large metric entropy | Two well-chosen hypotheses suffice |
| Standard tools | KL chain rule, Gilbert-Varshamov lemma | Pinsker's inequality, data processing |

The Connection: Both Reduce Estimation to Testing

Theorem

Testing Reduction Framework

Statement

Both methods share the same meta-argument:

  1. Choose a finite set of parameter values that are well-separated under the loss function.
  2. Argue that any good estimator must be able to identify the true parameter from this set.
  3. Show that identification is hard because the data distributions are too similar.

Fano measures similarity via mutual information across all $M$ hypotheses. Le Cam measures similarity via total variation between two distributions. Both yield: minimax risk $\geq$ (separation) $\times$ (probability of testing error).

Intuition

The difference is quantitative, not qualitative. Le Cam asks: can you tell apart two distributions? Fano asks: can you identify one distribution among many? When there are many plausible hypotheses and all look similar, Fano gives a stronger lower bound because the identification task is harder than binary testing.

What to Memorize

  1. Le Cam: Two hypotheses, total variation. Lower bound is $(1 - \mathrm{TV})/2$.

  2. Fano: $M$ hypotheses, mutual information. Lower bound requires $I(V; X) \ll \log M$.

  3. When to use Le Cam: The problem has a natural two-point structure, or you want a quick lower bound that may not be tight.

  4. When to use Fano: The parameter space is high-dimensional and you need to exploit its geometric complexity via packing arguments.

  5. Both are lower bounds: They tell you what is impossible, not what is achievable. Matching upper bounds require separate analysis.

When a Researcher Would Use Each

Example

Mean estimation rate

To show that estimating the mean of a distribution on $[0,1]$ from $n$ i.i.d. samples requires error at least $\Omega(1/\sqrt{n})$, use Le Cam. Choose $\theta_0 = 1/2 - \delta$ and $\theta_1 = 1/2 + \delta$ with $\delta = c/\sqrt{n}$. By KL tensorization and Pinsker, the squared TV distance between the two $n$-sample product distributions is bounded by $O(n\delta^2) = O(1)$; choosing $c$ small keeps the TV bounded away from 1, giving the lower bound.
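The arithmetic in this example can be checked numerically. A sketch with the constant in $\delta = c/\sqrt{n}$ set to $c = 1/4$ — any small enough constant works:

```python
import math

def bernoulli_kl(p, q):
    """KL(Ber(p) || Ber(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

n = 10_000
delta = 0.25 / math.sqrt(n)          # c / sqrt(n) with c = 1/4
theta0, theta1 = 0.5 - delta, 0.5 + delta

# KL tensorizes over i.i.d. samples; Pinsker then converts KL to TV.
kl_n = n * bernoulli_kl(theta0, theta1)
tv_upper = math.sqrt(kl_n / 2)       # here about 0.5, bounded away from 1
le_cam = 0.5 * (1 - tv_upper)        # a constant lower bound, about 0.25
print(tv_upper, le_cam)
```

The bound is a constant independent of $n$, while the separation between $\theta_0$ and $\theta_1$ scales as $1/\sqrt{n}$ — which is exactly the claimed rate.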

Example

Nonparametric density estimation

To prove the minimax rate $n^{-2s/(2s+d)}$ for estimating a density in a $d$-dimensional Sobolev ball of smoothness $s$, use Fano. Pack $M = \exp(c \cdot n^{d/(2s+d)})$ bump functions into the Sobolev ball, verify the KL divergences are small via tensorization, and apply Fano to get the rate.
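The exponents can be recovered by balancing the two Fano conditions. A schematic of the bandwidth calculation, assuming the standard construction with bumps of width $h$:

$$\|f_i - f_j\|_2 \gtrsim h^{s}, \qquad \log M \asymp h^{-d}, \qquad I(V; X_1^n) \lesssim n h^{2s}.$$

Fano requires $n h^{2s} \asymp h^{-d}$, i.e. $h \asymp n^{-1/(2s+d)}$, which gives separation $h^s = n^{-s/(2s+d)}$ and hence squared-error risk of order $n^{-2s/(2s+d)}$.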

Common Confusions

Watch Out

Fano and Le Cam do not give the same bound

For the same problem, Fano with $M = 2$ does not reduce to Le Cam. Fano uses KL divergence while Le Cam uses total variation. They are related by Pinsker's inequality ($\mathrm{TV}^2 \leq \mathrm{KL}/2$), but this conversion is lossy. For two hypotheses, Le Cam is generally tighter because it works directly with TV.
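The lossiness is easy to see numerically. A sketch comparing the exact TV between two Bernoullis with the Pinsker estimate (the helper name is ours):

```python
import math

def bernoulli_kl(p, q):
    """KL(Ber(p) || Ber(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

p, q = 0.5, 0.6
tv = abs(p - q)                               # exact TV for two Bernoullis
pinsker_tv = math.sqrt(bernoulli_kl(p, q) / 2)  # Pinsker upper bound on TV

# Pinsker's estimate is valid but strictly larger than the true TV, so a
# Le Cam bound routed through KL is weaker than one using TV directly.
print(tv, pinsker_tv)
```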

Watch Out

Mutual information is not the same as pairwise KL

A common error in applying Fano's method is to treat $I(V; X)$ as if it were a pairwise KL divergence; bounding it by the maximum pairwise KL is valid but can be very loose. The standard bound is:

$$I(V; X) \leq \frac{1}{M} \sum_{j=1}^M \mathrm{KL}(P_j \| \bar{P})$$

where $\bar{P} = \frac{1}{M}\sum_j P_j$ is the mixture (for uniform $V$ this in fact holds with equality). The right-hand side is bounded above by $\frac{1}{M^2}\sum_{i,j} \mathrm{KL}(P_i \| P_j)$, the average pairwise KL divergence.
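The distinction is easy to check on a toy family. A sketch with three distributions on a three-point alphabet — the family is arbitrary, chosen only for illustration:

```python
import numpy as np

def kl(p, q):
    """KL divergence between discrete distributions (nats), skipping p = 0 terms."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Three hypotheses on a 3-point alphabet, with the index V uniform.
P = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])
M = len(P)
P_bar = P.mean(axis=0)                        # the mixture \bar{P}

mutual_info = sum(kl(P[j], P_bar) for j in range(M)) / M      # = I(V; X)
avg_pairwise = sum(kl(P[i], P[j]) for i in range(M)
                   for j in range(M)) / M**2
print(mutual_info, avg_pairwise)  # mutual information <= average pairwise KL
```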