
Mathematical Infrastructure

Spectral Theory of Operators

Spectral theorem for compact self-adjoint operators on Hilbert spaces: every such operator has a countable orthonormal eigenbasis with real eigenvalues accumulating only at zero. This is the infinite-dimensional backbone of PCA, kernel methods, and neural tangent kernel theory.

Advanced · Tier 3 · Stable · ~65 min

Why This Matters

In finite dimensions, the spectral theorem for symmetric matrices gives you PCA: diagonalize the covariance matrix and read off the principal components via eigenvalues and eigenvectors. But many ML methods live in infinite-dimensional spaces. Kernel methods operate in a reproducing kernel Hilbert space. Gaussian processes define covariance operators on function spaces. The neural tangent kernel is an infinite-width limit operator. To rigorously analyze any of these, you need the spectral theorem in infinite dimensions.

The result is remarkably clean: compact self-adjoint operators on Hilbert spaces behave almost exactly like symmetric matrices, with a countable orthonormal eigenbasis and real eigenvalues.

Mental Model

A compact self-adjoint operator $T$ on a Hilbert space is the infinite-dimensional analog of a real symmetric matrix. Think of $T$ as stretching space along orthogonal directions (eigenfunctions) by amounts (eigenvalues) that decay to zero. The eigenvalues form a sequence converging to zero, and the eigenfunctions form a complete orthonormal basis for the closure of the range of $T$.

The key difference from finite dimensions: the spectrum can have a continuous part in general, but compactness forces it to be discrete (except possibly at zero).

Formal Setup and Notation

Let $\mathcal{H}$ be a separable Hilbert space with inner product $\langle \cdot, \cdot \rangle$. An operator $T: \mathcal{H} \to \mathcal{H}$ is self-adjoint if $\langle Tx, y \rangle = \langle x, Ty \rangle$ for all $x, y \in \mathcal{H}$. It is compact if it maps bounded sets to precompact sets (sets with compact closure).

Definition

Compact Operator

An operator $T: \mathcal{H} \to \mathcal{H}$ is compact if for every bounded sequence $\{x_n\}$, the sequence $\{Tx_n\}$ has a convergent subsequence. Equivalently, $T$ is the norm limit of finite-rank operators. In ML: integral operators defined by continuous kernels on compact domains are compact.

Definition

Rayleigh Quotient

The Rayleigh quotient of a self-adjoint operator $T$ at $x \neq 0$ is:

$$R(x) = \frac{\langle Tx, x \rangle}{\langle x, x \rangle}$$

The supremum of $R(x)$ over all nonzero $x$ equals the largest eigenvalue (or the supremum of the spectrum). This is the infinite-dimensional analog of the variational characterization of eigenvalues.
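A quick finite-dimensional sanity check (a sketch; the random symmetric matrix standing in for $T$, its size, and the seed are arbitrary choices): random directions never beat the top eigenvalue, and the top eigenvector attains it.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
T = (A + A.T) / 2  # symmetric, hence self-adjoint

def rayleigh(T, x):
    """Rayleigh quotient <Tx, x> / <x, x>."""
    return (x @ T @ x) / (x @ x)

w, V = np.linalg.eigh(T)   # eigenvalues in ascending order
top = w[-1]

# Random probes never exceed the top eigenvalue ...
probes = [rayleigh(T, rng.standard_normal(50)) for _ in range(1000)]
assert max(probes) <= top + 1e-10

# ... and the top eigenvector attains it exactly.
assert np.isclose(rayleigh(T, V[:, -1]), top)
```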

Main Theorems

Theorem

Spectral Theorem for Compact Self-Adjoint Operators

Statement

Let $T: \mathcal{H} \to \mathcal{H}$ be a compact self-adjoint operator on a separable Hilbert space. Then there exists a countable (finite or infinite) orthonormal sequence $\{e_n\}$ of eigenvectors with corresponding real eigenvalues $\{\lambda_n\}$ such that:

$$Tx = \sum_{n} \lambda_n \langle x, e_n \rangle e_n \quad \text{for all } x \in \mathcal{H}$$

The eigenvalues satisfy $|\lambda_1| \geq |\lambda_2| \geq \cdots$ and $\lambda_n \to 0$ if the sequence is infinite. The eigenvectors $\{e_n\}$ form an orthonormal basis for $\overline{\text{range}(T)}$.

Intuition

A compact self-adjoint operator is an infinite diagonal "matrix" in the right orthonormal basis, with diagonal entries (eigenvalues) that decay to zero. You can truncate the sum after $k$ terms to get a rank-$k$ approximation, just like truncated SVD in finite dimensions.
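The truncation intuition can be checked numerically (a sketch; the orthonormal basis and the $1/n^2$ eigenvalue decay are arbitrary choices): reconstructing $Tx$ from the eigenbasis matches $T$ applied directly, and the rank-$k$ truncation errs in operator norm by exactly the first dropped eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthonormal eigenvectors
lam = 1.0 / (1 + np.arange(n)) ** 2                # eigenvalues decaying to 0
T = Q @ np.diag(lam) @ Q.T

# Tx = sum_n lambda_n <x, e_n> e_n
x = rng.standard_normal(n)
Tx = sum(lam[i] * (x @ Q[:, i]) * Q[:, i] for i in range(n))
assert np.allclose(Tx, T @ x)

# Rank-k truncation: operator-norm error is the next eigenvalue
# (lam[k] is lambda_{k+1} in the 1-indexed notation of the text).
k = 5
Tk = Q[:, :k] @ np.diag(lam[:k]) @ Q[:, :k].T
err = np.linalg.norm(T - Tk, ord=2)
assert np.isclose(err, lam[k])
```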

Proof Sketch

Step 1: Show $\|T\| = \sup_{\|x\|=1} |\langle Tx, x \rangle|$, so either $\lambda_1 = \|T\|$ or $\lambda_1 = -\|T\|$ is achieved (by compactness of the unit ball image). Step 2: The maximizer $e_1$ is an eigenvector with eigenvalue $\lambda_1$. Step 3: $T$ maps $\{e_1\}^\perp$ to itself (by self-adjointness), and the restriction is again compact and self-adjoint. Step 4: Induct to extract $e_2, e_3, \ldots$. Step 5: The remaining eigenvalues converge to zero by compactness.
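The proof's extract-and-deflate induction can be mimicked with power iteration on a small matrix with a known, well-separated spectrum (an illustrative sketch, not a production eigensolver; the spectrum and seeds are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
lam_true = np.array([5.0, -4.0, 3.0, 1.0, 0.5, 0.1])  # well-separated |lambda|
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
T = Q @ np.diag(lam_true) @ Q.T                        # known spectrum

def top_eigenpair(T, iters=2000):
    """Power iteration: converges to the eigenvector of largest |lambda|;
    the Rayleigh quotient then recovers the signed eigenvalue."""
    x = np.random.default_rng(3).standard_normal(T.shape[0])
    for _ in range(iters):
        x = T @ x
        x /= np.linalg.norm(x)
    return x @ T @ x, x

pairs = []
R = T.copy()
for _ in range(3):
    lam, v = top_eigenpair(R)          # Steps 1-2: extract dominant pair
    pairs.append(lam)
    R = R - lam * np.outer(v, v)       # Step 3: deflate to {v}^perp

# Step 4's induction recovers eigenvalues in order of |lambda|.
assert np.allclose(pairs, [5.0, -4.0, 3.0], atol=1e-8)
```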

Why It Matters

This theorem is the foundation of kernel PCA, Mercer's theorem, and the analysis of integral operators. When you compute the eigendecomposition of a kernel matrix, you are approximating this spectral decomposition. The decay rate of the eigenvalues determines the effective dimensionality of the RKHS and the statistical difficulty of kernel learning.

Failure Mode

Without compactness, the spectrum can be continuous and there may be no eigenvectors at all: the multiplication operator $(Tf)(x) = x f(x)$ on $L^2[0,1]$ has spectrum $[0,1]$ and no eigenvectors. The identity operator on an infinite-dimensional space is bounded and self-adjoint with spectrum $\{1\}$, but it is not compact: its single eigenvalue has infinite multiplicity and does not decay to zero. Without self-adjointness, eigenvalues can be complex and eigenvectors need not be orthogonal.

Theorem

Min-Max Characterization of Eigenvalues (Courant-Fischer)

Statement

The $k$-th eigenvalue of a compact self-adjoint operator $T$ (in decreasing order) satisfies:

$$\lambda_k = \max_{\dim(V) = k} \; \min_{x \in V,\, \|x\| = 1} \langle Tx, x \rangle = \min_{\dim(V) = k-1} \; \max_{x \perp V,\, \|x\| = 1} \langle Tx, x \rangle$$

Intuition

The $k$-th eigenvalue is the best worst-case Rayleigh quotient over all $k$-dimensional subspaces. This gives a variational characterization that does not require knowing the previous eigenvectors. It directly implies that eigenvalues are stable under perturbations (Weyl's inequality).

Proof Sketch

For the max-min form: any $k$-dimensional subspace $V$ must intersect $\text{span}\{e_k, e_{k+1}, \ldots\}$ nontrivially (by dimension counting). On this intersection, $\langle Tx, x \rangle \leq \lambda_k$. Choosing $V = \text{span}\{e_1, \ldots, e_k\}$ achieves equality.
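The max-min characterization is easy to probe numerically (a sketch; sizes, seeds, and the number of random subspaces are arbitrary choices): the minimum Rayleigh quotient over a subspace with orthonormal basis $B$ is the smallest eigenvalue of $B^\top T B$, which never exceeds $\lambda_k$, and the span of the top $k$ eigenvectors attains it.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((20, 20))
T = (A + A.T) / 2
w, V = np.linalg.eigh(T)
w, V = w[::-1], V[:, ::-1]        # decreasing order: w[0] >= w[1] >= ...

k = 4
def min_rayleigh(B):
    """Min of the Rayleigh quotient over span(B), B orthonormal."""
    return np.linalg.eigvalsh(B.T @ T @ B)[0]

# No k-dimensional subspace beats lambda_k ...
for _ in range(500):
    B, _ = np.linalg.qr(rng.standard_normal((20, k)))
    assert min_rayleigh(B) <= w[k - 1] + 1e-10

# ... and span(e_1, ..., e_k) attains it.
assert np.isclose(min_rayleigh(V[:, :k]), w[k - 1])
```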

Why It Matters

The min-max principle is the theoretical foundation of PCA optimality. It proves that the top-kk eigenvectors give the best rank-kk subspace for capturing variance. It also gives eigenvalue perturbation bounds (Weyl's inequality) that control how eigenvalues change when the operator is perturbed.

Failure Mode

The min-max principle applies to self-adjoint operators. For non-self-adjoint operators (e.g., non-symmetric kernel matrices), singular values replace eigenvalues and the characterization involves both left and right subspaces.

The spectral theory here underpins the concentration inequalities used in kernel learning theory, where eigenvalue decay controls the effective complexity of the hypothesis class.

Connection to ML

Definition

Mercer's Theorem (Connection)

If $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a continuous positive definite kernel on a compact domain, the integral operator $(T_k f)(x) = \int k(x, x') f(x')\, dx'$ is compact and self-adjoint. The spectral theorem gives $k(x, x') = \sum_n \lambda_n e_n(x) e_n(x')$. This is Mercer's theorem. The scaled eigenfunctions $\sqrt{\lambda_n}\, e_n$ form the feature map, and $\lambda_n$ controls the importance of each feature.
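Mercer eigenvalues can be approximated by discretizing the integral operator on a grid (a Nyström-style sketch; the grid size is an arbitrary choice). For the Brownian covariance kernel $k(s,t) = \min(s,t)$ on $[0,1]$, the exact eigenvalues $\lambda_m = ((m - 1/2)\pi)^{-2}$ are known, so the approximation can be checked:

```python
import numpy as np

n = 500
t = (np.arange(n) + 0.5) / n                 # midpoint grid on [0, 1]
K = np.minimum.outer(t, t)                   # kernel matrix k(t_i, t_j)
# Eigenvalues of K/n approximate the operator eigenvalues.
approx = np.linalg.eigvalsh(K / n)[::-1]

m = np.arange(1, 6)
exact = 1.0 / ((m - 0.5) * np.pi) ** 2       # known Mercer eigenvalues
assert np.allclose(approx[:5], exact, rtol=1e-2)
```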

Canonical Examples

Example

Spectral decomposition of a Gaussian kernel operator

Consider the Gaussian kernel $k(x, x') = \exp(-\|x - x'\|^2 / (2\sigma^2))$ on $[0,1]$. The integral operator has eigenvalues that decay exponentially fast: $\lambda_n \sim C \exp(-c n)$. This rapid decay means the RKHS is effectively finite-dimensional, explaining why Gaussian kernel methods work well even with moderate sample sizes. The eigenfunctions resemble Hermite functions.
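The decay can be observed numerically (a sketch; the grid size and $\sigma = 0.2$ are arbitrary choices): the spectrum of the discretized operator collapses within a few dozen modes.

```python
import numpy as np

n, sigma = 400, 0.2
t = (np.arange(n) + 0.5) / n
K = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * sigma ** 2))
lam = np.linalg.eigvalsh(K / n)[::-1]        # operator eigenvalue estimates

# A handful of modes carry essentially all of the trace:
# the RKHS is effectively low-dimensional.
assert lam[0] > 1e-1
assert lam[50] < 1e-8 * lam[0]
```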

Example

NTK spectral analysis

The neural tangent kernel $\Theta(x, x')$ at infinite width defines a compact self-adjoint operator on $L^2(\text{data distribution})$. The eigenvalue decay rate of this NTK operator determines the learning speed of gradient descent: modes with large eigenvalues are learned first, and modes with small eigenvalues are learned slowly or not at all. This is the spectral bias of neural networks.
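Spectral bias can be simulated with plain kernel regression under gradient descent, a finite stand-in for the NTK dynamics (a sketch; the Gaussian "NTK", step size, and step count are arbitrary choices). The residual along eigenvector $e_n$ of the kernel matrix shrinks by a factor $(1 - \eta \lambda_n)$ per step, so large-eigenvalue modes are fit first.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
t = np.linspace(0, 1, n)
K = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 0.1 ** 2))  # toy "NTK"
lam, E = np.linalg.eigh(K)
lam, E = lam[::-1], E[:, ::-1]      # decreasing eigenvalues

y = rng.standard_normal(n)
f = np.zeros(n)                     # predictions, initialized at zero
eta = 1.0 / lam[0]                  # stable step size
for _ in range(50):
    f = f + eta * K @ (y - f)       # gradient descent on 1/2 ||y - f||^2

resid = E.T @ (y - f)               # residual in the eigenbasis
assert abs(resid[0]) < 1e-8                            # top mode: fit
assert np.isclose(resid[-1], E[:, -1] @ y, atol=1e-6)  # tail mode: untouched
```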

Common Confusions

Watch Out

Bounded does not mean compact

Every compact operator is bounded, but not conversely. The identity operator on an infinite-dimensional Hilbert space is bounded but not compact. The compactness condition is strictly stronger and is what forces the eigenvalues to form a discrete sequence converging to zero.

Watch Out

Zero can be in the spectrum without being an eigenvalue

For compact operators, zero may or may not be an eigenvalue, but it is always in the spectrum when the Hilbert space is infinite-dimensional. The null space of $T$ (eigenvectors for eigenvalue $0$) can be trivial, finite-dimensional, or infinite-dimensional.

Summary

  • Compact self-adjoint operators have countable eigenvalues converging to zero
  • Eigenvectors form an orthonormal basis for the closure of the range
  • Rayleigh quotient variational principle gives eigenvalues without knowing eigenvectors
  • Courant-Fischer min-max theorem implies PCA optimality and perturbation bounds
  • Mercer's theorem connects kernel functions to spectral decompositions
  • Eigenvalue decay rate controls effective dimensionality and learning difficulty

Exercises

Exercise · Core

Problem

Let $T$ be a compact self-adjoint operator with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq 0$. Show that the best rank-$k$ approximation to $T$ in operator norm is $T_k = \sum_{i=1}^k \lambda_i \langle \cdot, e_i \rangle e_i$ and the approximation error is $\|T - T_k\| = \lambda_{k+1}$.

Exercise · Advanced

Problem

Suppose a kernel $k$ on $[0,1]$ has eigenvalues $\lambda_n = n^{-2s}$ for $s > 1/2$. What is the trace of the integral operator $T_k$? What does this represent in terms of the kernel?

References

Canonical:

  • Reed & Simon, Methods of Modern Mathematical Physics I: Functional Analysis (1980), Chapter VI
  • Lax, Functional Analysis (2002), Chapters 28-31

Current:

  • Steinwart & Christmann, Support Vector Machines (2008), Chapter 4 (Mercer's theorem for ML)
  • Wainwright, High-Dimensional Statistics (2019), Chapter 12 (kernel operators)

Next Topics

Natural extensions from spectral theory:

  • Kernels and RKHS: reproducing kernel Hilbert spaces built on Mercer's theorem
  • Neural tangent kernel: spectral analysis of infinite-width neural networks
  • Principal component analysis: finite-sample spectral methods

Last reviewed: April 2026

Prerequisites

Foundations this topic depends on.