
Foundations

Eigenvalues and Eigenvectors

Eigenvalues and eigenvectors: the directions a matrix scales without rotating. Characteristic polynomial, diagonalization, the spectral theorem for symmetric matrices, and the direct connection to PCA.


Why This Matters

[Figure: the matrix $A$ stretches eigenvector $v_1$ by a factor of 2.5 ($Av_1 = 2.5v_1$) and compresses $v_2$ by 0.8 ($Av_2 = 0.8v_2$). Eigenvectors define the axes; eigenvalues define the stretch factors.]

Eigenvalues and eigenvectors are among the most fundamental concepts in applied linear algebra. When you run PCA, you are computing eigenvectors of a covariance matrix. When you analyze the convergence rate of gradient descent, you are looking at eigenvalues of the Hessian. When you study Markov chains, the mixing time depends on the second-largest eigenvalue. When you compute the condition number of a symmetric positive definite matrix, you are taking the ratio of its largest and smallest eigenvalues.

If you do not understand eigenvalues and eigenvectors, you cannot understand PCA, SVD, spectral clustering, or any convergence analysis that involves matrices. This is foundational linear algebra, not decoration.

Mental Model

Multiply a vector $x$ by a matrix $A$, and in general $Ax$ points in a different direction than $x$. But for special vectors, the eigenvectors, multiplication by $A$ only stretches, shrinks, or flips them along their own line, without rotating them. The amount of scaling is the eigenvalue.

Think of $A$ as a transformation. Most vectors get rotated and stretched. Eigenvectors are the "natural axes" of the transformation: the directions along which $A$ acts by pure scaling. Finding these axes reveals the intrinsic geometry of whatever $A$ represents.

Formal Setup

Definition

Eigenvector and Eigenvalue

Let $A \in \mathbb{R}^{n \times n}$ be a square matrix. A nonzero vector $v \in \mathbb{C}^n$ (real when $\lambda$ is real) is an eigenvector of $A$ with eigenvalue $\lambda \in \mathbb{C}$ if:

$$Av = \lambda v$$

The eigenvalue $\lambda$ tells you the scaling factor. If $\lambda > 1$, the eigenvector is stretched. If $0 < \lambda < 1$, it is shrunk. If $\lambda < 0$, it is flipped (and scaled by $|\lambda|$). If $\lambda = 0$, the eigenvector is sent to the zero vector (and $A$ is singular).
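The defining equation $Av = \lambda v$ is easy to check numerically. A minimal sketch with numpy (the matrix here is illustrative, not taken from the text):

```python
import numpy as np

# An illustrative symmetric 2x2 matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns
# are the corresponding eigenvectors.
eigvals, eigvecs = np.linalg.eig(A)

# Verify the defining property Av = lambda * v for each pair.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)

# For this matrix the characteristic polynomial (2 - l)^2 - 1 = 0
# gives eigenvalues 1 and 3.
assert np.allclose(sorted(eigvals), [1.0, 3.0])
```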

Definition

Characteristic Polynomial

The eigenvalues of $A$ are the roots of the characteristic polynomial:

$$p(\lambda) = \det(A - \lambda I) = 0$$

This is a degree-$n$ polynomial in $\lambda$, so $A$ has exactly $n$ eigenvalues (counted with multiplicity) in $\mathbb{C}$. For real matrices, complex eigenvalues come in conjugate pairs $\lambda, \bar{\lambda}$.
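For a 2×2 matrix the characteristic polynomial is $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$, so its roots can be cross-checked against a direct eigenvalue computation. A sketch (illustrative matrix):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# For a 2x2 matrix, det(A - lambda I) = lambda^2 - tr(A) lambda + det(A).
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
roots = np.roots(coeffs)

# The polynomial roots coincide with the eigenvalues computed directly.
eigvals = np.linalg.eigvals(A)
assert np.allclose(sorted(roots), sorted(eigvals))
```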

Definition

Eigenspace

The eigenspace for eigenvalue $\lambda$ is the set of all vectors $v$ satisfying $(A - \lambda I)v = 0$, which is the null space of $A - \lambda I$. Its dimension is the geometric multiplicity of $\lambda$. The algebraic multiplicity is the multiplicity of $\lambda$ as a root of the characteristic polynomial. The geometric multiplicity is always at most the algebraic multiplicity.
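The gap between the two multiplicities can be seen on a concrete matrix. A sketch using a Jordan block (the standard example of this gap, not taken from the text):

```python
import numpy as np

# A Jordan block: lambda = 2 has algebraic multiplicity 2
# but only a one-dimensional eigenspace.
A = np.array([[2.0, 1.0],
              [0.0, 2.0]])

eigvals = np.linalg.eigvals(A)
assert np.allclose(eigvals, [2.0, 2.0])          # algebraic multiplicity 2

# Geometric multiplicity = dim null(A - 2I) = n - rank(A - 2I).
rank = np.linalg.matrix_rank(A - 2.0 * np.eye(2))
geometric_multiplicity = 2 - rank
assert geometric_multiplicity == 1
```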

Trace, Determinant, and Eigenvalues

Two fundamental invariants of a matrix are directly related to its eigenvalues:

Trace: The trace of $A$ equals the sum of its eigenvalues:

$$\operatorname{tr}(A) = \sum_{i=1}^n \lambda_i$$

Determinant: The determinant of $A$ equals the product of its eigenvalues:

$$\det(A) = \prod_{i=1}^n \lambda_i$$

These are immediate consequences of the fact that the characteristic polynomial $\det(A - \lambda I)$ has roots $\lambda_1, \ldots, \lambda_n$. Expanding the polynomial and comparing coefficients of $\lambda^{n-1}$ gives the trace relation; evaluating at $\lambda = 0$ gives the determinant relation.

Consequence: $A$ is singular if and only if at least one eigenvalue is zero.
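Both identities hold for any square matrix, including non-symmetric ones with complex eigenvalues. A quick numerical sketch (random matrix, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # generic (non-symmetric) random matrix

eigvals = np.linalg.eigvals(A)    # may be complex, in conjugate pairs

# Trace = sum of eigenvalues; determinant = product of eigenvalues.
# Conjugate pairs make both quantities real up to floating-point error.
assert np.isclose(np.trace(A), eigvals.sum().real)
assert np.isclose(np.linalg.det(A), eigvals.prod().real)
```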

Diagonalization

Definition

Diagonalization

If $A$ has $n$ linearly independent eigenvectors $v_1, \ldots, v_n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$, then $A$ is diagonalizable:

$$A = PDP^{-1}$$

where $P = [v_1 \mid v_2 \mid \cdots \mid v_n]$ is the matrix whose columns are eigenvectors, and $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is the diagonal matrix of eigenvalues.

Why this matters: Diagonalization makes matrix powers trivial: $A^k = PD^kP^{-1}$, where $D^k = \operatorname{diag}(\lambda_1^k, \ldots, \lambda_n^k)$. This is why eigenvalues control the long-term behavior of dynamical systems and iterative algorithms.
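The power identity $A^k = PD^kP^{-1}$ can be verified directly. A sketch (the matrix is illustrative):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Eigendecomposition A = P D P^{-1} (P's columns are eigenvectors).
eigvals, P = np.linalg.eig(A)

k = 10
# A^k two ways: repeated multiplication vs. P D^k P^{-1},
# where D^k just raises each eigenvalue to the k-th power.
Ak_direct = np.linalg.matrix_power(A, k)
Ak_eig = P @ np.diag(eigvals ** k) @ np.linalg.inv(P)
assert np.allclose(Ak_direct, Ak_eig)
```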

Not every matrix is diagonalizable. A matrix fails to be diagonalizable when some eigenvalue has geometric multiplicity strictly less than its algebraic multiplicity (defective eigenvalues). However, symmetric matrices are always diagonalizable --- and in the best possible way.

Main Theorems

Theorem

Spectral Theorem for Symmetric Matrices

Statement

If $A \in \mathbb{R}^{n \times n}$ is symmetric ($A = A^\top$), then:

  1. All eigenvalues of $A$ are real.
  2. Eigenvectors corresponding to distinct eigenvalues are orthogonal.
  3. $A$ has an orthogonal diagonalization: $A = Q\Lambda Q^\top$

where $Q$ is an orthogonal matrix ($Q^\top Q = I$) whose columns are orthonormal eigenvectors, and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ contains the real eigenvalues.

Intuition

A symmetric matrix acts as a pure scaling along orthogonal axes, with no rotation component. The eigenvectors form an orthonormal basis for $\mathbb{R}^n$, and in this basis, $A$ is just a diagonal matrix. This is the simplest possible structure a matrix can have.

Geometrically: the level sets $\{x : x^\top A x = c\}$ of a positive definite symmetric matrix are ellipsoids whose axes are the eigenvectors and whose semi-axis lengths are $\sqrt{c/\lambda_i}$.
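All three guarantees of the theorem can be checked numerically with numpy's dedicated symmetric eigensolver. A sketch on a random symmetrized matrix (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                       # symmetrize so that A = A^T

# np.linalg.eigh is the symmetric/Hermitian eigensolver: it returns
# real eigenvalues (in ascending order) and orthonormal eigenvectors.
eigvals, Q = np.linalg.eigh(A)

assert np.allclose(Q.T @ Q, np.eye(5))               # Q^T Q = I
assert np.allclose(Q @ np.diag(eigvals) @ Q.T, A)    # A = Q Lambda Q^T
```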

Proof Sketch

Real eigenvalues: Let $Av = \lambda v$ with $v \neq 0$. Then $\bar{v}^\top Av = \lambda \bar{v}^\top v = \lambda \|v\|^2$. Since $A$ is real and symmetric, $\bar{v}^\top Av = \overline{v^\top A\bar{v}} = \overline{v^\top A^\top \bar{v}} = \overline{\bar{v}^\top Av}$. So $\lambda \|v\|^2 = \bar{\lambda}\|v\|^2$, giving $\lambda = \bar{\lambda}$, hence $\lambda \in \mathbb{R}$.

Orthogonality: If $Av = \lambda v$ and $Aw = \mu w$ with $\lambda \neq \mu$, then $\lambda v^\top w = (Av)^\top w = v^\top A^\top w = v^\top Aw = \mu v^\top w$. So $(\lambda - \mu)v^\top w = 0$, and since $\lambda \neq \mu$, $v^\top w = 0$.

Existence of a full orthonormal eigenbasis: By induction on $n$. A real symmetric matrix always has at least one real eigenvalue: the Rayleigh quotient $x^\top A x$ attains its maximum on the (compact) unit sphere, and the maximizer is an eigenvector $v_1$. The orthogonal complement $v_1^\perp$ is invariant under $A$ (by symmetry), so the argument applies inductively to the restriction of $A$ to $v_1^\perp$.

Why It Matters

The spectral theorem is why PCA works. The covariance matrix $\Sigma = \mathbb{E}[(X - \mu)(X - \mu)^\top]$ is symmetric and positive semidefinite. Its eigenvectors are the principal components --- the directions of maximum variance. Its eigenvalues are the variances along those directions. PCA is literally the spectral theorem applied to the sample covariance matrix.

Beyond PCA: the spectral theorem governs the convergence rate of gradient descent (eigenvalues of the Hessian), the mixing time of reversible Markov chains (eigenvalues of the transition matrix), and the behavior of graph Laplacians in spectral clustering. The spectral theory of operators generalizes these ideas to infinite-dimensional spaces.

Failure Mode

The spectral theorem requires symmetry. For non-symmetric matrices, eigenvalues can be complex, eigenvectors need not be orthogonal, and the matrix may not even be diagonalizable. If you are working with a non-symmetric matrix (e.g., a non-reversible Markov chain transition matrix), you need the Jordan normal form or the singular value decomposition instead.

Connection to PCA

The link between eigenvalues and PCA is direct:

Given data points $x_1, \ldots, x_n \in \mathbb{R}^d$ with sample covariance matrix $\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^\top$:

  1. $\hat{\Sigma}$ is symmetric and positive semidefinite
  2. By the spectral theorem: $\hat{\Sigma} = Q\Lambda Q^\top$
  3. The eigenvector with the largest eigenvalue is the direction of maximum variance (the first principal component)
  4. The $k$-th eigenvector (sorted by decreasing eigenvalue) is the $k$-th principal component
  5. The eigenvalue $\lambda_k$ equals the variance of the data projected onto the $k$-th principal component

This is not a loose analogy. PCA is eigendecomposition of the covariance matrix. Understanding eigenvalues is understanding PCA.
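The recipe above can be sketched end to end on synthetic data (the data-generating choices here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data: 500 points in R^2, stretched 3x along the first axis.
X = rng.standard_normal((500, 2)) * np.array([3.0, 1.0])

Xc = X - X.mean(axis=0)                 # center the data
Sigma = (Xc.T @ Xc) / len(Xc)           # sample covariance (1/n convention)

eigvals, Q = np.linalg.eigh(Sigma)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]       # re-sort descending
eigvals, Q = eigvals[order], Q[:, order]

# Variance of the data projected onto the first principal component
# equals the top eigenvalue lambda_1.
proj = Xc @ Q[:, 0]
assert np.isclose(proj.var(), eigvals[0])
```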

Canonical Examples

Example

Eigenvalues of a 2x2 matrix

Let $A = \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix}$.

The characteristic polynomial is: $\det(A - \lambda I) = (4 - \lambda)(3 - \lambda) - 1 = \lambda^2 - 7\lambda + 11 = 0$

Solving: $\lambda = \frac{7 \pm \sqrt{49 - 44}}{2} = \frac{7 \pm \sqrt{5}}{2}$.

So $\lambda_1 = \frac{7 + \sqrt{5}}{2} \approx 4.618$ and $\lambda_2 = \frac{7 - \sqrt{5}}{2} \approx 2.382$.

Check: $\operatorname{tr}(A) = 4 + 3 = 7 = \lambda_1 + \lambda_2$. And $\det(A) = 12 - 1 = 11 = \lambda_1 \lambda_2$. Both eigenvalues are positive, so $A$ is positive definite. Since $A$ is symmetric, the eigenvectors are orthogonal.
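The hand computation is easy to confirm numerically:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals, eigvecs = np.linalg.eigh(A)    # ascending order for symmetric A

# Matches the hand computation (7 ± sqrt(5)) / 2.
expected = np.array([(7 - np.sqrt(5)) / 2, (7 + np.sqrt(5)) / 2])
assert np.allclose(eigvals, expected)

# Trace and determinant relations, and orthogonality of eigenvectors.
assert np.isclose(eigvals.sum(), np.trace(A))
assert np.isclose(eigvals.prod(), np.linalg.det(A))
assert np.isclose(eigvecs[:, 0] @ eigvecs[:, 1], 0.0)
```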

Example

Why eigenvalues control iteration convergence

Consider the iteration $x_{k+1} = Ax_k$. If $A$ is diagonalizable with $A = PDP^{-1}$, then $x_k = A^k x_0 = PD^kP^{-1}x_0$. Writing $c = P^{-1}x_0$ for the coordinates of $x_0$ in the eigenbasis:

$$x_k = \sum_{i=1}^n c_i \lambda_i^k v_i$$

The term with the largest $|\lambda_i|$ dominates. If $|\lambda_i| < 1$ for all $i$, the iteration converges to zero. If $|\lambda_i| > 1$ for some $i$ with $c_i \neq 0$, it diverges. The spectral radius $\rho(A) = \max_i |\lambda_i|$ determines the asymptotic behavior.

This is why eigenvalues appear in every convergence analysis: gradient descent, power iteration, PageRank, Markov chain mixing.
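The convergence claim can be demonstrated on a random matrix rescaled to a chosen spectral radius (an illustrative sketch, not a method from the text):

```python
import numpy as np

def spectral_radius(A):
    # rho(A) = max |lambda_i| over the eigenvalues of A.
    return np.abs(np.linalg.eigvals(A)).max()

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = 0.5 * B / spectral_radius(B)        # rescale so rho(A) = 0.5 < 1

x = rng.standard_normal(4)
for _ in range(200):
    x = A @ x                           # the iteration x_{k+1} = A x_k

# rho(A) < 1, so the iterates decay geometrically toward zero.
assert np.linalg.norm(x) < 1e-20
```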

Common Confusions

Watch Out

Eigenvalues can be complex even for real matrices

The matrix $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ (a 90-degree rotation) has characteristic polynomial $\lambda^2 + 1 = 0$, giving eigenvalues $\lambda = \pm i$. There are no real eigenvectors because a rotation has no direction that it merely scales. Complex eigenvalues indicate a rotational component. For symmetric matrices, this never happens --- all eigenvalues are real.
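numpy reproduces the complex pair for the rotation matrix above:

```python
import numpy as np

# The 90-degree rotation matrix from the example above.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

eigvals = np.linalg.eigvals(R)          # numpy switches to a complex dtype

# The conjugate pair +i, -i: no real direction is merely scaled.
assert np.allclose(sorted(eigvals, key=lambda z: z.imag), [-1j, 1j])
```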

Watch Out

Eigenvectors are only unique up to scaling

If $v$ is an eigenvector of $A$, so is $\alpha v$ for any nonzero scalar $\alpha$. When people say "the eigenvector," they usually mean a unit eigenvector ($\|v\| = 1$), but even then the sign is ambiguous ($v$ and $-v$ are both unit eigenvectors). This sign ambiguity appears in PCA: principal components are defined up to a sign flip.

Watch Out

Symmetric positive semidefinite is not the same as symmetric positive definite

A symmetric matrix is positive semidefinite ($A \succeq 0$) if all eigenvalues are $\geq 0$. It is positive definite ($A \succ 0$) if all eigenvalues are $> 0$. The covariance matrix is always positive semidefinite. It is positive definite if and only if the data spans all $d$ dimensions (no perfect collinearity). The distinction matters: positive semidefiniteness means $A$ might have a nontrivial null space.
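Perfect collinearity producing a zero eigenvalue can be shown on synthetic data (illustrative construction):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(100)
# The second feature is exactly twice the first: perfect collinearity.
X = np.column_stack([x, 2.0 * x])

Xc = X - X.mean(axis=0)
Sigma = (Xc.T @ Xc) / len(Xc)

eigvals = np.linalg.eigvalsh(Sigma)
# Positive semidefinite: no eigenvalue is (numerically) negative...
assert np.all(eigvals > -1e-12)
# ...but the smallest is zero, so Sigma is singular, not positive definite.
assert np.isclose(eigvals.min(), 0.0, atol=1e-10)
```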

Summary

  • Eigenvectors are directions that a matrix scales without rotating; eigenvalues are the scaling factors
  • Characteristic polynomial $\det(A - \lambda I) = 0$ gives eigenvalues
  • Trace = sum of eigenvalues; determinant = product of eigenvalues
  • Diagonalization $A = PDP^{-1}$ makes matrix powers trivial
  • Spectral theorem: symmetric matrices have real eigenvalues and orthogonal eigenvectors ($A = Q\Lambda Q^\top$)
  • PCA = eigendecomposition of the covariance matrix
  • Eigenvalues control convergence rates of iterative algorithms

Exercises

ExerciseCore

Problem

Find the eigenvalues and eigenvectors of $A = \begin{pmatrix} 3 & 2 \\ 2 & 6 \end{pmatrix}$. Verify that the eigenvectors are orthogonal and that the trace and determinant relations hold.

ExerciseAdvanced

Problem

Let $A$ be a real symmetric $n \times n$ matrix with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Prove that:

$$\lambda_1 = \max_{\|x\| = 1} x^\top A x$$

This is the Rayleigh quotient characterization. Why does this directly imply that the first principal component is the leading eigenvector of the covariance matrix?
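Not a proof, but the claim is easy to probe numerically: random unit vectors never exceed $\lambda_1$ in the Rayleigh quotient, and the top eigenvector attains it (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2                       # random symmetric matrix

lam_max = np.linalg.eigvalsh(A).max()

# Random unit vectors never beat lambda_1 in the Rayleigh quotient...
for _ in range(1000):
    x = rng.standard_normal(6)
    x /= np.linalg.norm(x)
    assert x @ A @ x <= lam_max + 1e-10

# ...and the top eigenvector attains it exactly.
v1 = np.linalg.eigh(A)[1][:, -1]
assert np.isclose(v1 @ A @ v1, lam_max)
```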

References

Canonical:

  • Strang, Linear Algebra and Its Applications (4th ed., 2006), Chapters 5-6
  • Horn & Johnson, Matrix Analysis (2nd ed., 2012), Chapters 1-4
  • Halmos, Finite-Dimensional Vector Spaces (1958), Chapters 6-7 (spectral theory)

Current:

  • Boyd & Vandenberghe, Introduction to Applied Linear Algebra (2018), Chapter 10
  • Axler, Linear Algebra Done Right (4th ed., 2024), Chapters 5, 7
  • Trefethen & Bau, Numerical Linear Algebra (1997), Lectures 24-27 (eigenvalue algorithms and perturbation theory)

Next Topics

Building on eigenvalues and eigenvectors:

  • Singular value decomposition: the generalization to non-square and non-symmetric matrices
  • Principal component analysis: eigenvalues of the covariance matrix in action
  • Conditioning and condition number: the ratio of extreme eigenvalues

Last reviewed: April 2026
