Foundations
Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors: the directions a matrix scales without rotating. Characteristic polynomial, diagonalization, the spectral theorem for symmetric matrices, and the direct connection to PCA.
Why This Matters
Eigenvalues and eigenvectors are the most fundamental concept in applied linear algebra. When you run PCA, you are computing eigenvectors of a covariance matrix. When you analyze the convergence rate of gradient descent, you are looking at eigenvalues of the Hessian. When you study Markov chains, the mixing time depends on the second-largest eigenvalue. When you compute the condition number of a matrix, you are taking a ratio of eigenvalues.
If you do not understand eigenvalues and eigenvectors, you cannot understand PCA, SVD, spectral clustering, or any convergence analysis that involves matrices. This is foundational linear algebra, not decoration.
Mental Model
Multiply a vector $v$ by a matrix $A$, and in general $Av$ points in a different direction than $v$. But for special vectors, the eigenvectors, multiplication by $A$ only stretches or shrinks them, without changing their direction. The amount of stretching is the eigenvalue $\lambda$.
Think of $A$ as a transformation. Most vectors get rotated and stretched. Eigenvectors are the "natural axes" of the transformation: the directions along which $A$ acts by pure scaling. Finding these axes reveals the intrinsic geometry of whatever $A$ represents.
Formal Setup
Eigenvector and Eigenvalue
Let $A \in \mathbb{R}^{n \times n}$ be a square matrix. A nonzero vector $v$ is an eigenvector of $A$ with eigenvalue $\lambda$ if:
$$Av = \lambda v$$
The eigenvalue tells you the scaling factor. If $\lambda > 1$, the eigenvector is stretched. If $\lambda < -1$, it is flipped and stretched. If $|\lambda| < 1$, it is shrunk. If $\lambda = 0$, the eigenvector is sent to the zero vector (and $A$ is singular).
Characteristic Polynomial
The eigenvalues of $A$ are the roots of the characteristic polynomial:
$$\det(A - \lambda I) = 0$$
This is a degree-$n$ polynomial in $\lambda$, so $A$ has exactly $n$ eigenvalues (counted with multiplicity) in $\mathbb{C}$. For real matrices, complex eigenvalues come in conjugate pairs $\lambda, \bar{\lambda}$.
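As a sanity check, the roots of the characteristic polynomial can be compared against a library eigenvalue solver. A minimal NumPy sketch, using an arbitrary example matrix (note that `np.roots` here is only illustrative; production eigensolvers do not form the characteristic polynomial):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])   # arbitrary example matrix

# Characteristic polynomial of a 2x2 matrix: lambda^2 - tr(A)*lambda + det(A)
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
roots = np.sort(np.roots(coeffs))

eigvals = np.sort(np.linalg.eigvals(A))

print(roots)    # roots of det(A - lambda I) = 0: [2. 5.]
print(eigvals)  # same values from the eigenvalue solver: [2. 5.]
```

For anything beyond tiny examples, use the eigenvalue solver directly: explicitly forming the characteristic polynomial is numerically unstable as $n$ grows.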
Eigenspace
The eigenspace for eigenvalue $\lambda$ is the set of all vectors $v$ satisfying $Av = \lambda v$, which is the null space of $A - \lambda I$. Its dimension is the geometric multiplicity of $\lambda$. The algebraic multiplicity is the multiplicity of $\lambda$ as a root of the characteristic polynomial. The geometric multiplicity is always at most the algebraic multiplicity.
Trace, Determinant, and Eigenvalues
Two fundamental invariants of a matrix are directly related to its eigenvalues:
Trace: The trace of $A$ equals the sum of its eigenvalues: $\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i$
Determinant: The determinant of $A$ equals the product of its eigenvalues: $\det(A) = \prod_{i=1}^{n} \lambda_i$
These are immediate consequences of the fact that the characteristic polynomial has roots $\lambda_1, \dots, \lambda_n$. Expanding the polynomial and comparing coefficients of $\lambda^{n-1}$ gives the trace relation; evaluating at $\lambda = 0$ gives the determinant relation.
Consequence: $A$ is singular if and only if at least one eigenvalue is zero.
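Both identities are easy to verify numerically. A sketch with an arbitrary random matrix (the seed and size are incidental); since a general real matrix can have complex-conjugate eigenvalue pairs, the imaginary parts cancel and only the real parts are compared:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))          # arbitrary real matrix
eigvals = np.linalg.eigvals(A)           # may contain conjugate pairs

# Imaginary parts cancel in the sum and the product, so compare real parts
print(np.trace(A), eigvals.sum().real)        # equal up to rounding
print(np.linalg.det(A), eigvals.prod().real)  # equal up to rounding
```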
Diagonalization
Diagonalization
If $A \in \mathbb{R}^{n \times n}$ has $n$ linearly independent eigenvectors $v_1, \dots, v_n$ with eigenvalues $\lambda_1, \dots, \lambda_n$, then $A$ is diagonalizable:
$$A = PDP^{-1}$$
where $P$ is the matrix whose columns are eigenvectors, and $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ is the diagonal matrix of eigenvalues.
Why this matters: Diagonalization makes matrix powers trivial: $A^k = PD^kP^{-1}$, where $D^k = \operatorname{diag}(\lambda_1^k, \dots, \lambda_n^k)$. This is why eigenvalues control the long-term behavior of dynamical systems and iterative algorithms.
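A minimal numerical check of $A^k = PD^kP^{-1}$, using an arbitrary example matrix with distinct eigenvalues (which guarantees diagonalizability):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])             # distinct eigenvalues => diagonalizable
eigvals, P = np.linalg.eig(A)          # columns of P are eigenvectors
D = np.diag(eigvals)
P_inv = np.linalg.inv(P)

# A = P D P^{-1}
print(np.allclose(A, P @ D @ P_inv))   # True

# A^5 = P D^5 P^{-1}: the power acts only on the diagonal entries
A5_direct = np.linalg.matrix_power(A, 5)
A5_eig = P @ np.diag(eigvals**5) @ P_inv
print(np.allclose(A5_direct, A5_eig))  # True
```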
Not every matrix is diagonalizable. A matrix fails to be diagonalizable when some eigenvalue has geometric multiplicity strictly less than its algebraic multiplicity (defective eigenvalues). However, symmetric matrices are always diagonalizable --- and in the best possible way.
Main Theorems
Spectral Theorem for Symmetric Matrices
Statement
If $A \in \mathbb{R}^{n \times n}$ is symmetric ($A = A^T$), then:
- All eigenvalues of are real.
- Eigenvectors corresponding to distinct eigenvalues are orthogonal.
- $A$ has an orthogonal diagonalization:
$$A = Q \Lambda Q^T$$
where $Q$ is an orthogonal matrix ($Q^T Q = I$) whose columns are orthonormal eigenvectors, and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ contains the real eigenvalues.
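All three claims can be observed numerically. NumPy's `eigh` routine is designed for symmetric (Hermitian) input and returns exactly this factorization; a sketch on an arbitrary symmetrized random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                        # symmetrize an arbitrary matrix

eigvals, Q = np.linalg.eigh(A)           # routine for symmetric/Hermitian input

print(eigvals.dtype)                     # float64: eigenvalues are real
print(np.allclose(Q.T @ Q, np.eye(4)))   # True: Q is orthogonal
print(np.allclose(A, Q @ np.diag(eigvals) @ Q.T))  # True: A = Q Lambda Q^T
```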
Intuition
A symmetric matrix acts as a pure scaling along orthogonal axes, with no rotation component. The eigenvectors form an orthonormal basis for $\mathbb{R}^n$, and in this basis, $A$ is just a diagonal matrix. This is the simplest possible structure a matrix can have.
Geometrically: the level sets of the quadratic form $x^T A x$ (for positive definite $A$) are ellipsoids whose axes are the eigenvectors and whose semi-axis lengths are determined by the eigenvalues ($1/\sqrt{\lambda_i}$ for the level set $x^T A x = 1$).
Proof Sketch
Real eigenvalues: Let $Av = \lambda v$ with $v \neq 0$ (allowing $v \in \mathbb{C}^n$ a priori). Then $\bar{v}^T A v = \lambda \bar{v}^T v = \lambda \|v\|^2$. Since $A$ is real and symmetric, the scalar $\bar{v}^T A v$ equals its own conjugate transpose, so it is real. So $\lambda \|v\|^2 \in \mathbb{R}$, giving $\lambda = \bar{\lambda}$, hence $\lambda \in \mathbb{R}$.
Orthogonality: If $Av_1 = \lambda_1 v_1$ and $Av_2 = \lambda_2 v_2$ with $\lambda_1 \neq \lambda_2$, then $\lambda_1 v_2^T v_1 = v_2^T A v_1 = (A v_2)^T v_1 = \lambda_2 v_2^T v_1$. So $(\lambda_1 - \lambda_2) v_2^T v_1 = 0$, and since $\lambda_1 \neq \lambda_2$, $v_2^T v_1 = 0$.
Existence of full orthonormal eigenbasis: By induction on $n$. A real symmetric matrix always has at least one real eigenvalue: maximize the Rayleigh quotient $v^T A v / v^T v$ over the unit sphere (the maximum exists by compactness, and the maximizer is an eigenvector $v_1$). The orthogonal complement $v_1^{\perp}$ is invariant under $A$ (by symmetry), so the argument applies inductively to the restriction of $A$ to $v_1^{\perp}$.
Why It Matters
The spectral theorem is why PCA works. The covariance matrix is symmetric and positive semidefinite. Its eigenvectors are the principal components --- the directions of maximum variance. Its eigenvalues are the variances along those directions. PCA is literally the spectral theorem applied to the sample covariance matrix.
Beyond PCA: the spectral theorem governs the convergence rate of gradient descent (eigenvalues of the Hessian), the mixing time of reversible Markov chains (eigenvalues of the transition matrix), and the behavior of graph Laplacians in spectral clustering. The spectral theory of operators generalizes these ideas to infinite-dimensional spaces.
Failure Mode
The spectral theorem requires symmetry. For non-symmetric matrices, eigenvalues can be complex, eigenvectors need not be orthogonal, and the matrix may not even be diagonalizable. If you are working with a non-symmetric matrix (e.g., a non-reversible Markov chain transition matrix), you need the Jordan normal form or the singular value decomposition instead.
Connection to PCA
The link between eigenvalues and PCA is direct:
Given data points $x_1, \dots, x_m \in \mathbb{R}^n$ with sample covariance matrix $\Sigma$:
- $\Sigma$ is symmetric and positive semidefinite
- By the spectral theorem: $\Sigma = Q \Lambda Q^T$
- The eigenvector with the largest eigenvalue is the direction of maximum variance (the first principal component)
- The $k$-th eigenvector (sorted by decreasing eigenvalue) is the $k$-th principal component
- The eigenvalue $\lambda_k$ equals the variance of the data projected onto the $k$-th principal component
This is not a loose analogy. PCA is eigendecomposition of the covariance matrix. Understanding eigenvalues is understanding PCA.
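The correspondence can be checked end to end: a sketch that builds a sample covariance matrix from synthetic correlated 2D data (the mixing matrix and seed are arbitrary choices) and confirms that the top eigenvalue equals the variance of the data projected onto the first principal component:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic correlated 2D data (the mixing matrix is an arbitrary choice)
X = rng.standard_normal((500, 2)) @ np.array([[3.0, 0.0],
                                              [1.0, 0.5]])
Xc = X - X.mean(axis=0)                      # center the data

Sigma = Xc.T @ Xc / (len(Xc) - 1)            # sample covariance matrix
eigvals, Q = np.linalg.eigh(Sigma)           # ascending eigenvalues

pc1 = Q[:, -1]                               # first principal component
projected_var = np.var(Xc @ pc1, ddof=1)     # variance along pc1

print(np.isclose(eigvals[-1], projected_var))  # True
```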
Canonical Examples
Eigenvalues of a 2x2 matrix
Let $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$.
The characteristic polynomial is:
$$\det(A - \lambda I) = (2 - \lambda)^2 - 1 = \lambda^2 - 4\lambda + 3$$
Solving: $\lambda = \frac{4 \pm \sqrt{16 - 12}}{2} = \frac{4 \pm 2}{2}$.
So $\lambda_1 = 3$ and $\lambda_2 = 1$.
Check: $\operatorname{tr}(A) = 4 = 3 + 1$. And $\det(A) = 3 = 3 \cdot 1$. Both eigenvalues are positive, so $A$ is positive definite. Since $A$ is symmetric, the eigenvectors are orthogonal.
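A quick numerical confirmation of this kind of 2x2 computation, here using the symmetric matrix with 2 on the diagonal and 1 off-diagonal (eigenvalues 3 and 1):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, Q = np.linalg.eigh(A)           # eigenvalues in ascending order

print(eigvals)                           # [1. 3.]
print(np.all(eigvals > 0))               # True: positive definite
print(np.isclose(Q[:, 0] @ Q[:, 1], 0))  # True: orthogonal eigenvectors
```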
Why eigenvalues control iteration convergence
Consider the iteration $x_{k+1} = A x_k$. If $A$ is diagonalizable with $A = PDP^{-1}$, then $x_k = A^k x_0 = P D^k P^{-1} x_0$. Writing $x_0 = \sum_i c_i v_i$ in the eigenbasis:
$$x_k = \sum_i c_i \lambda_i^k v_i$$
The term with the largest $|\lambda_i|$ dominates. If $|\lambda_i| < 1$ for all $i$, the iteration converges to zero. If any $|\lambda_i| > 1$, it diverges. The spectral radius $\rho(A) = \max_i |\lambda_i|$ determines the asymptotic behavior.
This is why eigenvalues appear in every convergence analysis: gradient descent, power iteration, PageRank, Markov chain mixing.
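A small demonstration of the spectral-radius dichotomy; both iteration matrices below are arbitrary examples, one with $\rho < 1$ and one with $\rho > 1$:

```python
import numpy as np

def iterate(A, x0, k):
    """Apply x_{k+1} = A x_k for k steps."""
    x = x0
    for _ in range(k):
        x = A @ x
    return x

x0 = np.array([1.0, 1.0])

A_shrink = np.array([[0.5, 0.2],
                     [0.1, 0.4]])   # spectral radius 0.6 < 1
A_grow = np.array([[1.2, 0.3],
                   [0.0, 1.1]])     # spectral radius 1.2 > 1

rho_shrink = np.abs(np.linalg.eigvals(A_shrink)).max()
rho_grow = np.abs(np.linalg.eigvals(A_grow)).max()

print(rho_shrink, np.linalg.norm(iterate(A_shrink, x0, 50)))  # converges to 0
print(rho_grow, np.linalg.norm(iterate(A_grow, x0, 50)))      # blows up
```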
Common Confusions
Eigenvalues can be complex even for real matrices
The matrix $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ (a 90-degree rotation) has characteristic polynomial $\lambda^2 + 1 = 0$, giving eigenvalues $\pm i$. There are no real eigenvectors because a rotation has no direction that it merely scales. Complex eigenvalues indicate a rotational component. For symmetric matrices, this never happens --- all eigenvalues are real.
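The rotation example is two lines in NumPy:

```python
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # 90-degree rotation matrix
eigvals = np.linalg.eigvals(R)
print(np.sort_complex(eigvals))      # conjugate pair: [-1j, 1j]
```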
Eigenvectors are only unique up to scaling
If $v$ is an eigenvector of $A$, so is $cv$ for any nonzero scalar $c$. When people say "the eigenvector," they usually mean a unit eigenvector ($\|v\| = 1$), but even then the sign is ambiguous ($v$ and $-v$ are both unit eigenvectors). This sign ambiguity appears in PCA: principal components are defined up to a sign flip.
Symmetric positive semidefinite is not the same as symmetric positive definite
A symmetric matrix $A$ is positive semidefinite ($x^T A x \geq 0$ for all $x$) if all eigenvalues are $\geq 0$. It is positive definite ($x^T A x > 0$ for all $x \neq 0$) if all eigenvalues are $> 0$. The covariance matrix is always positive semidefinite. It is positive definite if and only if the data spans all $n$ dimensions (no perfect collinearity). The distinction matters: positive semidefiniteness means $\Sigma$ might have a nontrivial null space.
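The collinearity point is easy to reproduce: a sketch with perfectly collinear synthetic data, whose covariance matrix has a numerically zero eigenvalue (seed and scaling are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
t = rng.standard_normal(200)
X = np.column_stack([t, 2 * t])        # second feature is exactly 2x the first
Xc = X - X.mean(axis=0)
Sigma = Xc.T @ Xc / (len(Xc) - 1)      # sample covariance matrix

eigvals = np.linalg.eigvalsh(Sigma)    # ascending, guaranteed real
print(eigvals)  # smallest eigenvalue is numerically zero: PSD but not PD
```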
Summary
- Eigenvectors are directions that a matrix scales without rotating; eigenvalues are the scaling factors
- Characteristic polynomial $\det(A - \lambda I) = 0$ gives eigenvalues
- Trace = sum of eigenvalues; determinant = product of eigenvalues
- Diagonalization $A = PDP^{-1}$ makes matrix powers trivial
- Spectral theorem: symmetric matrices have real eigenvalues and orthogonal eigenvectors ($A = Q \Lambda Q^T$)
- PCA = eigendecomposition of the covariance matrix
- Eigenvalues control convergence rates of iterative algorithms
Exercises
Problem
Find the eigenvalues and eigenvectors of $A = \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}$. Verify that the eigenvectors are orthogonal and that the trace and determinant relations hold.
Problem
Let $A$ be a real symmetric matrix with eigenvalues $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$. Prove that:
$$\lambda_1 = \max_{x \neq 0} \frac{x^T A x}{x^T x} \qquad \text{and} \qquad \lambda_n = \min_{x \neq 0} \frac{x^T A x}{x^T x}$$
This is the Rayleigh quotient characterization. Why does this directly imply that the first principal component is the leading eigenvector of the covariance matrix?
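Before proving it, the claim can be probed numerically: this sketch samples random directions and checks that no Rayleigh quotient beats the top eigenvalue, while the leading eigenvector attains it exactly (matrix and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                        # arbitrary random symmetric matrix

eigvals, Q = np.linalg.eigh(A)           # ascending eigenvalues
lam_max = eigvals[-1]

def rayleigh(x):
    return (x @ A @ x) / (x @ x)

# No random direction beats the top eigenvalue...
samples = [rayleigh(rng.standard_normal(5)) for _ in range(1000)]
print(max(samples) <= lam_max + 1e-12)   # True

# ...and the leading eigenvector attains it exactly
print(np.isclose(rayleigh(Q[:, -1]), lam_max))  # True
```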
References
Canonical:
- Strang, Linear Algebra and Its Applications (4th ed., 2006), Chapters 5-6
- Horn & Johnson, Matrix Analysis (2nd ed., 2012), Chapters 1-4
- Halmos, Finite-Dimensional Vector Spaces (1958), Chapters 6-7 (spectral theory)
Current:
- Boyd & Vandenberghe, Introduction to Applied Linear Algebra (2018), Chapter 10
- Axler, Linear Algebra Done Right (4th ed., 2024), Chapters 5, 7
- Trefethen & Bau, Numerical Linear Algebra (1997), Lectures 24-27 (eigenvalue algorithms and perturbation theory)
Next Topics
Building on eigenvalues and eigenvectors:
- Singular value decomposition: the generalization to non-square and non-symmetric matrices
- Principal component analysis: eigenvalues of the covariance matrix in action
- Conditioning and condition number: the ratio of extreme eigenvalues
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Matrix Operations and Properties (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
Builds on This
- Conditioning and Condition Number (Layer 1)
- Dimensionality Reduction Theory (Layer 2)
- Gram Matrices and Kernel Matrices (Layer 1)
- Graph Neural Networks (Layer 3)
- Kalman Filter (Layer 2)
- Markov Chains and Steady State (Layer 1)
- NMF (Nonnegative Matrix Factorization) (Layer 2)
- Numerical Linear Algebra (Layer 1)
- Open Problems in Matrix Computation (Layer 3)
- PageRank Algorithm (Layer 2)
- Positive Semidefinite Matrices (Layer 0A)
- Principal Component Analysis (Layer 1)
- Recommender Systems (Layer 2)
- Riemannian Optimization and Manifold Constraints (Layer 3)
- Singular Value Decomposition (Layer 0A)
- Spectral Clustering (Layer 2)
- Spectral Theory of Operators (Layer 0B)
- Tensors and Tensor Operations (Layer 0A)
- Weight Initialization (Layer 2)
- Whitening and Decorrelation (Layer 2)