Foundations
Positive Semidefinite Matrices
PSD matrices: equivalent characterizations, Cholesky decomposition, Schur complement, and Loewner ordering. Covariance matrices are PSD. Hessians of convex functions are PSD. These facts connect linear algebra to optimization and statistics.
Why This Matters
Positive semidefiniteness is the matrix analogue of non-negativity for scalars. It appears in three central places in ML:
Covariance matrices are always PSD. If Σ is a covariance matrix, then vᵀΣv = Var(vᵀX) ≥ 0 for any vector v.
Hessians of convex functions are PSD. A twice-differentiable function f is convex if and only if ∇²f(x) ⪰ 0 everywhere. This connects PSD matrices to the theory of convex optimization.
Kernel matrices (Gram matrices) are PSD by construction. The entire theory of kernel methods rests on this fact.
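The first of these facts is easy to check numerically. A minimal NumPy sketch (the data here is synthetic and the dimensions are illustrative): the empirical covariance of any dataset has no negative eigenvalues, up to floating-point error.

```python
import numpy as np

# Synthetic data: 200 samples of a 4-dimensional random vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))

# Empirical covariance matrix (rows of X are samples).
Sigma = np.cov(X, rowvar=False)

# A covariance matrix is symmetric and PSD: its smallest eigenvalue
# is non-negative up to floating-point error.
min_eig = np.linalg.eigvalsh(Sigma).min()
```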
Core Definitions
Positive Semidefinite Matrix
A symmetric matrix A ∈ ℝⁿˣⁿ is positive semidefinite (PSD), written A ⪰ 0, if:
xᵀAx ≥ 0 for all x ∈ ℝⁿ
If the inequality is strict for all x ≠ 0, the matrix is positive definite (PD), written A ≻ 0.
Loewner Ordering
For symmetric matrices A and B, the Loewner ordering is: A ⪰ B if and only if A − B ⪰ 0.
This is a partial order on symmetric matrices. It is not a total order: most pairs of matrices are incomparable.
Main Theorems
PSD Equivalence Theorem
Statement
The following are equivalent for a symmetric matrix A:
- A ⪰ 0 (i.e., xᵀAx ≥ 0 for all x)
- All eigenvalues of A are non-negative
- A = BᵀB for some matrix B
- All principal minors of A are non-negative
- There exists a lower triangular L with non-negative diagonal such that A = LLᵀ (Cholesky decomposition; unique with positive diagonal if A ≻ 0)
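Several of these characterizations can be verified against each other numerically. A sketch (the matrix is a random illustrative example, built PSD by construction as BᵀB):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(5, 3))
A = B.T @ B                     # characterization: A = B^T B, so A is PSD

# Quadratic-form characterization: x^T A x = ||Bx||^2 >= 0,
# probed at 100 random vectors (columns of x).
x = rng.normal(size=(3, 100))
quad = np.sum(x * (A @ x), axis=0)   # quad[j] = x_j^T A x_j

# Eigenvalue characterization: all eigenvalues non-negative.
eigs = np.linalg.eigvalsh(A)

# Cholesky characterization: A = L L^T with L lower triangular
# (A is generically PD here, so np.linalg.cholesky succeeds).
L = np.linalg.cholesky(A)
```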
Intuition
PSD means the quadratic form is a bowl that never dips below zero. Eigenvalues are the curvatures along the principal axes. Non-negative curvature in every direction means non-negative eigenvalues.
Proof Sketch
(A = BᵀB) ⇒ (xᵀAx ≥ 0): if A = BᵀB, then xᵀAx = ‖Bx‖² ≥ 0. (eigenvalues ≥ 0) ⇒ (A = BᵀB): by the spectral theorem, A = QΛQᵀ where Λ has non-negative entries; set B = Λ^(1/2)Qᵀ. (xᵀAx ≥ 0) ⇒ (eigenvalues ≥ 0): for an eigenpair Av = λv with ‖v‖ = 1, λ = vᵀAv ≥ 0.
Why It Matters
Different characterizations are useful in different contexts. Checking eigenvalues costs O(n³). The quadratic-form definition is useful for proofs. The factorization A = LLᵀ is used in sampling from Gaussians: if z ∼ N(0, I) and Σ = LLᵀ, then Lz ∼ N(0, Σ).
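The Gaussian-sampling use of the factorization can be sketched in a few lines of NumPy (the covariance matrix is an illustrative example):

```python
import numpy as np

# Target covariance (positive definite): Sigma = L L^T.
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
L = np.linalg.cholesky(Sigma)

# If z ~ N(0, I), then L z ~ N(0, Sigma).
rng = np.random.default_rng(2)
z = rng.normal(size=(2, 100_000))   # standard normal draws, one per column
samples = L @ z                      # each column ~ N(0, Sigma)

# The empirical covariance of the samples should be close to Sigma.
emp_cov = np.cov(samples)
```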
Failure Mode
Symmetry is required. A non-symmetric matrix A can satisfy xᵀAx ≥ 0 for all x without having real eigenvalues. The standard convention in ML is that PSD refers to symmetric (or Hermitian) matrices only.
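A concrete instance of this failure mode, as a small sketch: the matrix below has xᵀAx = x₁² + x₂² ≥ 0 for every real x, yet its eigenvalues are the complex pair 1 ± i.

```python
import numpy as np

# Non-symmetric matrix: x^T A x = x1^2 + x2^2 >= 0 for all real x,
# but the eigenvalues are 1 + i and 1 - i, which are not real.
A = np.array([[1.0, 1.0],
              [-1.0, 1.0]])

rng = np.random.default_rng(3)
x = rng.normal(size=(2, 1000))
quad = np.sum(x * (A @ x), axis=0)   # x_j^T A x_j for each column

eigs = np.linalg.eigvals(A)
```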
Cholesky Decomposition
Every positive definite matrix A has a unique factorization A = LLᵀ where L is lower triangular with positive diagonal entries. This is the Cholesky decomposition.
Computational cost: n³/3 flops, half the cost of a general LU decomposition. It is the preferred method for solving Ax = b when A is known to be PD: factor A = LLᵀ once, then solve Ly = b by forward-substitution and Lᵀx = y by back-substitution.
For PSD (not PD) matrices, the Cholesky factorization exists but may have zeros on the diagonal.
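The factor-once-then-substitute pattern looks like this in practice (a sketch using NumPy and SciPy's triangular solver; the PD matrix is an illustrative example):

```python
import numpy as np
from scipy.linalg import solve_triangular

# A positive definite system A x = b (illustrative values).
A = np.array([[4.0, 2.0, 0.6],
              [2.0, 5.0, 1.0],
              [0.6, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])

L = np.linalg.cholesky(A)                  # A = L L^T, L lower triangular
y = solve_triangular(L, b, lower=True)     # forward-substitution: L y = b
x = solve_triangular(L.T, y, lower=False)  # back-substitution:  L^T x = y
```

Once L is computed, each additional right-hand side costs only the two O(n²) triangular solves.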
Schur Complement
Schur Complement
Given a block matrix:
M = [[A, B], [Bᵀ, C]]
where A is invertible, the Schur complement of A in M is:
S = C − BᵀA⁻¹B
The key fact: if A ≻ 0, then M ⪰ 0 if and only if S ⪰ 0.
The Schur complement appears in conditional distributions: if (x₁, x₂) ∼ N(μ, Σ) with Σ partitioned into blocks, then the conditional covariance of x₂ given x₁ is the Schur complement of the Σ₁₁-block: Σ₂₂ − Σ₂₁Σ₁₁⁻¹Σ₁₂.
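A small numerical sketch of the key fact (block sizes and values are illustrative; here A is PD, so M ⪰ 0 exactly when the Schur complement is PSD):

```python
import numpy as np

# Block matrix M = [[A, B], [B^T, C]] with A positive definite.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
B = np.array([[0.3],
              [0.1]])
C = np.array([[1.5]])

# Schur complement of A in M: S = C - B^T A^{-1} B.
S = C - B.T @ np.linalg.solve(A, B)

M = np.block([[A, B], [B.T, C]])

# With A > 0: M is PSD iff S is PSD (here both turn out PD).
min_eig_M = np.linalg.eigvalsh(M).min()
min_eig_S = np.linalg.eigvalsh(S).min()
```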
Properties of PSD Matrices
- Sum: if A ⪰ 0 and B ⪰ 0, then A + B ⪰ 0
- Scaling: if A ⪰ 0 and c ≥ 0, then cA ⪰ 0
- Congruence: if A ⪰ 0, then CᵀAC ⪰ 0 for any matrix C
- Trace: if A ⪰ 0, then tr(A) ≥ 0 (sum of eigenvalues)
- Hadamard product: if A ⪰ 0 and B ⪰ 0, then A ∘ B ⪰ 0 (Schur product theorem)
The set of PSD matrices forms a convex cone: it is closed under addition and non-negative scalar multiplication.
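These closure properties can be spot-checked on random PSD matrices. A sketch (the helper names `random_psd` and `is_psd` are ad hoc, not from any library):

```python
import numpy as np

rng = np.random.default_rng(4)

def random_psd(n):
    # G^T G is PSD for any matrix G.
    G = rng.normal(size=(n, n))
    return G.T @ G

def is_psd(M, tol=1e-10):
    # PSD up to floating-point tolerance: smallest eigenvalue >= -tol.
    return np.linalg.eigvalsh(M).min() >= -tol

A, B = random_psd(3), random_psd(3)
C = rng.normal(size=(3, 2))

checks = {
    "sum": is_psd(A + B),
    "scaling": is_psd(2.5 * A),
    "congruence": is_psd(C.T @ A @ C),
    "hadamard": is_psd(A * B),   # elementwise (Schur) product
}
```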
Common Confusions
PSD is about the symmetric part
For a non-symmetric matrix A, the quadratic form satisfies xᵀAx = xᵀ((A + Aᵀ)/2)x, so the sign of the quadratic form depends only on the symmetric part (A + Aᵀ)/2. When someone says "the Hessian is PSD," this is only meaningful because Hessians of twice-differentiable functions are symmetric.
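The identity is easy to confirm empirically; a sketch with a random non-symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(4, 4))      # generic non-symmetric matrix
S = (A + A.T) / 2                # symmetric part of A

# The quadratic forms of A and of its symmetric part agree on
# every vector (probed here at 1000 random vectors).
x = rng.normal(size=(4, 1000))
quad_A = np.sum(x * (A @ x), axis=0)
quad_S = np.sum(x * (S @ x), axis=0)
```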
Positive entries do not imply PSD
The matrix [[1, 2], [2, 1]] has all positive entries but eigenvalues 3 and −1, so it is not PSD. Conversely, the identity matrix is PSD (indeed PD) despite having zeros off the diagonal.
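Both directions of this confusion can be checked in two lines each:

```python
import numpy as np

# All entries positive, yet not PSD: eigenvalues are -1 and 3.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
eigs_A = np.linalg.eigvalsh(A)   # sorted ascending

# The identity: zeros off the diagonal, yet PD.
eigs_I = np.linalg.eigvalsh(np.eye(2))
```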
Exercises
Problem
Prove that every covariance matrix is PSD. That is, if Σ = E[(X − μ)(X − μ)ᵀ], show that vᵀΣv ≥ 0 for all v.
Problem
Let A be positive definite. Prove that A⁻¹ is positive definite.
References
Canonical:
- Horn & Johnson, Matrix Analysis (2013), Chapter 7
- Strang, Linear Algebra and Its Applications (2006), Section 6.5
- Bhatia, Positive Definite Matrices (2007), Chapters 1-2 (characterizations and Loewner ordering)
Current:
- Boyd & Vandenberghe, Convex Optimization (2004), Section 2.3 (PSD cone)
- Golub & Van Loan, Matrix Computations (2013), Chapter 4 (Cholesky decomposition)
- Deisenroth, Faisal, Ong, Mathematics for Machine Learning (2020), Section 4.3 (positive definite matrices and Cholesky)
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Eigenvalues and Eigenvectors (Layer 0A)
- Matrix Operations and Properties (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)