
Foundations

Positive Semidefinite Matrices

PSD matrices: equivalent characterizations, Cholesky decomposition, Schur complement, and Loewner ordering. Covariance matrices are PSD. Hessians of convex functions are PSD. These facts connect linear algebra to optimization and statistics.

Core · Tier 1 · Stable · ~40 min

Why This Matters

[Figure: contour plot of a positive definite quadratic, $f(x) \approx f(0) + \tfrac{1}{2}(\lambda_1 x_1^2 + \lambda_2 x_2^2)$, with Hessian eigenvalues $\lambda_1 = 2.0$ and $\lambda_2 = 0.5$ along the eigenvector directions. Eigenvalues of the Hessian determine curvature along each eigenvector direction; the condition number is $\kappa = \lambda_1/\lambda_2 = 4.0$.]

Positive semidefiniteness is the matrix analogue of non-negativity for scalars. It appears in three central places in ML:

Covariance matrices are always PSD. If $\Sigma$ is the covariance matrix of a random vector $Z$, then $x^T \Sigma x = \operatorname{Var}(x^T Z) \geq 0$ for any vector $x$.

Hessians of convex functions are PSD. A twice-differentiable function $f$ is convex if and only if $\nabla^2 f(x) \succeq 0$ everywhere. This connects PSD matrices to the theory of convex optimization.

Kernel matrices (Gram matrices) are PSD by construction. The entire theory of kernel methods rests on this fact.
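The first and third facts are easy to check numerically. The sketch below (variable names and the RBF kernel choice are ours) builds a sample covariance matrix and an RBF Gram matrix and confirms that neither has a meaningfully negative eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 samples, 3 features

# Covariance matrices are PSD: all eigenvalues are non-negative.
Sigma = np.cov(X, rowvar=False)
print(np.linalg.eigvalsh(Sigma).min() >= -1e-10)   # True

# Gram (kernel) matrices are PSD by construction: K_ij = k(x_i, x_j).
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)            # RBF kernel matrix
print(np.linalg.eigvalsh(K).min() >= -1e-8)        # True
```

The small tolerances absorb floating-point rounding: a mathematically PSD matrix can have eigenvalues like $-10^{-15}$ when computed in finite precision.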

Core Definitions

Definition

Positive Semidefinite Matrix

A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive semidefinite (PSD), written $A \succeq 0$, if:

$$x^T A x \geq 0 \quad \text{for all } x \in \mathbb{R}^n$$

If the inequality is strict for all $x \neq 0$, the matrix is positive definite (PD), written $A \succ 0$.

Definition

Loewner Ordering

For symmetric matrices $A$ and $B$, the Loewner ordering is:

$$A \succeq B \iff A - B \succeq 0 \iff x^T(A - B)x \geq 0 \text{ for all } x$$

This is a partial order on symmetric matrices. It is not a total order: most pairs of matrices are incomparable.
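Incomparability is easy to exhibit concretely. In this sketch (the helper `loewner_geq` is ours, not a library function), $A - B$ has one positive and one negative eigenvalue, so neither $A \succeq B$ nor $B \succeq A$ holds:

```python
import numpy as np

def loewner_geq(A, B, tol=1e-10):
    """Check A >= B in the Loewner order, i.e., A - B is PSD."""
    return np.linalg.eigvalsh(A - B).min() >= -tol

A = np.diag([2.0, 1.0])
B = np.diag([1.0, 2.0])
print(loewner_geq(A, B))                 # False: A - B has eigenvalues 1 and -1
print(loewner_geq(B, A))                 # False: the pair is incomparable
print(loewner_geq(A, np.zeros((2, 2))))  # True: A itself is PSD
```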

Main Theorems

Theorem

PSD Equivalence Theorem

Statement

The following are equivalent for a symmetric matrix $A \in \mathbb{R}^{n \times n}$:

  1. $A \succeq 0$ (i.e., $x^T A x \geq 0$ for all $x$)
  2. All eigenvalues of $A$ are non-negative
  3. $A = B^T B$ for some matrix $B$
  4. All principal minors of $A$ are non-negative
  5. $A = LL^T$ for some lower triangular $L$ with non-negative diagonal (the Cholesky decomposition; $L$ is unique with positive diagonal when $A \succ 0$)

Intuition

PSD means the quadratic form xTAxx^T A x is a bowl that never dips below zero. Eigenvalues are the curvatures along the principal axes. Non-negative curvature in every direction means non-negative eigenvalues.

Proof Sketch

$(1 \Rightarrow 2)$: If $Av = \lambda v$ with $\|v\| = 1$, then $\lambda = v^T A v \geq 0$. $(2 \Rightarrow 3)$: By the spectral theorem, $A = Q \Lambda Q^T$ where $\Lambda$ has non-negative entries. Set $B = \Lambda^{1/2} Q^T$. $(3 \Rightarrow 1)$: $x^T A x = x^T B^T B x = \|Bx\|^2 \geq 0$.
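The equivalent characterizations can be verified numerically on a matrix built to be PD. A minimal sketch (construction and tolerances are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.normal(size=(4, 4))
A = G.T @ G + 0.1 * np.eye(4)          # characterization 3, shifted to be PD

# (2) all eigenvalues are non-negative (here strictly positive)
assert np.linalg.eigvalsh(A).min() > 0

# (1) the quadratic form is non-negative in random directions
for _ in range(100):
    x = rng.normal(size=4)
    assert x @ A @ x >= 0

# (5) Cholesky succeeds with positive diagonal, since A is PD
L = np.linalg.cholesky(A)
assert np.all(np.diag(L) > 0)
assert np.allclose(L @ L.T, A)
```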

Why It Matters

Different characterizations are useful in different contexts. Checking eigenvalues is $O(n^3)$. Checking the quadratic form definition is useful for proofs. The factorization $A = B^T B$ is used in sampling from Gaussians: if $Z \sim N(0, I)$ and $\Sigma = LL^T$, then $LZ \sim N(0, \Sigma)$.
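The Gaussian sampling trick is worth seeing end to end. A sketch (the target covariance is ours): draw standard normal samples, multiply by the Cholesky factor, and check that the empirical covariance matches $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])          # target covariance, PD
L = np.linalg.cholesky(Sigma)           # Sigma = L L^T

Z = rng.normal(size=(2, 100_000))       # columns of Z are samples of N(0, I)
X = L @ Z                               # columns of X are samples of N(0, Sigma)

print(np.cov(X))                        # approximately Sigma
```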

Failure Mode

Symmetry is required. A non-symmetric matrix can satisfy $x^T A x \geq 0$ for all $x$ without having real eigenvalues. The standard convention in ML is that PSD refers to symmetric (or Hermitian) matrices only.
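A concrete instance of this failure mode (the matrix is our choice): $M$ below has quadratic form $x_1^2 + x_2^2 \geq 0$, yet its eigenvalues are the complex pair $1 \pm i$. Only its symmetric part, the identity, is a bona fide PSD matrix:

```python
import numpy as np

# M is not symmetric, yet x^T M x = x1^2 + x2^2 >= 0 for every x
M = np.array([[1.0, 1.0],
              [-1.0, 1.0]])
print(np.linalg.eigvals(M))               # complex pair 1 ± 1j

# The symmetric part is the identity, which is PSD
print(np.linalg.eigvalsh((M + M.T) / 2))  # [1. 1.]
```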

Cholesky Decomposition

Every positive definite matrix $A$ has a unique factorization $A = LL^T$ where $L$ is lower triangular with positive diagonal entries. This is the Cholesky decomposition.

Computational cost: about $n^3/3$ flops, half the cost of a general LU decomposition. It is the preferred method for solving linear systems $Ax = b$ when $A$ is known to be PD: factor once, then solve $Ly = b$ by forward substitution and $L^T x = y$ by back substitution.

For PSD (but not PD) matrices, a Cholesky factorization still exists, but $L$ may have zeros on the diagonal and need not be unique.
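The factor-then-solve pattern looks like this in numpy (a sketch with a randomly generated PD system; for clarity we use the general solver `np.linalg.solve` for the two triangular systems, though production code would use a dedicated triangular solver such as `scipy.linalg.solve_triangular`):

```python
import numpy as np

rng = np.random.default_rng(3)
G = rng.normal(size=(5, 5))
A = G @ G.T + np.eye(5)                 # PD by construction
b = rng.normal(size=5)

L = np.linalg.cholesky(A)               # factor once: A = L L^T
y = np.linalg.solve(L, b)               # forward substitution: L y = b
x = np.linalg.solve(L.T, y)             # back substitution: L^T x = y

assert np.allclose(A @ x, b)
```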

Schur Complement

Definition

Schur Complement

Given a block matrix:

$$M = \begin{pmatrix} A & B \\ B^T & C \end{pmatrix}$$

where $A$ is invertible, the Schur complement of $A$ in $M$ is:

$$S = C - B^T A^{-1} B$$

The key fact: if $A \succ 0$, then $M \succeq 0$ if and only if $S \succeq 0$.

The Schur complement appears in conditional distributions: if $(X, Y) \sim N(0, M)$, then the conditional covariance of $Y$ given $X$ is the Schur complement of the $X$-block.
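A small numerical illustration of the key fact (the block entries are our choice, picked so that both sides come out PSD):

```python
import numpy as np

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])              # X-block, PD
B = np.array([[0.3, 0.1],
              [0.0, 0.2]])
C = np.array([[1.0, 0.2],
              [0.2, 1.5]])

M = np.block([[A, B], [B.T, C]])
S = C - B.T @ np.linalg.inv(A) @ B      # Schur complement of A in M

# With A PD: M is PSD if and only if S is PSD
m_psd = np.linalg.eigvalsh(M).min() >= -1e-10
s_psd = np.linalg.eigvalsh(S).min() >= -1e-10
print(m_psd, s_psd)                     # both True for these blocks
```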

Properties of PSD Matrices

  • Sum: if $A \succeq 0$ and $B \succeq 0$, then $A + B \succeq 0$
  • Scaling: if $A \succeq 0$ and $\alpha \geq 0$, then $\alpha A \succeq 0$
  • Congruence: if $A \succeq 0$, then $BAB^T \succeq 0$ for any matrix $B$
  • Trace: if $A \succeq 0$, then $\operatorname{tr}(A) \geq 0$ (it is the sum of the eigenvalues)
  • Hadamard product: if $A \succeq 0$ and $B \succeq 0$, then $A \circ B \succeq 0$ (Schur product theorem)

The set of PSD matrices forms a convex cone: it is closed under addition and non-negative scalar multiplication.
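All of these closure properties can be spot-checked on random PSD matrices. A sketch (the helpers `random_psd` and `is_psd` are ours); note that in numpy `A * B` is the elementwise (Hadamard) product, not matrix multiplication:

```python
import numpy as np

rng = np.random.default_rng(4)

def random_psd(n):
    G = rng.normal(size=(n, n))
    return G @ G.T                      # PSD by the B^T B characterization

def is_psd(A, tol=1e-8):
    return np.linalg.eigvalsh(A).min() >= -tol

A, B = random_psd(3), random_psd(3)
C = rng.normal(size=(3, 3))             # arbitrary, need not be PSD

assert is_psd(A + B)                    # sum
assert is_psd(2.5 * A)                  # non-negative scaling
assert is_psd(C @ A @ C.T)              # congruence
assert is_psd(A * B)                    # Hadamard product (Schur product theorem)
```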

Common Confusions

Watch Out

PSD is about the symmetric part

For a non-symmetric matrix $M$, the quadratic form satisfies $x^T M x = x^T \frac{M + M^T}{2} x$. So the sign of the quadratic form depends only on the symmetric part $(M + M^T)/2$. When someone says "the Hessian is PSD," this is only meaningful because Hessians of twice-differentiable functions are symmetric.
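The identity $x^T M x = x^T \frac{M + M^T}{2} x$ is easy to confirm empirically on a random non-symmetric matrix (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.normal(size=(4, 4))             # generic, non-symmetric
S = (M + M.T) / 2                       # symmetric part of M

for _ in range(100):
    x = rng.normal(size=4)
    assert np.isclose(x @ M @ x, x @ S @ x)   # quadratic forms agree
```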

Watch Out

Positive entries do not imply PSD

The matrix $\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ has all positive entries but eigenvalues $3$ and $-1$, so it is not PSD. Conversely, the identity matrix is PSD despite having zeros off the diagonal.
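Both halves of this warning check out numerically (`eigvalsh` returns eigenvalues in ascending order):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])              # all entries positive
print(np.linalg.eigvalsh(A))            # [-1.  3.]  -> not PSD

I = np.eye(2)                           # zeros off the diagonal, yet PSD
print(np.linalg.eigvalsh(I))            # [1. 1.]
```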

Exercises

ExerciseCore

Problem

Prove that every covariance matrix is PSD. That is, if $\Sigma = \mathbb{E}[(X - \mu)(X - \mu)^T]$, show $\Sigma \succeq 0$.

ExerciseAdvanced

Problem

Let $A \succ 0$ be an $n \times n$ positive definite matrix. Prove that $A^{-1} \succ 0$.

References

Canonical:

  • Horn & Johnson, Matrix Analysis (2013), Chapter 7
  • Strang, Linear Algebra and Its Applications (2006), Section 6.5
  • Bhatia, Positive Definite Matrices (2007), Chapters 1-2 (characterizations and Loewner ordering)

Current:

  • Boyd & Vandenberghe, Convex Optimization (2004), Section 2.3 (PSD cone)
  • Golub & Van Loan, Matrix Computations (2013), Chapter 4 (Cholesky decomposition)
  • Deisenroth, Faisal, Ong, Mathematics for Machine Learning (2020), Section 4.3 (positive definite matrices and Cholesky)

Last reviewed: April 2026
