Foundations
Matrix Operations and Properties
Essential matrix operations for ML: trace, determinant, inverse, pseudoinverse, Schur complement, and the Sherman-Morrison-Woodbury formula. When and why each matters.
Why This Matters
Matrices are the language of ML. Weight matrices, covariance matrices, kernel matrices, Hessians. They are everywhere. You need to know what operations you can perform on them, what those operations mean, and when they are numerically safe.
This page covers the operations that appear repeatedly in ML theory and practice.
Mental Model
Think of a matrix as a linear map that transforms vectors. The properties of that map (how it scales space, whether it is invertible, how it relates to its eigenvalues) are captured by operations like trace, determinant, and inverse. Each operation answers a different question about the map.
Trace
The trace of a square matrix $A \in \mathbb{R}^{n \times n}$ is the sum of its diagonal elements:

$$\operatorname{tr}(A) = \sum_{i=1}^{n} A_{ii}$$

Equivalently, the trace equals the sum of eigenvalues: $\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i$.
Key properties of the trace:
- Linearity: $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$ and $\operatorname{tr}(cA) = c \operatorname{tr}(A)$
- Cyclic property: $\operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB)$
- Transpose invariance: $\operatorname{tr}(A^T) = \operatorname{tr}(A)$
The cyclic property is extremely useful. It lets you rearrange matrix products inside a trace, which simplifies many derivations in ML (e.g., computing gradients of matrix expressions).
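As a quick numerical check (a sketch using NumPy with arbitrary synthetic shapes), the cyclic property holds even when the individual factors are rectangular, as long as each rotated product is square:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))

# tr(ABC) = tr(BCA) = tr(CAB): factors can be rotated inside the trace,
# even though ABC is 3x3, BCA is 4x4, and CAB is 5x5.
t1 = np.trace(A @ B @ C)
t2 = np.trace(B @ C @ A)
t3 = np.trace(C @ A @ B)
assert np.isclose(t1, t2) and np.isclose(t1, t3)
```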
Determinant
The determinant of a square matrix $A \in \mathbb{R}^{n \times n}$ equals the product of its eigenvalues:

$$\det(A) = \prod_{i=1}^{n} \lambda_i$$

Geometrically, $|\det(A)|$ measures the factor by which $A$ scales volumes. If $\det(A) = 0$, the matrix is singular (not invertible).
Key properties:
- $\det(AB) = \det(A)\det(B)$
- $\det(A^T) = \det(A)$
- $\det(cA) = c^n \det(A)$ for $A \in \mathbb{R}^{n \times n}$
In ML, determinants appear in Gaussian distributions (the normalization constant involves $\det(\Sigma)$), in volume arguments for information theory, and in Bayesian model selection.
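In code, $\det(\Sigma)$ easily under- or overflows for high-dimensional covariances, so the log-determinant is computed directly. A minimal sketch with NumPy (the covariance matrix here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 50))
Sigma = X.T @ X / 500 + 0.1 * np.eye(50)  # a synthetic, well-conditioned covariance

# slogdet returns (sign, log|det|) without forming det(Sigma) itself.
sign, logdet = np.linalg.slogdet(Sigma)
assert sign > 0  # covariance matrices are positive definite

# Log normalization constant of N(0, Sigma):
# -(d/2) log(2*pi) - (1/2) log det(Sigma)
d = Sigma.shape[0]
log_norm = -0.5 * d * np.log(2 * np.pi) - 0.5 * logdet
```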
Matrix Inverse
For a square matrix $A$, the inverse $A^{-1}$ satisfies $A A^{-1} = A^{-1} A = I$. It exists if and only if $\det(A) \neq 0$ (equivalently, all eigenvalues are nonzero).
Key identities:
- $(AB)^{-1} = B^{-1} A^{-1}$ (note the reversed order)
- $(A^T)^{-1} = (A^{-1})^T$
When Inversion is Dangerous
Condition Number
The condition number of a matrix $A$ is:

$$\kappa(A) = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}$$

where $\sigma_{\max}$ and $\sigma_{\min}$ are the largest and smallest singular values. A large condition number means the matrix is nearly singular and inversion is numerically unstable.
Rule of thumb: if $\kappa(A) \approx 10^k$, you lose about $k$ digits of accuracy when solving $Ax = b$ by inversion. For double precision (about 16 digits), $\kappa(A) > 10^{12}$ is dangerous.
In practice, avoid explicit matrix inversion. Use factorizations (Cholesky, LU, QR) to solve linear systems instead.
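A small NumPy sketch of both points, using a synthetic random matrix (well conditioned with high probability):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((200, 200))
b = rng.standard_normal(200)

# Condition number from singular values: sigma_max / sigma_min.
s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]
assert np.isclose(kappa, np.linalg.cond(A, 2))

# Prefer a factorization-based solver over explicit inversion.
x_solve = np.linalg.solve(A, b)   # LU-based solve: faster and more stable
x_inv = np.linalg.inv(A) @ b      # explicit inverse: avoid in production code
assert np.allclose(x_solve, x_inv)
```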
Moore-Penrose Pseudoinverse
The Moore-Penrose pseudoinverse $A^+$ of a matrix $A \in \mathbb{R}^{m \times n}$ is the unique matrix satisfying four conditions: (1) $A A^+ A = A$, (2) $A^+ A A^+ = A^+$, (3) $(A A^+)^T = A A^+$, (4) $(A^+ A)^T = A^+ A$.
For full column rank ($\operatorname{rank}(A) = n$): $A^+ = (A^T A)^{-1} A^T$
For full row rank ($\operatorname{rank}(A) = m$): $A^+ = A^T (A A^T)^{-1}$
The pseudoinverse gives the least-squares solution to $Ax = b$ when $A$ is not square or not invertible: $x = A^+ b$ minimizes $\|Ax - b\|_2$.
This is exactly what happens in linear regression: the OLS solution is $\hat{\beta} = (X^T X)^{-1} X^T y = X^+ y$.
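The equivalence of the pseudoinverse, normal-equations, and least-squares solutions can be checked numerically; a sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 5))   # tall matrix: full column rank w.h.p.
y = rng.standard_normal(100)

beta_pinv = np.linalg.pinv(X) @ y                   # SVD-based pseudoinverse
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # preferred in practice

# All three compute the same OLS solution.
assert np.allclose(beta_pinv, beta_normal)
assert np.allclose(beta_pinv, beta_lstsq)
```

`lstsq` is the preferred route in practice because it avoids squaring the condition number the way the normal equations do.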
Transpose and Adjoint
The transpose $A^T$ swaps rows and columns: $(A^T)_{ij} = A_{ji}$.
The conjugate transpose (adjoint) $A^*$ also conjugates complex entries: $(A^*)_{ij} = \overline{A_{ji}}$. For real matrices, $A^* = A^T$.
A matrix is symmetric if $A = A^T$. A matrix $Q$ is orthogonal if $Q^T Q = Q Q^T = I$. These properties simplify many computations. Symmetric matrices have real eigenvalues, and orthogonal matrices preserve lengths. A symmetric matrix with all nonnegative eigenvalues is positive semidefinite.
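These properties are easy to verify numerically; a short NumPy sketch with synthetic matrices:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))

# Symmetric matrices have real eigenvalues.
S = M + M.T
assert np.allclose(S, S.T)
assert np.allclose(np.linalg.eigvals(S).imag, 0)

# Orthogonal matrices satisfy Q^T Q = I and preserve lengths.
Q, _ = np.linalg.qr(M)
assert np.allclose(Q.T @ Q, np.eye(4))
v = rng.standard_normal(4)
assert np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v))

# A Gram matrix M M^T is symmetric positive semidefinite.
P = M @ M.T
assert np.linalg.eigvalsh(P).min() >= -1e-10
```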
Schur Complement
Given a block matrix:

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$$

If $A$ is invertible, the Schur complement of $A$ in $M$ is:

$$S = D - C A^{-1} B$$

The determinant factors as $\det(M) = \det(A) \det(S)$.
The Schur complement appears in:
- Gaussian conditioning (deriving conditional distributions from joint)
- Block matrix inversion
- Optimization (eliminating variables in quadratic forms)
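A NumPy sketch of the determinant factorization (the blocks are synthetic, with diagonal shifts so $A$ and $D$ stay safely invertible):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 4, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m)) + m * np.eye(m)

M = np.block([[A, B], [C, D]])
S = D - C @ np.linalg.solve(A, B)  # Schur complement of A in M

# det(M) = det(A) * det(S)
assert np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(S))
```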
Sherman-Morrison-Woodbury Formula
Statement
If $A \in \mathbb{R}^{n \times n}$ is invertible, $U \in \mathbb{R}^{n \times k}$, $C \in \mathbb{R}^{k \times k}$ is invertible, and $V \in \mathbb{R}^{k \times n}$, then:

$$(A + UCV)^{-1} = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}$$
Intuition
When you add a low-rank update to a matrix whose inverse you already know, you can compute the new inverse by solving a smaller $k \times k$ system instead of a full $n \times n$ inversion. This is a huge saving when $k \ll n$.
Proof Sketch
Multiply $A + UCV$ by the proposed expression for the inverse and verify that the product is $I$. This is a direct algebraic verification: expand the product and simplify using $A A^{-1} = I$ and $C C^{-1} = I$.
Why It Matters
This formula appears throughout ML: online learning (rank-1 updates to covariance matrices), Kalman filters, Gaussian process inference with structured kernels, and Bayesian linear regression. Any time you have $A^{-1}$ and need $(A + UCV)^{-1}$, use this formula.
Failure Mode
The formula requires both $A$ and $C$ to be invertible. If either is singular (or nearly so), the formula is inapplicable or numerically unstable.
The special case with $U = u$, $V = v^T$, $C = 1$ (rank-1 update) is the Sherman-Morrison formula:

$$(A + u v^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}$$
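A NumPy sketch of a Sherman-Morrison rank-1 update, checked against direct re-inversion (the matrix is synthetic and diagonally shifted to stay well conditioned):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
A = rng.standard_normal((n, n)) + n * np.eye(n)
A_inv = np.linalg.inv(A)  # assume this is already known
u = rng.standard_normal((n, 1))
v = rng.standard_normal((n, 1))

# Sherman-Morrison: an O(n^2) update instead of an O(n^3) re-inversion.
denom = 1.0 + (v.T @ A_inv @ u).item()
A_inv_updated = A_inv - (A_inv @ u) @ (v.T @ A_inv) / denom

assert np.allclose(A_inv_updated, np.linalg.inv(A + u @ v.T))
```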
Common Confusions
Trace and determinant serve different purposes
The trace sums eigenvalues; the determinant multiplies them. A matrix can have large trace but zero determinant (e.g., $\operatorname{diag}(100, 0)$ has trace 100 and determinant 0). The trace tells you about the "total magnitude" of eigenvalues; the determinant tells you whether the matrix is invertible and how it scales volume.
Never invert matrices explicitly
In nearly all practical ML code, you should solve $Ax = b$ using a linear system solver (e.g., `np.linalg.solve`), not by computing $A^{-1}$ and then multiplying. Direct inversion is slower, less numerically stable, and rarely necessary.
Summary
- $\operatorname{tr}(A)$ = sum of eigenvalues; use the cyclic property freely
- $\det(A)$ = product of eigenvalues; zero means singular
- The pseudoinverse $A^+$ gives the least-squares solution to $Ax = b$
- Condition number $\kappa(A) = \sigma_{\max}/\sigma_{\min}$ measures how dangerous inversion is
- Schur complements factor block matrices and appear in Gaussian conditioning
- Sherman-Morrison-Woodbury turns rank-$k$ updates into $k \times k$ solves
- In practice, solve linear systems instead of inverting matrices
- In practice, solve linear systems instead of inverting matrices
Exercises
Problem
For a given invertible matrix $A$, compute $\operatorname{tr}(A)$, $\det(A)$, $A^{-1}$, and $\kappa(A)$ (using the 2-norm).
Problem
Show that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ for any matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times m}$.
Problem
You have computed $A^{-1}$ for an $n \times n$ matrix $A$. A new data point arrives, requiring you to compute $(A + uv^T)^{-1}$ where $u, v \in \mathbb{R}^n$. What is the computational cost using Sherman-Morrison versus recomputing the inverse from scratch?
References
Canonical:
- Strang, Introduction to Linear Algebra (2016), Chapters 2, 5, 6
- Horn & Johnson, Matrix Analysis (2013), Chapters 0-1
- Golub & Van Loan, Matrix Computations (2013), Chapters 2-3 (matrix operations and LU/QR factorization)
Current:
- Petersen & Pedersen, The Matrix Cookbook (2012). Essential reference for matrix identities.
- Axler, Linear Algebra Done Right (2024), Chapters 3-4 (linear maps, invertibility, determinants)
- Deisenroth, Faisal, Ong, Mathematics for Machine Learning (2020), Chapter 4 (matrix decompositions)
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
Builds on This
- Attention Mechanism Theory (Layer 4)
- Conditioning and Condition Number (Layer 1)
- Convex Optimization Basics (Layer 1)
- Eigenvalues and Eigenvectors (Layer 0A)
- Linear Regression (Layer 1)
- Numerical Linear Algebra (Layer 1)
- Numerical Stability and Conditioning (Layer 1)