Foundations
Vectors, Matrices, and Linear Maps
Vector spaces, linear maps, matrix representation, rank, nullity, and the rank-nullity theorem. The algebraic backbone of ML.
Why This Matters
Neural networks are compositions of linear maps and pointwise nonlinearities. PCA is an eigenvalue problem on the data's covariance matrix. Gradient descent operates in vector spaces. Linear algebra is the computational substrate of modern ML.
Core Definitions
Vector Space
A vector space $V$ over a field $F$ (typically $\mathbb{R}$) is a set with operations $+ : V \times V \to V$ and $\cdot : F \times V \to V$ satisfying: commutativity and associativity of addition, existence of an additive identity and additive inverses, distributivity of scalar multiplication over both additions, compatibility $a(bv) = (ab)v$, and $1v = v$. The elements of $V$ are vectors.
Linear Independence and Basis
Vectors $v_1, \dots, v_n$ are linearly independent if $c_1 v_1 + \cdots + c_n v_n = 0$ implies all $c_i = 0$. A basis is a maximal linearly independent set, equivalently a linearly independent set that spans $V$. The dimension $\dim V$ is the number of elements in any basis.
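Independence can be checked numerically: stack the vectors as columns and compare the rank to the number of columns. A quick NumPy sketch (the specific vectors are illustrative, not from the text):

```python
import numpy as np

# Vectors are independent iff the matrix with them as columns
# has full column rank.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = v1 + v2                        # deliberately dependent on v1, v2

A = np.column_stack([v1, v2, v3])   # 3x3 matrix, columns v1, v2, v3
rank = np.linalg.matrix_rank(A)

print(rank)                         # 2: only two independent directions
print(rank == A.shape[1])           # False: the set is linearly dependent
```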
Linear Map
A function $T : V \to W$ between vector spaces is linear if $T(av + bw) = a\,T(v) + b\,T(w)$ for all scalars $a, b$ and vectors $v, w$. The kernel (null space) is $\ker T = \{v \in V : T(v) = 0\}$. The image (range) is $\operatorname{im} T = \{T(v) : v \in V\}$.
Matrix Representation
Given bases $\{v_1, \dots, v_n\}$ for $V$ and $\{w_1, \dots, w_m\}$ for $W$, a linear map $T : V \to W$ is represented by the $m \times n$ matrix $A$ where column $j$ contains the coordinates of $T(v_j)$ in the basis of $W$. Matrix multiplication corresponds to composition of linear maps.
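The column-by-column construction can be made concrete: evaluate the map on each basis vector and stack the results. A sketch using a hypothetical map (rotation of the plane by 90 degrees, chosen for illustration):

```python
import numpy as np

def T(v):
    """A hypothetical linear map: rotate R^2 by 90 degrees."""
    x, y = v
    return np.array([-y, x])

# Column j of the matrix is T applied to the j-th standard basis vector.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
A = np.column_stack([T(e1), T(e2)])

v = np.array([3.0, 4.0])
assert np.allclose(A @ v, T(v))           # the matrix reproduces the map
assert np.allclose(A @ A @ v, T(T(v)))    # multiplication = composition
```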
Rank and Nullity
The rank of a matrix $A$ is $\dim(\operatorname{im} A)$, equivalently the number of linearly independent columns (or rows). The nullity is $\dim(\ker A)$.
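Both quantities are easy to compute numerically; `matrix_rank` uses the SVD under the hood, and the nullity follows as the number of columns minus the rank. A minimal sketch with an illustrative rank-deficient matrix:

```python
import numpy as np

# A deliberately rank-deficient matrix: the second row is twice the first.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

rank = np.linalg.matrix_rank(A)   # dimension of the column space
nullity = A.shape[1] - rank       # columns minus rank (rank-nullity, below)

print(rank, nullity)              # 1 2
```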
Change of Basis
If $P$ is the matrix whose columns are the new basis vectors expressed in the old basis, then the coordinates transform as $x' = P^{-1} x$. A linear map with matrix $A$ in the old basis becomes $A' = P^{-1} A P$ in the new basis.
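A numerical sanity check of these formulas (the basis and map below are illustrative): applying the map in either coordinate system and converting back must give the same vector.

```python
import numpy as np

# Columns of P are the new basis vectors (1,0) and (1,1) in old coordinates.
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])        # a map written in the old basis

x_old = np.array([2.0, 1.0])
x_new = np.linalg.solve(P, x_old) # P^{-1} x without forming the inverse
A_new = np.linalg.solve(P, A @ P) # P^{-1} A P

# Same action, expressed in either basis:
lhs = A @ x_old                   # apply in old coordinates
rhs = P @ (A_new @ x_new)         # apply in new coordinates, convert back
assert np.allclose(lhs, rhs)
```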
Main Theorems
Rank-Nullity Theorem
Statement
For any linear map $T : V \to W$ with $V$ finite-dimensional:
$$\dim V = \dim(\ker T) + \dim(\operatorname{im} T).$$
Equivalently, for an $m \times n$ matrix $A$: $\operatorname{rank}(A) + \operatorname{nullity}(A) = n$.
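The theorem can be verified numerically on a matrix of known rank: build $A = BC$ with inner dimension 3 (so the rank is 3 generically) and read off a null-space basis from the SVD. The construction below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6
# Inner dimension 3 forces rank(A) <= 3; generically it equals 3.
A = rng.standard_normal((m, 3)) @ rng.standard_normal((3, n))

U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10 * s[0]))
null_basis = Vt[rank:]                 # rows of V^T spanning ker A

assert rank == 3
assert null_basis.shape[0] == n - rank # nullity = n - rank
assert np.allclose(A @ null_basis.T, 0.0)  # each basis vector lies in ker A
```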
Intuition
The domain splits into two complementary parts: the kernel (what $T$ kills) and a complement (what $T$ maps faithfully onto the image). Their dimensions must add up to $\dim V$.
Proof Sketch
Let $\{u_1, \dots, u_k\}$ be a basis for $\ker T$. Extend it to a basis $\{u_1, \dots, u_k, v_1, \dots, v_r\}$ for $V$. Show that $\{T(v_1), \dots, T(v_r)\}$ is a basis for $\operatorname{im} T$: it spans because any $T(x)$ can be written in terms of these (the $u_i$ contribute nothing), and it is linearly independent because $\sum_i c_i T(v_i) = 0$ implies $\sum_i c_i v_i \in \ker T$, which forces all $c_i = 0$ by the independence of the extended basis. Then $\dim V = k + r = \dim(\ker T) + \dim(\operatorname{im} T)$.
Why It Matters
The rank-nullity theorem constrains the geometry of linear systems. A system $Ax = b$ has a solution iff $b \in \operatorname{im} A$, equivalently $\operatorname{rank}([A \mid b]) = \operatorname{rank}(A)$, and the solution is unique iff $\ker A = \{0\}$, i.e. $\operatorname{rank}(A) = n$. In ML: the rank of a data matrix determines how many independent features exist. The singular value decomposition provides a canonical way to compute rank numerically.
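The rank-based solvability criterion is directly checkable in code. A sketch with an illustrative rank-1 system, one right-hand side inside the image and one outside:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])        # rank 1: im A is spanned by (1, 2)
b_good = np.array([1.0, 2.0])     # lies in im A
b_bad  = np.array([1.0, 0.0])     # does not

def consistent(A, b):
    """Ax = b is solvable iff appending b does not raise the rank."""
    augmented = np.column_stack([A, b])
    return np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(A)

print(consistent(A, b_good))      # True
print(consistent(A, b_bad))       # False
```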
Failure Mode
Requires a finite-dimensional domain $V$. In infinite-dimensional spaces (e.g., function spaces in kernel methods), the statement needs modification. The index of an operator generalizes rank-nullity but involves subtleties about closedness of the range.
Common Confusions
Matrix multiplication is not commutative
$AB \neq BA$ in general, even when both products are defined. This reflects the fact that composition of linear maps is not commutative. The order matters: $AB$ means "apply $B$ first, then $A$."
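A two-line demonstration with geometrically meaningful matrices (a rotation and a projection, chosen for illustration): projecting then rotating is not the same as rotating then projecting.

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])       # rotate by 90 degrees
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])        # project onto the x-axis

# AB projects first, then rotates; BA rotates first, then projects.
print(A @ B)                      # [[0. 0.] [1. 0.]]
print(B @ A)                      # [[0. -1.] [0. 0.]]
print(np.allclose(A @ B, B @ A))  # False
```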
Rank equals column rank equals row rank
Column rank (dimension of column space) always equals row rank (dimension of row space). This is not obvious and requires proof. It means $\operatorname{rank}(A) = \operatorname{rank}(A^\top)$.
Canonical Examples
Projection matrix
Let $A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ acting on $\mathbb{R}^2$. Then $\operatorname{im} A$ is the $x$-axis, $\ker A$ is the $y$-axis. Rank is 1, nullity is 1, and $1 + 1 = 2 = \dim \mathbb{R}^2$.
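The same projection, checked numerically: rank and nullity come out as 1 and 1, and vectors on the $y$-axis are sent to zero.

```python
import numpy as np

# Projection onto the x-axis, as in the example above.
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])

rank = np.linalg.matrix_rank(A)   # 1: the image is the x-axis
nullity = A.shape[1] - rank       # 1: the kernel is the y-axis

assert rank + nullity == 2        # rank-nullity on R^2
assert np.allclose(A @ np.array([0.0, 5.0]), 0.0)  # y-axis vectors are killed
assert np.allclose(A @ np.array([3.0, 0.0]), [3.0, 0.0])  # x-axis is fixed
```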
Exercises
Problem
Let $A$ be an $m \times n$ matrix with $\operatorname{rank}(A) = r < n$. What is the dimension of $\ker A$? Can $Ax = b$ have a unique solution?
Problem
Prove that $\operatorname{rank}(A) = \operatorname{rank}(A^\top)$.
References
Canonical:
- Axler, Linear Algebra Done Right (2024), Chapters 1-3
- Strang, Introduction to Linear Algebra (2016), Chapters 1-4
- Halmos, Finite-Dimensional Vector Spaces (1958), Chapters 1-3 (vector spaces and linear maps)
For ML context:
- Deisenroth, Faisal, Ong, Mathematics for Machine Learning (2020), Chapter 2
- Horn & Johnson, Matrix Analysis (2013), Chapter 0 (review of linear algebra fundamentals)
- Boyd & Vandenberghe, Introduction to Applied Linear Algebra (2018), Chapters 1-6 (vectors, matrices, and linear independence)
Last reviewed: April 2026
Builds on This
- Autoencoders (Layer 2)
- Convolutional Neural Networks (Layer 3)
- Inner Product Spaces and Orthogonality (Layer 0A)
- Matrix Multiplication Algorithms (Layer 1)
- Matrix Norms (Layer 0A)