
Foundations

Vectors, Matrices, and Linear Maps

Vector spaces, linear maps, matrix representation, rank, nullity, and the rank-nullity theorem. The algebraic backbone of ML.

Core · Tier 1 · Stable · ~45 min

Why This Matters

Neural networks are compositions of linear maps and pointwise nonlinearities. PCA is an eigenvalue problem on a matrix. Gradient descent operates in vector spaces. Linear algebra is the computational substrate of modern ML.

Core Definitions

Definition

Vector Space

A vector space over a field $\mathbb{F}$ (typically $\mathbb{R}$) is a set $V$ equipped with operations $+ : V \times V \to V$ and $\cdot : \mathbb{F} \times V \to V$ satisfying: commutativity and associativity of addition, existence of an additive identity $0$ and additive inverses, distributivity, and $1 \cdot v = v$. The elements of $V$ are called vectors.

Definition

Linear Independence and Basis

Vectors $v_1, \ldots, v_k \in V$ are linearly independent if $\sum_{i=1}^k \alpha_i v_i = 0$ implies all $\alpha_i = 0$. A basis is a maximal linearly independent set, or equivalently a linearly independent set that spans $V$. The dimension $\dim(V)$ is the number of elements in any basis.
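In practice, linear independence of finitely many vectors can be checked numerically: stack them as columns and compare the matrix rank to the number of vectors. A minimal NumPy sketch (the helper name `independent` and the specific vectors are illustrative):

```python
import numpy as np

def independent(vectors, tol=1e-10):
    """Return True if the given vectors are linearly independent."""
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M, tol=tol) == len(vectors)

v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = v1 + 2 * v2                  # a linear combination, hence dependent

print(independent([v1, v2]))      # True
print(independent([v1, v2, v3]))  # False
```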

Definition

Linear Map

A function $T : V \to W$ between vector spaces is linear if $T(\alpha u + \beta v) = \alpha T(u) + \beta T(v)$ for all scalars $\alpha, \beta$ and vectors $u, v$. The kernel (null space) is $\ker(T) = \{v \in V : T(v) = 0\}$. The image (range) is $\text{im}(T) = \{T(v) : v \in V\}$.
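For a concrete matrix, the kernel can be computed from the singular value decomposition: the right singular vectors whose singular values are (numerically) zero span $\ker(A)$. A sketch, assuming a small tolerance stands in for exact zero (the helper name `null_space` is illustrative):

```python
import numpy as np

def null_space(A, tol=1e-10):
    """Orthonormal basis for ker(A), computed via the SVD."""
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T             # rows of Vt beyond the rank span the kernel

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])        # rank 1: the second row is twice the first
N = null_space(A)
print(N.shape)                    # (2, 1): a one-dimensional kernel
print(np.allclose(A @ N, 0))      # True: A sends the kernel to 0
```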

Definition

Matrix Representation

Given bases $\{e_1, \ldots, e_n\}$ for $V$ and $\{f_1, \ldots, f_m\}$ for $W$, a linear map $T : V \to W$ is represented by the $m \times n$ matrix $A$ whose $j$-th column contains the coordinates of $T(e_j)$ in the basis of $W$. Matrix multiplication $AB$ corresponds to composition of linear maps.
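The correspondence between multiplication and composition can be checked directly: applying $B$ and then $A$ to a vector gives the same result as applying the single matrix $AB$. A quick sketch with random maps (the shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # represents a map R^3 -> R^4
B = rng.standard_normal((3, 5))   # represents a map R^5 -> R^3
x = rng.standard_normal(5)

# Applying B first, then A, agrees with the composed matrix A @ B.
print(np.allclose(A @ (B @ x), (A @ B) @ x))  # True
```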

Definition

Rank and Nullity

The rank of a matrix $A$ is $\text{rank}(A) = \dim(\text{im}(A))$, equivalently the number of linearly independent columns (or rows). The nullity is $\text{null}(A) = \dim(\ker(A))$.
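Numerically, both quantities can be read off the singular values: nonzero singular values count the rank, and (for a square matrix) near-zero ones count kernel directions. A sketch, assuming a tolerance of `1e-10` separates "zero" from "nonzero":

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])   # third row = 2*(second row) - (first row)

s = np.linalg.svd(A, compute_uv=False)
rank = int(np.sum(s > 1e-10))     # count of nonzero singular values
nullity = int(np.sum(s <= 1e-10)) # near-zero singular values <-> kernel directions
print(rank, nullity)              # 2 1
```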

Definition

Change of Basis

If $P$ is the matrix whose columns are the new basis vectors expressed in the old basis, then coordinates transform as $x_{\text{new}} = P^{-1} x_{\text{old}}$. A linear map with matrix $A$ becomes $P^{-1}AP$ in the new basis.
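A change of basis can make a map's action transparent. As a sketch (the specific matrices are illustrative): the reflection across the line $y = x$ looks messy in the standard basis, but becomes diagonal in a basis aligned with the reflection axis.

```python
import numpy as np

# Reflection across the line y = x, in the standard basis.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# New basis: a vector along y = x and a perpendicular one.
P = np.array([[1.0, 1.0],
              [1.0, -1.0]])       # columns are the new basis vectors

A_new = np.linalg.inv(P) @ A @ P
print(A_new)                      # diag(1, -1): fixes one axis, flips the other
```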

Main Theorems

Theorem

Rank-Nullity Theorem

Statement

For any linear map $T : V \to W$ with $V$ finite-dimensional:

$$\dim(V) = \text{rank}(T) + \text{null}(T)$$

Equivalently, for an $m \times n$ matrix $A$: $n = \text{rank}(A) + \text{null}(A)$.
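The identity can be verified numerically on a matrix of known rank, computing rank and nullity independently from the SVD. A sketch (the shapes and the rank-2 factorization are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# A 6x4 matrix of rank 2, built as a product of rank-2 factors.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))

n = A.shape[1]
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
nullity = Vt.shape[0] - rank       # right singular vectors with zero singular value

print(rank + nullity == n)         # True: 2 + 2 == 4
```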

Intuition

The domain $V$ splits into two complementary parts: the kernel (what $T$ kills) and a complement (which $T$ maps faithfully onto the image). Their dimensions must add up to $\dim(V)$.

Proof Sketch

Let $\{u_1, \ldots, u_k\}$ be a basis for $\ker(T)$. Extend it to a basis $\{u_1, \ldots, u_k, w_1, \ldots, w_r\}$ for $V$. Show that $\{T(w_1), \ldots, T(w_r)\}$ is a basis for $\text{im}(T)$: it spans because any $T(v)$ can be written in terms of these vectors (the $u_i$ contribute nothing), and it is linearly independent because $\sum \alpha_i T(w_i) = 0 \implies T(\sum \alpha_i w_i) = 0 \implies \sum \alpha_i w_i \in \ker(T)$, which forces all $\alpha_i = 0$ since the $w_i$ were chosen linearly independent of the kernel basis.

Why It Matters

The rank-nullity theorem constrains the geometry of linear systems. A system $Ax = b$ has a solution iff $b \in \text{im}(A)$, and the solution is unique iff $\ker(A) = \{0\}$. In ML, the rank of a data matrix determines how many independent features exist. The singular value decomposition provides a canonical way to compute rank numerically.
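The solvability condition $b \in \text{im}(A)$ can be tested by comparing the rank of $A$ to the rank of the augmented matrix $[A \mid b]$: appending $b$ raises the rank exactly when $b$ lies outside the image. A sketch (the helper name `solvable` is illustrative):

```python
import numpy as np

def solvable(A, b, tol=1e-10):
    """Ax = b has a solution iff appending b as a column keeps the rank unchanged."""
    Ab = np.column_stack([A, b])
    return np.linalg.matrix_rank(A, tol=tol) == np.linalg.matrix_rank(Ab, tol=tol)

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])                    # image is the x-axis
print(solvable(A, np.array([3.0, 0.0])))      # True: b lies on the x-axis
print(solvable(A, np.array([0.0, 1.0])))      # False: b has a y-component
```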

Failure Mode

Requires finite-dimensional VV. In infinite-dimensional spaces (e.g., function spaces in kernel methods), the statement needs modification. The index of an operator generalizes rank-nullity but involves subtleties about closedness of the range.

Common Confusions

Watch Out

Matrix multiplication is not commutative

$AB \neq BA$ in general, even when both products are defined. This reflects the fact that composition of linear maps is not commutative. The order matters: $AB$ means "apply $B$ first, then $A$."
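Two shears in different directions make the failure of commutativity concrete (the specific matrices are illustrative):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # shear along the x-axis
B = np.array([[1.0, 0.0],
              [1.0, 1.0]])        # shear along the y-axis

# A @ B = [[2, 1], [1, 1]] but B @ A = [[1, 1], [1, 2]].
print(np.allclose(A @ B, B @ A))  # False: the two orders disagree
```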

Watch Out

Column rank equals row rank

Column rank (the dimension of the column space) always equals row rank (the dimension of the row space). This is not obvious and requires proof. It implies $\text{rank}(A) = \text{rank}(A^T)$.
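A quick numerical sanity check of $\text{rank}(A) = \text{rank}(A^T)$ on a rectangular low-rank matrix (the shapes and rank-3 factorization are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))  # 5x7, rank <= 3

# Column rank of A equals column rank of A^T, i.e. the row rank of A.
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T))  # True
```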

Canonical Examples

Example

Projection matrix

Let $P = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ act on $\mathbb{R}^2$. Then $\text{im}(P)$ is the $x$-axis and $\ker(P)$ is the $y$-axis. The rank is 1, the nullity is 1, and $1 + 1 = 2 = \dim(\mathbb{R}^2)$.
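The same projection, checked numerically:

```python
import numpy as np

P = np.array([[1.0, 0.0],
              [0.0, 0.0]])

print(P @ np.array([3.0, 5.0]))   # [3. 0.]: outputs lie on the x-axis (the image)
print(P @ np.array([0.0, 5.0]))   # [0. 0.]: the y-axis is the kernel
print(np.linalg.matrix_rank(P))   # 1, so nullity = 2 - 1 = 1
```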

Exercises

ExerciseCore

Problem

Let $A$ be a $5 \times 3$ matrix with $\text{rank}(A) = 2$. What is the dimension of $\ker(A)$? Can $Ax = b$ have a unique solution?

ExerciseAdvanced

Problem

Prove that $\text{rank}(AB) \leq \min(\text{rank}(A), \text{rank}(B))$.

References

Canonical:

  • Axler, Linear Algebra Done Right (2024), Chapters 1-3
  • Strang, Introduction to Linear Algebra (2016), Chapters 1-4
  • Halmos, Finite-Dimensional Vector Spaces (1958), Chapters 1-3 (vector spaces and linear maps)

For ML context:

  • Deisenroth, Faisal, Ong, Mathematics for Machine Learning (2020), Chapter 2
  • Horn & Johnson, Matrix Analysis (2013), Chapter 0 (review of linear algebra fundamentals)
  • Boyd & Vandenberghe, Introduction to Applied Linear Algebra (2018), Chapters 1-6 (vectors, matrices, and linear independence)

Last reviewed: April 2026
