Foundations
Linear Independence
A set of vectors is linearly independent if no vector is a redundant copy of the others. The concept underwrites basis, dimension, rank, the column-space test for $Ax = b$, and every overdetermined-versus-underdetermined diagnosis in linear regression and gradient-based optimization.
Prerequisites
- Vectors, matrices, and linear maps
Why This Matters
Half of the ML diagnostics of the form "is this system solvable / is this fit unique / are these features collinear / why is the optimiser stuck on a flat direction" reduce to a question about linear independence. "The matrix has full column rank" means its columns are linearly independent. "The features are collinear" means some column is a linear combination of the others. "The Hessian is singular" means its column space is missing a direction.
The concept is small but load-bearing. Skip it and the rest of linear algebra reads as notation rather than structure.
Mental Model
A set of vectors is linearly independent when none of them is a redundant copy of the others. Concretely: you cannot write any one vector as a weighted sum of the rest.
Geometrically in $\mathbb{R}^3$:
- One nonzero vector: independent (a line through the origin).
- Two vectors: independent if and only if neither is a scalar multiple of the other (they span a plane, not a line).
- Three vectors: independent if and only if they do not all lie in a common plane through the origin (they span all of $\mathbb{R}^3$).
Algebraically, the test is the same in any dimension: the only linear combination that gives the zero vector is the all-zero combination.
Core Definitions
Linear Independence
A finite set $\{v_1, \dots, v_k\}$ is linearly independent if and only if the only solution to $c_1 v_1 + \cdots + c_k v_k = 0$ is $c_1 = \cdots = c_k = 0$. If any nonzero solution exists, the set is linearly dependent.
Linear Combination
A linear combination of $v_1, \dots, v_k$ with coefficients $c_1, \dots, c_k$ is the vector $c_1 v_1 + \cdots + c_k v_k$. The all-zero combination $c_1 = \cdots = c_k = 0$ is the trivial combination and always yields $0$; the question is whether any non-trivial combination also yields $0$.
Span
The span of $\{v_1, \dots, v_k\}$ is the set of all linear combinations: $\operatorname{span}\{v_1, \dots, v_k\} = \{\, c_1 v_1 + \cdots + c_k v_k : c_1, \dots, c_k \in \mathbb{R} \,\}$. Linear independence is the question of whether each $v_i$ contributes a new direction to this set or duplicates an existing one.
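As a concrete check of span membership, here is a minimal NumPy sketch (the vectors are illustrative, not from the page): a vector $b$ lies in $\operatorname{span}\{v_1, v_2\}$ exactly when the least-squares fit of $Vc \approx b$ reproduces $b$ exactly.

```python
import numpy as np

# Columns v1, v2 spanning a plane in R^3 (illustrative vectors).
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

b_in = np.array([2.0, 3.0, 5.0])     # equals 2*v1 + 3*v2, so it lies in the span
b_out = np.array([0.0, 0.0, 1.0])    # not expressible as a combination of the columns

for b in (b_in, b_out):
    c, *_ = np.linalg.lstsq(V, b, rcond=None)   # best coefficients c
    print(np.allclose(V @ c, b))                # True iff b is in the span
```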
Equivalent Characterisations
The most useful theorem on this page is that several plausible-looking definitions of independence are all equivalent.
Equivalent Tests for Linear Independence
Statement
The following statements about a finite set $\{v_1, \dots, v_k\}$ of vectors in $\mathbb{R}^n$ are equivalent:
- The only solution to $c_1 v_1 + \cdots + c_k v_k = 0$ is $c_i = 0$ for all $i$.
- No $v_i$ is a linear combination of the other vectors $v_j$ ($j \neq i$).
- The matrix $V = [\, v_1 \ \cdots \ v_k \,]$ has trivial null space: $Vc = 0$ only for $c = 0$.
- The matrix $V$ has rank $k$.
- The Gram matrix $V^\top V$ is invertible.
Intuition
Each formulation looks at the same fact from a different angle. Statement 1 is the algebraic test. Statement 2 is the redundancy test. Statement 3 sees the vectors as columns of a matrix and asks about its null space. Statement 4 is a rank reformulation. Statement 5 is the symmetric form used in least-squares and Gram-matrix arguments.
Proof Sketch
$(1 \Leftrightarrow 2)$. If some $v_i = \sum_{j \neq i} c_j v_j$, then the combination with coefficient $-1$ on $v_i$ and $c_j$ on each $v_j$, $j \neq i$, is a nontrivial solution of $c_1 v_1 + \cdots + c_k v_k = 0$. Conversely, a nontrivial solution with some $c_i \neq 0$ lets you isolate $v_i$ as a linear combination of the rest.
$(1 \Leftrightarrow 3)$. Writing $c_1 v_1 + \cdots + c_k v_k$ as the matrix product $Vc$ with $c = (c_1, \dots, c_k)^\top$, statement 1 says $Vc = 0$ forces $c = 0$, which is exactly the statement that $V$ has trivial null space.
$(3 \Leftrightarrow 4)$. By the rank-nullity theorem, $\operatorname{rank} V + \dim \ker V = k$, so $\ker V = \{0\}$ iff $\operatorname{rank} V = k$.
$(4 \Leftrightarrow 5)$. The matrix $V^\top V$ is $k \times k$ and positive semidefinite. Its rank equals $\operatorname{rank} V$. So $V^\top V$ is invertible (full rank $k$) iff $\operatorname{rank} V = k$.
Why It Matters
In practice, you reach for whichever form is cheapest to check. For a small concrete example, statement 1 is fastest. For the columns of a larger matrix, statement 4 (rank) is what numerical libraries report. For the normal equations of least squares, statement 5 is the existence-of-unique-solution condition.
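To make the "cheapest check" concrete, here is a minimal NumPy sketch (the 3×2 matrix and the tolerance are illustrative choices, not part of the page) that runs statements 1/3, 4, and 5 on the same columns:

```python
import numpy as np

# Illustrative example: two columns in R^3 to be tested for independence.
V = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])
k = V.shape[1]

# Statement 4: independence <=> rank equals the number of columns.
rank_ok = np.linalg.matrix_rank(V) == k

# Statement 5: independence <=> the Gram matrix V^T V is invertible.
gram_ok = np.linalg.matrix_rank(V.T @ V) == k

# Statements 1/3: the only c with V c = 0 is c = 0, i.e. the smallest
# singular value is bounded away from zero (the tolerance is illustrative).
null_ok = np.linalg.svd(V, compute_uv=False)[-1] > 1e-10

print(rank_ok, gram_ok, null_ok)  # all True for this V
```

For large or ill-conditioned matrices the singular-value form is the numerically meaningful one; exact rank is a discrete quantity that floating point can only approximate.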
Failure Mode
Linear independence does not require orthogonality. Orthogonal nonzero vectors are independent, but independent vectors are typically not orthogonal. Confusing the two breaks down the moment you try to construct a basis for a subspace from independent but non-orthogonal columns and forget that you need Gram-Schmidt or QR to orthogonalise before relying on $V^\top V = I$.
Worked Example: Three Vectors in R^3
Independent or dependent?
Let $v_1 = (1, 0, 0)$, $v_2 = (1, 1, 0)$, $v_3 = (1, 1, 1)$.
Set up $c_1 v_1 + c_2 v_2 + c_3 v_3 = 0$ component by component: $c_1 + c_2 + c_3 = 0$, $c_2 + c_3 = 0$, $c_3 = 0$.
Back-substitution gives $c_3 = 0$, then $c_2 = 0$, then $c_1 = 0$. The only solution is the trivial one, so the three vectors are linearly independent. The matrix $V = [\, v_1 \ v_2 \ v_3 \,]$ is upper triangular with nonzero diagonal, hence invertible, hence rank 3, consistent with statement 4 of the equivalences theorem.
A dependent triple
Let $w_1 = (1, 0, 1)$, $w_2 = (0, 1, 1)$, $w_3 = (1, 1, 2)$.
Notice $w_3 = w_1 + w_2$. Therefore $w_1 + w_2 - w_3 = 0$ is a nontrivial combination giving $0$, so the set is linearly dependent. The redundancy is concentrated in the relationship between $w_3$ and the pair $\{w_1, w_2\}$; $w_3$ is independent of $w_1$ and of $w_2$ separately, but independence is a property of the whole set, not of pairs.
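Both triples are easy to confirm numerically by comparing rank to the number of columns; a minimal NumPy sketch, using the vectors as written above:

```python
import numpy as np

# Columns are the independent triple v1, v2, v3 from the first example.
V = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# Columns are the dependent triple w1, w2, w3, where w3 = w1 + w2.
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

print(np.linalg.matrix_rank(V))  # 3: full column rank, independent
print(np.linalg.matrix_rank(W))  # 2: rank deficient, dependent
```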
Tightest Upper Bound on Independent Vectors
At Most n Independent Vectors in R^n
Statement
Any set of more than $n$ vectors in $\mathbb{R}^n$ is linearly dependent. Equivalently, the maximum size of a linearly independent set in $\mathbb{R}^n$ is $n$.
Intuition
$\mathbb{R}^n$ has dimension $n$. A linearly independent set extends to a basis, and every basis has exactly $n$ elements. You cannot fit more than $n$ truly independent directions into an $n$-dimensional space.
Proof Sketch
Stack any $k > n$ vectors as columns of an $n \times k$ matrix $V$. The matrix has only $n$ rows, so its rank is at most $n < k$. By the rank-nullity theorem, $\dim \ker V = k - \operatorname{rank} V \geq k - n > 0$, so the null space is non-trivial: there is a nonzero $c$ with $Vc = 0$, i.e. a non-trivial linear combination of the columns giving the zero vector.
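The counting argument is easy to see numerically: any $3 \times 5$ matrix has rank at most 3, so its null space is non-trivial and the SVD hands you a nonzero $c$ with $Vc = 0$. A minimal sketch (the random matrix is an illustrative stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal((3, 5))       # five vectors in R^3, stacked as columns

print(np.linalg.matrix_rank(V))       # at most 3 < 5, so the columns are dependent

# Rows of Vt beyond the rank span the null space; pick one as a witness.
_, _, Vt = np.linalg.svd(V)
c = Vt[-1]                            # nonzero vector with V @ c ~ 0
print(np.linalg.norm(V @ c))          # numerically zero: a nontrivial combination of the columns
```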
Why It Matters
This is the headline feasibility check behind many ML diagnostics. With $d$ features but only $n < d$ samples, your design matrix has linearly dependent columns by sheer counting; ridge regression and $\ell_1$/elastic-net regularisation exist partly to tame this regime. With $p$ parameters and $n$ training points where $p > n$, there are infinitely many parameter vectors that fit the data exactly: the classical underdetermined regime that motivates implicit-bias analyses of deep nets.
Failure Mode
The theorem relies on $\mathbb{R}^n$ being an $n$-dimensional vector space; in infinite-dimensional spaces you can have arbitrarily large linearly independent sets (think Hermite or Fourier bases on a function space). For ML, this matters in kernel methods and RKHS-style arguments where the feature space is implicitly infinite-dimensional.
Connection to Ax = b
Linear independence of the columns of $A$ is the uniqueness condition for $Ax = b$. Writing $A = [\, a_1 \ \cdots \ a_k \,]$, a solution $x$ to $Ax = b$ is exactly a recipe for assembling $b$ as a linear combination of columns: $b = x_1 a_1 + \cdots + x_k a_k$.
- If the columns of $A$ are linearly independent, any solution is unique. Two distinct solutions $x \neq x'$ would give $A(x - x') = 0$ with $x - x' \neq 0$, contradicting independence.
- If the columns are linearly dependent, any solution comes with a whole affine subspace of solutions; you can add any null-space element of $A$ and still solve $Ax = b$.
Solvability (does any solution exist?) is the column-space question (is $b$ in $\operatorname{col}(A)$?) and is logically separate. Uniqueness is the column-independence question. Both must hold for the system to have exactly one solution.
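A small NumPy sketch of the two regimes (the matrices and vectors are illustrative; `lstsq` is used only to recover the unique solution in the independent case):

```python
import numpy as np

# Independent columns: if A x = b is solvable, the solution is unique.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
x = np.array([2.0, -1.0])
b = A @ x                                   # b lies in the column space by construction
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x))                # True: only one recipe assembles b

# Dependent columns: add any null-space vector and A x is unchanged.
B = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
n = np.array([2.0, -1.0, 0.0])              # B @ n = 0, a null-space element
y = np.array([1.0, 1.0, 1.0])
print(np.allclose(B @ y, B @ (y + n)))      # True: a whole affine set of solutions
```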
Common Confusions
Independence is a property of a set, not a pair
Three vectors can be pairwise independent yet linearly dependent as a set. Example: $(1, 0)$, $(0, 1)$, $(1, 1)$ in $\mathbb{R}^2$. Each pair is independent, but the third is a linear combination of the first two, so the whole set is dependent.
Independence does not imply orthogonality
$(1, 0)$ and $(1, 1)$ are independent but not orthogonal. To build an orthogonal (or orthonormal) basis from independent vectors you need Gram-Schmidt, QR, or a similar procedure. Independence alone does not give you the diagonalisation that orthonormal columns would.
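As a quick illustration of the fix, QR factorisation takes independent but non-orthogonal columns and returns an orthonormal basis for the same span (a minimal sketch; the pair of vectors is the one from the text):

```python
import numpy as np

# Independent but not orthogonal columns: (1, 0) and (1, 1).
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(V.T @ V)                            # off-diagonal entries nonzero: not orthogonal

Q, R = np.linalg.qr(V)                    # Gram-Schmidt, done stably
print(np.allclose(Q.T @ Q, np.eye(2)))    # True: Q has orthonormal columns spanning the same plane
```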
Adding a vector to an independent set can keep it independent
A common worry: "if I keep adding vectors, eventually one will be a combination of the others." That is forced only once you exceed the ambient dimension $n$, not before. Up to that point, you can keep adding genuinely new directions and stay independent.
ML Implications
- Feature collinearity. Linearly dependent feature columns make the normal equations singular; closed-form least squares fails. Ridge regularisation rescues the problem by replacing $X^\top X$ with $X^\top X + \lambda I$ (always invertible for $\lambda > 0$); see the sketch after this list.
- Overdetermined vs underdetermined fits. With $n$ samples and $d$ features, $n \geq d$ with full column rank gives a unique least-squares fit; $d > n$ leaves a whole subspace of perfect fits, and you need a regulariser, a prior, or implicit bias to pick one.
- Hessian singularity. A loss function whose Hessian has linearly dependent columns at the optimum has a flat valley in parameter space, so the optimum is not unique up to second order. This is the geometric content of "the model is overparameterised."
- Embedding rank. In representation learning, the rank of the matrix of learned embeddings is the number of linearly independent features the network has actually learned; rank collapse is a diagnosed failure mode in deep transformers.
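A minimal sketch of the collinearity bullet (the duplicate-feature setup and the value of $\lambda$ are illustrative, not from the page): with an exactly duplicated feature the Gram matrix is singular, while the ridge system is always solvable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.standard_normal(n)
X = np.column_stack([x1, 2.0 * x1])          # second feature is an exact multiple of the first
y = rng.standard_normal(n)

gram = X.T @ X
print(np.linalg.matrix_rank(gram))            # 1: the normal equations are singular

lam = 1e-2                                    # illustrative regularisation strength
w_ridge = np.linalg.solve(gram + lam * np.eye(2), X.T @ y)
print(w_ridge)                                # unique ridge solution despite collinear features
```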
Exercises
Problem
Are $(1, 2, 3)$, $(4, 5, 6)$, and $(7, 8, 9)$ linearly independent in $\mathbb{R}^3$?
Problem
Let $A = [\, a_1 \ a_2 \ a_3 \,]$ with columns $a_1 = (1, 0, 1)$, $a_2 = (0, 1, 1)$, $a_3 = (2, 2, 4)$. Are the columns of $A$ linearly independent?
Problem
Suppose the columns of $A$ are linearly independent. Show that $A^\top A$ is invertible.
References
- Strang, Introduction to Linear Algebra, 5e (2016), Section 3.4. Standard undergraduate treatment with the column-space picture.
- Axler, Linear Algebra Done Right, 4e (2024), Chapter 2. Sections on span, linear independence, basis, and dimension developed without determinants.
- Hoffman & Kunze, Linear Algebra, 2e (1971), Chapter 2. Classic abstract treatment.
- Trefethen & Bau, Numerical Linear Algebra (1997), Lecture 1. Vectors, matrices, and the rank-1 sum picture, useful for connecting independence to the SVD geometry.
- Boyd & Vandenberghe, Introduction to Applied Linear Algebra (2018), Chapter 5 (Linear independence). Free PDF at stanford.edu/~boyd/vmls; emphasis on numerical examples and ML-relevant applications.
- Linear independence (Wikipedia): concise reference with the equivalent characterizations.
Related Topics
- Vectors, matrices, and linear maps
- Eigenvalues and eigenvectors
- Singular value decomposition
- Inner product spaces and orthogonality
Practice Drills
The LA foundations gold set includes two questions that test this page directly:
- la-drill-linear-independence-011: definition test
- la-drill-Ax-b-uniqueness-014: the 0/1/infinite-solutions theorem via column independence
Last reviewed: May 1, 2026