Foundations
Inner Product Spaces and Orthogonality
Inner product axioms, Cauchy-Schwarz inequality, orthogonality, Gram-Schmidt, projections, and the bridge to Hilbert spaces.
Prerequisites
Why This Matters
Inner products define geometry in vector spaces: lengths, angles, and orthogonality. Kernel methods replace the standard dot product with arbitrary inner products to measure similarity in high-dimensional feature spaces. Projections onto subspaces underlie least squares linear regression, PCA, and conditional expectation.
Core Definitions
Inner Product
An inner product on a real vector space $V$ is a function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ satisfying:
- Symmetry: $\langle u, v \rangle = \langle v, u \rangle$
- Linearity in first argument: $\langle \alpha u + \beta w, v \rangle = \alpha \langle u, v \rangle + \beta \langle w, v \rangle$
- Positive definiteness: $\langle v, v \rangle \geq 0$, with equality iff $v = 0$
The induced norm is $\|v\| = \sqrt{\langle v, v \rangle}$. The standard inner product on $\mathbb{R}^n$ is $\langle u, v \rangle = \sum_{i=1}^{n} u_i v_i = u^\top v$.
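As a concrete sanity check, the axioms can be verified numerically for a weighted inner product $\langle u, v \rangle_A = u^\top A v$ with $A$ symmetric positive definite. A minimal NumPy sketch (the helper name `inner` and the particular $A$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a symmetric positive definite A; any SPD matrix defines a valid inner product.
B = rng.standard_normal((3, 3))
A = B @ B.T + 3 * np.eye(3)

def inner(u, v, A=A):
    """Weighted inner product <u, v>_A = u^T A v."""
    return u @ A @ v

u, v, w = rng.standard_normal((3, 3))
a, b = 2.0, -1.5

# Symmetry
assert np.isclose(inner(u, v), inner(v, u))
# Linearity in the first argument
assert np.isclose(inner(a * u + b * w, v), a * inner(u, v) + b * inner(w, v))
# Positive definiteness: strictly positive for u != 0, zero at the origin
assert inner(u, u) > 0
assert np.isclose(inner(np.zeros(3), np.zeros(3)), 0.0)

norm_u = np.sqrt(inner(u, u))  # induced norm ||u|| = sqrt(<u, u>)
```

Setting `A = np.eye(3)` recovers the standard dot product.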
Orthogonality
Two vectors $u, v$ are orthogonal if $\langle u, v \rangle = 0$, written $u \perp v$. A set $\{e_1, \ldots, e_k\}$ is orthonormal if $\langle e_i, e_j \rangle = \delta_{ij}$ (Kronecker delta). The orthogonal complement of a subspace $W \subseteq V$ is $W^\perp = \{v \in V : \langle v, w \rangle = 0 \text{ for all } w \in W\}$.
Orthogonal Projection
The orthogonal projection of $v$ onto a subspace $W$ is the unique $\hat{v} \in W$ minimizing $\|v - w\|$ over $w \in W$. If $\{e_1, \ldots, e_k\}$ is an orthonormal basis for $W$:
$$\hat{v} = \sum_{i=1}^{k} \langle v, e_i \rangle \, e_i$$
The residual $v - \hat{v}$ lies in $W^\perp$.
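The projection formula and the orthogonality of the residual can be checked directly. A small sketch, assuming the standard inner product on $\mathbb{R}^5$ and an orthonormal basis obtained from a QR factorization:

```python
import numpy as np

rng = np.random.default_rng(1)

# Orthonormal basis for a 2-D subspace W of R^5 (columns of E).
M = rng.standard_normal((5, 2))
E, _ = np.linalg.qr(M)

v = rng.standard_normal(5)

# Projection: v_hat = sum_i <v, e_i> e_i, equivalently E @ (E.T @ v).
v_hat = E @ (E.T @ v)

# The residual is orthogonal to W: <v - v_hat, e_i> = 0 for each basis vector.
residual = v - v_hat
assert np.allclose(E.T @ residual, 0.0)

# v_hat is the closest point of W to v: moving within W only increases distance.
w = v_hat + 0.1 * E[:, 0]
assert np.linalg.norm(v - v_hat) <= np.linalg.norm(v - w)
```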
Hilbert Space (Preview)
A Hilbert space is a complete inner product space: every Cauchy sequence converges. Finite-dimensional inner product spaces are automatically Hilbert spaces. Infinite-dimensional examples include $L^2$ (the space of square-integrable functions) and reproducing kernel Hilbert spaces (RKHS) used in kernel methods.
Main Theorems
Cauchy-Schwarz Inequality
Statement
For all $u, v$:
$$|\langle u, v \rangle| \leq \|u\| \, \|v\|$$
Equality holds if and only if $u$ and $v$ are linearly dependent.
Intuition
The cosine of the angle between two vectors has magnitude at most 1. In any inner product space, we can define $\cos \theta = \frac{\langle u, v \rangle}{\|u\| \, \|v\|}$, and Cauchy-Schwarz says this is well-defined: the ratio always lies in $[-1, 1]$.
Proof Sketch
For $v \neq 0$, define $\lambda = \frac{\langle u, v \rangle}{\|v\|^2}$. Then $0 \leq \|u - \lambda v\|^2$ and $\|u - \lambda v\|^2 = \|u\|^2 - \frac{\langle u, v \rangle^2}{\|v\|^2}$. Rearranging gives $\langle u, v \rangle^2 \leq \|u\|^2 \|v\|^2$.
Why It Matters
Cauchy-Schwarz is the single most used inequality in analysis. It proves the triangle inequality for the induced norm. It bounds inner products (and hence correlations, cosine similarities, kernel evaluations) in terms of norms.
Failure Mode
The inequality is tight only when vectors are linearly dependent. For nearly orthogonal vectors, the bound is very loose. In high dimensions, random vectors tend to be nearly orthogonal, so the bound often has significant slack.
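Both the inequality and its high-dimensional slack are easy to see empirically. A minimal sketch, assuming standard Gaussian random vectors:

```python
import numpy as np

rng = np.random.default_rng(2)

# Cauchy-Schwarz always holds, but the ratio |<u,v>| / (||u|| ||v||) = |cos(theta)|
# shrinks toward 0 for independent random vectors as dimension grows.
for n in (3, 1000):
    u, v = rng.standard_normal((2, n))
    lhs = abs(u @ v)
    rhs = np.linalg.norm(u) * np.linalg.norm(v)
    assert lhs <= rhs                 # the bound is never violated
    print(n, lhs / rhs)               # ratio is typically ~1/sqrt(n): large slack

# Equality holds for linearly dependent vectors:
u = rng.standard_normal(4)
assert np.isclose(abs(u @ (2 * u)),
                  np.linalg.norm(u) * np.linalg.norm(2 * u))
```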
Gram-Schmidt Orthogonalization
Given linearly independent vectors $v_1, \ldots, v_k$, the Gram-Schmidt process produces an orthonormal set $e_1, \ldots, e_k$ spanning the same subspace:
$$w_j = v_j - \sum_{i=1}^{j-1} \langle v_j, e_i \rangle \, e_i, \qquad e_j = \frac{w_j}{\|w_j\|}$$
This is the constructive proof that every finite-dimensional inner product space has an orthonormal basis. In matrix form, Gram-Schmidt produces the QR decomposition $A = QR$, which is central to numerical matrix operations.
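The process above translates directly into code. A sketch using the modified variant (subtracting each component from the running remainder, which is numerically more stable than the textbook formula):

```python
import numpy as np

def gram_schmidt(V):
    """Modified Gram-Schmidt on the columns of V (assumed linearly independent)."""
    V = V.astype(float)
    n, k = V.shape
    Q = np.zeros((n, k))
    for j in range(k):
        w = V[:, j].copy()
        for i in range(j):                  # remove components along e_1, ..., e_{j-1}
            w -= (Q[:, i] @ w) * Q[:, i]
        Q[:, j] = w / np.linalg.norm(w)     # normalize (fails if columns are dependent)
    return Q

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))
Q = gram_schmidt(A)

assert np.allclose(Q.T @ Q, np.eye(3))        # columns are orthonormal
R = Q.T @ A
assert np.allclose(Q @ R, A)                  # A = QR
assert np.allclose(np.triu(R), R, atol=1e-10) # R is upper triangular
```

In practice, `np.linalg.qr` (based on Householder reflections) is preferred; the loop above is for exposition.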
Common Confusions
Orthogonality depends on the inner product
Vectors that are orthogonal under the standard dot product may not be orthogonal under a different inner product. In ML, when using a Mahalanobis distance $d_A(u, v) = \sqrt{(u - v)^\top A (u - v)}$, the notion of orthogonality changes with $A$.
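A two-line counterexample makes this concrete. Under the Mahalanobis-induced inner product $\langle u, v \rangle_A = u^\top A v$ (the particular SPD matrix $A$ below is illustrative):

```python
import numpy as np

# Orthogonal under the standard dot product...
u = np.array([1.0, 1.0])
v = np.array([1.0, -1.0])
assert np.isclose(u @ v, 0.0)

# ...but not under <u, v>_A = u^T A v for this symmetric positive definite A.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
assert not np.isclose(u @ A @ v, 0.0)  # u^T A v = -1, so u and v are not A-orthogonal
```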
Projection minimizes distance, not angle
The orthogonal projection $\hat{v}$ is the closest point in $W$ to $v$ in the norm sense. It does not necessarily minimize the angle between $v$ and elements of $W$.
Canonical Examples
Least squares as projection
Given $Ax = b$ with no exact solution, the least squares solution $\hat{x}$ minimizes $\|Ax - b\|^2$. This is equivalent to projecting $b$ onto the column space of $A$. The projection satisfies the normal equations $A^\top A \hat{x} = A^\top b$, which express the condition that the residual $b - A\hat{x}$ is orthogonal to the column space.
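The normal equations and the orthogonality of the residual can be checked against a library solver. A sketch on a random overdetermined system:

```python
import numpy as np

rng = np.random.default_rng(4)

# Overdetermined system Ax = b (6 equations, 2 unknowns): no exact solution.
A = rng.standard_normal((6, 2))
b = rng.standard_normal(6)

# Least squares via the normal equations A^T A x = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Agrees with NumPy's least squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_hat, x_ref)

# The residual b - A x_hat is orthogonal to the column space of A.
residual = b - A @ x_hat
assert np.allclose(A.T @ residual, 0.0)
```

Note that solving the normal equations directly squares the condition number of $A$; QR-based solvers such as `np.linalg.lstsq` avoid this.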
Exercises
Problem
Let $u$ and $v$ be given vectors in $\mathbb{R}^3$ with the standard inner product. Compute the projection $\operatorname{proj}_v(u) = \frac{\langle u, v \rangle}{\|v\|^2} v$ and verify that $u - \operatorname{proj}_v(u)$ is orthogonal to $v$.
Problem
Prove that in any inner product space, the induced norm satisfies the parallelogram law: $\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$. Show conversely that any norm satisfying the parallelogram law comes from an inner product.
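Before attempting the proof, a numerical sanity check can build intuition: the law holds for the Euclidean norm but fails for the 1-norm, which therefore comes from no inner product. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
u, v = rng.standard_normal((2, 4))

# Parallelogram law holds for the Euclidean (inner-product-induced) norm.
lhs = np.linalg.norm(u + v) ** 2 + np.linalg.norm(u - v) ** 2
rhs = 2 * np.linalg.norm(u) ** 2 + 2 * np.linalg.norm(v) ** 2
assert np.isclose(lhs, rhs)

# It fails for the 1-norm: here lhs = 8 but rhs = 4.
u1, v1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
l1 = lambda x: np.abs(x).sum()
assert not np.isclose(l1(u1 + v1) ** 2 + l1(u1 - v1) ** 2,
                      2 * l1(u1) ** 2 + 2 * l1(v1) ** 2)
```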
References
Canonical:
- Axler, Linear Algebra Done Right (2024), Chapters 6-7
- Halmos, Finite-Dimensional Vector Spaces (1958), Chapters 1-2
- Strang, Linear Algebra and Its Applications (2006), Section 4.4 (orthogonality and projections)
For ML context:
- Deisenroth, Faisal, Ong, Mathematics for Machine Learning (2020), Chapter 3
- Horn & Johnson, Matrix Analysis (2013), Chapter 5 (norms and inner products)
- Trefethen & Bau, Numerical Linear Algebra (1997), Lectures 7-8 (Gram-Schmidt and QR decomposition)
Last reviewed: April 2026