
Optimization Function Classes

Thin-Plate Splines

Smoothing splines in two and higher dimensions. Penalize integrated squared second-derivative magnitude across the surface; the minimizer is a sum of radial basis functions plus a low-degree polynomial. Green-Silverman 1994 and Wahba 1990 are the canonical references.

Advanced · Tier 1 · Stable · Supporting · ~50 min
For: ML, Stats, Research

Why This Matters

Smoothing splines in one dimension penalize $\int (f'')^2$ and have a clean natural-cubic-spline minimizer. Generalizing to dimension $d \geq 2$ runs into a problem: a "second derivative" of a function on $\mathbb{R}^d$ is a matrix, and there is no single natural scalar to put inside the integral.

Thin-plate splines pick a specific scalar: the Frobenius norm of the Hessian, integrated over the input space,
$$J(f) = \int_{\mathbb{R}^d} \sum_{i, j} \left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right)^2 \, dx.$$
For $d = 2$ this is the "bending energy" of a thin metal plate deflecting under loads at the data points, hence the name. Duchon (1977) showed that the minimizer of $\sum_i (Y_i - f(X_i))^2 + \lambda J(f)$ over an appropriate Beppo Levi space is a finite sum of radial basis functions plus a low-degree polynomial. The representation has $n$ free parameters, just like the univariate smoothing spline.

ESL 2nd ed. §5.7 (pp. 162-167) introduces thin-plate splines as the canonical multidimensional smoother. Green and Silverman (1994) Ch 7 develops the full theory; Wahba (1990) Ch 2 gives the RKHS view.

Quick Version

| Object | Form |
| --- | --- |
| Penalty in $d = 2$ | $J(f) = \int \left((\partial_{xx} f)^2 + 2 (\partial_{xy} f)^2 + (\partial_{yy} f)^2\right) dx \, dy$ |
| Radial basis function | $\eta(r) = r^2 \log r$ for $d = 2$; $\eta(r) = r^{2m - d}$ for higher $d$ with appropriate $m$ |
| Solution | $\hat{f}(x) = \sum_{i=1}^n \alpha_i \, \eta(\lVert x - X_i \rVert) + \beta^\top \phi(x)$ |
| Side conditions | $\sum_i \alpha_i p(X_i) = 0$ for each polynomial $p$ in the null space |
| Linear system size | $n + (\text{null-space dimension})$ |
| Null space ($d = 2$, $m = 2$) | constants, $x_1$, $x_2$ (dimension 3) |
| Optimal $\lambda$ | $\propto n^{-2m/(2m+d)}$; MSE rate $n^{-2m/(2m+d)}$ |

The function $\eta$ is the fundamental solution of the biharmonic equation in $d = 2$ (the Green's function for $\Delta^2$). The basis function centered at each $X_i$ is the response of a thin plate to a point load at $X_i$.

Formal Setup

Definition

Bending Energy in 2D

For a function $f: \mathbb{R}^2 \to \mathbb{R}$ with square-integrable second derivatives,
$$J(f) = \int_{\mathbb{R}^2} \left[(\partial_{xx} f)^2 + 2 (\partial_{xy} f)^2 + (\partial_{yy} f)^2\right] dx \, dy.$$
$J$ is invariant under rotations and translations of the coordinate system: rotating the input rotates the Hessian without changing its Frobenius norm. $J(f) = 0$ if and only if $f$ is affine, namely $f(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.
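A quick symbolic check of the last claim, as a minimal sympy sketch (sympy is assumed tooling here, not part of the text): the integrand of $J$ vanishes identically for an affine function and is a positive constant for a simple quadratic.

```python
import sympy as sp

x, y, a, b, c = sp.symbols("x y a b c")
f = a + b * x + c * y                      # an arbitrary affine function

integrand = (sp.diff(f, x, 2) ** 2
             + 2 * sp.diff(f, x, y) ** 2
             + sp.diff(f, y, 2) ** 2)
print(sp.simplify(integrand))              # 0: affine functions have zero bending energy

g = x ** 2 + y ** 2                        # a non-affine example
integrand_g = (sp.diff(g, x, 2) ** 2
               + 2 * sp.diff(g, x, y) ** 2
               + sp.diff(g, y, 2) ** 2)
print(sp.simplify(integrand_g))            # 8: strictly positive curvature everywhere
```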

Definition

Thin-Plate Spline

Given $(X_1, Y_1), \ldots, (X_n, Y_n)$ with $X_i \in \mathbb{R}^2$ and $\lambda > 0$, the thin-plate spline is $\hat{f}_\lambda = \arg\min_{f} \sum_{i=1}^n (Y_i - f(X_i))^2 + \lambda J(f)$, the minimum taken over functions $f$ in the Beppo Levi space $BL_2(\mathbb{R}^2) = \{f : J(f) < \infty\}$, considered modulo the affine null space.

The Beppo Levi space is the natural domain: the semi-norm $J$ identifies functions that differ by an affine function, since $J$ does not see affine perturbations. The penalty therefore determines the minimizer only up to an affine term; the data-fit term pins that term down (provided the $X_i$ do not all lie on a single line), so $\hat{f}_\lambda$ is unique.

The Representer Theorem

Theorem

Thin-Plate Spline Representation (Duchon 1977)

Statement

The minimizer of $\sum_{i=1}^n (Y_i - f(X_i))^2 + \lambda J(f)$ over the appropriate Beppo Levi space has the form
$$\hat{f}_\lambda(x) = \sum_{i=1}^n \alpha_i \, \eta(\|x - X_i\|) + \sum_{j} \beta_j \, p_j(x),$$
where $\{p_j\}$ is a basis for the null space of $J$ (polynomials of degree $< m$) and $\eta$ is the fundamental solution of the iterated Laplacian $\Delta^m$:

  • $d = 2$, $m = 2$: $\eta(r) = r^2 \log r$
  • $d = 2$, $m = 3$: $\eta(r) = r^4 \log r$
  • $d = 3$, $m = 2$: $\eta(r) = r$ (with sign adjustments)
  • general $d$ and $m$ with $2m - d > 0$ and odd: $\eta(r) = r^{2m - d}$
  • general $d$ and $m$ with $2m - d$ positive and even: $\eta(r) = r^{2m - d} \log r$.

The coefficients $\alpha_i, \beta_j$ solve a linear system of size $n + \dim(\text{null space})$. The side conditions $\sum_i \alpha_i p_j(X_i) = 0$ for each $p_j$ in the null-space basis ensure that the radial-basis part carries no polynomial component of its own; any such component belongs to the $\beta_j$ term.

Intuition

The penalty $J$ has a null space (the polynomial part) and a positive part (everything else). On the positive part, $J$ defines an RKHS norm, and the kernel is the Green's function of the differential operator $\Delta^m$. The representer theorem in this RKHS then gives the radial basis expansion. The null-space polynomials add separately because the penalty does not penalize them.

The function $\eta(r) = r^2 \log r$ in 2D is the deflection of a thin plate under a point load: its curvature blows up logarithmically at the load point and the deflection grows roughly quadratically far from it. The smoothing-spline solution reads off, at each data point, the strength of the point load needed to produce the observed deflection.

Why It Matters

This is the unique generalization of the univariate cubic smoothing spline that respects rotation invariance and gives a tractable finite-dimensional solution. The representation extends to higher dimensions and to higher penalty orders $m$ without any new ideas. The only practical issue is that the linear system is dense (no banding), so the naive cost is $O(n^3)$. Low-rank approximations bring this down to $O(n k^2)$ for some moderate $k$.

Failure Mode

Three failure modes. (i) $2m \leq d$: the penalty has insufficient smoothness to produce a well-defined function; for $d = 4$ you need $m \geq 3$. (ii) Repeated $X_i$: the linear system is singular; pre-merge replicates. (iii) Inputs near a low-dimensional manifold: the radial basis matrix becomes near-singular and the coefficients $\alpha_i$ blow up. The numerical cure is either reduced-rank thin-plate splines (Wood, 2003) or explicit regularization on $\alpha$.

Optional Proof: Why $r^2 \log r$ is the right radial basis in 2D

Green and Silverman (1994) Ch 7 and Wahba (1990) Ch 2 work this out.

The biharmonic operator $\Delta^2 = \Delta \Delta$ in 2D acts on smooth functions. Its fundamental solution $\eta$ satisfies $\Delta^2 \eta = \delta_0$ (the Dirac mass at the origin) in the distributional sense. By rotation invariance, $\eta(x) = g(r)$ with $r = \|x\|$, and away from the origin the equation reduces to the homogeneous ODE $g''''(r) + (2/r) g'''(r) - (1/r^2) g''(r) + (1/r^3) g'(r) = 0$, with the point source entering through the behavior at $r = 0$.

Solving with the ansatz $g(r) = r^p$ gives the indicial equation $p^2 (p - 2)^2 = 0$, so the homogeneous solutions are $1, \log r, r^2, r^2 \log r$. Subtracting the homogeneous parts and matching the delta-function source at the origin gives $\eta(r) = \frac{1}{8\pi} r^2 \log r$ as the fundamental solution. The normalization constant gets absorbed into the coefficients in the representer theorem; the functional form $r^2 \log r$ is the structural point.
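A symbolic sanity check of this, again as a hedged sympy sketch: $r^2 \log r$ is biharmonic away from the origin, and its Laplacian has the logarithmic singularity the argument relies on.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True, positive=True)
r = sp.sqrt(x ** 2 + y ** 2)
eta = r ** 2 * sp.log(r)                       # candidate fundamental solution, unnormalized

def laplacian(f):
    return sp.diff(f, x, 2) + sp.diff(f, y, 2)

print(sp.simplify(laplacian(eta)))             # equals 4*log(r) + 4: log-singular at the load point
print(sp.simplify(laplacian(laplacian(eta))))  # 0: biharmonic away from the origin
```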

The penalty $J(f)$ can be written as a quadratic form in the $\alpha_i$ (after using the representer theorem and integrating by parts): $J\bigl(\sum_i \alpha_i \eta(\|\cdot - X_i\|)\bigr) = \alpha^\top \boldsymbol{E} \alpha$ where $E_{ij} = \eta(\|X_i - X_j\|)$. This basis-matrix-as-Gram-matrix identity is what makes the representer theorem operational.

Implementation Notes

The straightforward implementation:

  1. Build the $n \times n$ matrix $\boldsymbol{E}$ with $E_{ij} = \eta(\|X_i - X_j\|)$.
  2. Build the $n \times M$ null-space matrix $\boldsymbol{T}$ with rows $\phi(X_i)$, where $\phi$ is a basis for $\mathrm{null}(J)$.
  3. Solve the saddle-point system
$$\begin{pmatrix} \boldsymbol{E} + \lambda \boldsymbol{I} & \boldsymbol{T} \\ \boldsymbol{T}^\top & \boldsymbol{0} \end{pmatrix} \begin{pmatrix} \boldsymbol{\alpha} \\ \boldsymbol{\beta} \end{pmatrix} = \begin{pmatrix} \boldsymbol{Y} \\ \boldsymbol{0} \end{pmatrix}.$$

The cost is $O(n^3)$ for the dense linear solve.
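A minimal numpy sketch of these three steps for $d = 2$, $m = 2$; the helper names (`tps_matrices`, `fit_tps`, `predict_tps`) are illustrative, not any library's API, and no numerical refinements are attempted.

```python
import numpy as np

def eta_2d(r):
    # Thin-plate radial basis for d = 2, m = 2: r^2 log r, with eta(0) = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0, r ** 2 * np.log(r), 0.0)

def tps_matrices(X):
    # E_ij = eta(||X_i - X_j||); T has rows (1, x_1, x_2) spanning the affine null space.
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return eta_2d(r), np.column_stack([np.ones(len(X)), X])

def fit_tps(X, y, lam):
    # Solve the (n + 3) x (n + 3) saddle-point system for (alpha, beta).
    n = len(X)
    E, T = tps_matrices(X)
    M = T.shape[1]
    A = np.block([[E + lam * np.eye(n), T],
                  [T.T, np.zeros((M, M))]])
    sol = np.linalg.solve(A, np.concatenate([y, np.zeros(M)]))
    return sol[:n], sol[n:]                      # alpha, beta

def predict_tps(X_train, alpha, beta, X_new):
    r = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=-1)
    return eta_2d(r) @ alpha + np.column_stack([np.ones(len(X_new)), X_new]) @ beta

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(80, 2))
y = np.sin(X[:, 0]) + 0.5 * np.cos(X[:, 1]) + 0.1 * rng.standard_normal(80)
alpha, beta = fit_tps(X, y, lam=1.0)
print(predict_tps(X, alpha, beta, X[:5]))
```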

Low-rank approximation (Wood, 2003). Replace $\boldsymbol{E}$ with its rank-$k$ approximation $\boldsymbol{E} \approx \boldsymbol{U}_k \boldsymbol{D}_k \boldsymbol{U}_k^\top$ from the leading $k$ eigenvectors. The resulting "thin-plate regression spline" has $k$ parameters instead of $n$ and solves in $O(n k^2)$. This is the default in mgcv::s(x1, x2, bs = "tp") in R. For typical applications $k = 50$ to $100$ gives accuracy indistinguishable from the full thin-plate solution.
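A hedged sketch of the truncation idea, reusing `tps_matrices` from the sketch above: substitute $\alpha = \boldsymbol{U}_k \delta$ into the saddle-point system and project, giving a $(k + 3)$-dimensional solve. This captures the spirit of Wood's construction but is not the exact parameterization mgcv uses.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def fit_tps_lowrank(X, y, lam, k=50):
    # E ~ U_k D_k U_k^T from the k eigenpairs of largest magnitude; with
    # alpha = U_k @ delta the saddle system shrinks from (n + 3) to (k + 3) unknowns.
    E, T = tps_matrices(X)                 # from the direct sketch above
    d, U = eigsh(E, k=k, which="LM")       # Lanczos; avoids the full eigendecomposition
    M = T.shape[1]
    A = np.block([[np.diag(d) + lam * np.eye(k), U.T @ T],
                  [T.T @ U, np.zeros((M, M))]])
    rhs = np.concatenate([U.T @ y, np.zeros(M)])
    sol = np.linalg.solve(A, rhs)
    return U @ sol[:k], sol[k:]            # alpha (length n), beta
```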

Smoothing parameter. GCV from smoothing splines applies directly; the smoother matrix is dense rather than banded, but its trace can be computed via the eigenvalues of $\boldsymbol{E}$.
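One blunt way to get $\mathrm{tr}(\boldsymbol{S}_\lambda)$ and the GCV score with the pieces defined above, as an illustrative sketch only (it costs $O(n^3)$ per $\lambda$; a careful implementation works through the eigenvalues instead):

```python
import numpy as np

def gcv_tps(X, y, lambdas):
    # GCV(lambda) = n * RSS / (n - tr(S_lambda))^2.  S_lambda is obtained by
    # pushing the identity through the saddle-point solve, so each column of
    # S_lambda is the fit to a unit response vector.
    n = len(X)
    E, T = tps_matrices(X)                 # from the direct sketch above
    M = T.shape[1]
    scores = {}
    for lam in lambdas:
        A = np.block([[E + lam * np.eye(n), T],
                      [T.T, np.zeros((M, M))]])
        sol = np.linalg.solve(A, np.vstack([np.eye(n), np.zeros((M, n))]))
        S = E @ sol[:n] + T @ sol[n:]      # smoother matrix: y_hat = S @ y
        rss = np.sum((y - S @ y) ** 2)
        scores[lam] = n * rss / (n - np.trace(S)) ** 2
    return scores

# e.g. scores = gcv_tps(X, y, lambdas=[0.01, 0.1, 1.0, 10.0]); pick the minimizer.
```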

Higher Dimensions

For $d = 3$ with $m = 2$, the fundamental solution of $\Delta^2$ is $\eta(r) = r$ (linear in the radius). For $d = 3$, $m = 3$ it is $r^3$. For $d = 4$, $m = 3$ it is $r^2 \log r$. The general formula is

$$\eta(r) = \begin{cases} r^{2m - d} & \text{if } 2m - d \text{ is odd} \\ r^{2m - d} \log r & \text{if } 2m - d \text{ is a positive even integer} \end{cases}$$

with appropriate sign conventions. The null space is the polynomials of degree $< m$ in $d$ variables, of dimension $\binom{m + d - 1}{d}$.
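The case split, written out as a small helper (an illustrative sketch; the sign and normalization conventions the text alludes to are omitted, since they get absorbed into the coefficients anyway):

```python
import numpy as np

def eta_general(r, d, m):
    # Unnormalized thin-plate radial basis in R^d with penalty order m.
    # Requires 2m > d; the log factor appears exactly when 2m - d is even.
    p = 2 * m - d
    if p <= 0:
        raise ValueError("need 2m > d for a well-defined thin-plate penalty")
    r = np.asarray(r, dtype=float)
    if p % 2 == 0:                       # 2m - d a positive even integer
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.where(r > 0, r ** p * np.log(r), 0.0)
    return r ** p                        # 2m - d odd

# d = 2, m = 2 recovers r^2 log r; d = 3, m = 2 gives r (up to sign conventions).
print(eta_general([0.5, 1.0, 2.0], d=2, m=2))
print(eta_general([0.5, 1.0, 2.0], d=3, m=2))
```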

For high $d$ the null-space dimension grows polynomially and the estimator inherits the curse of dimensionality: the MSE rate $O(n^{-2m/(2m+d)})$ degrades sharply with $d$. By $d = 5$ to $7$ the estimator is largely useless without further structure (additivity, sparsity, low intrinsic dimension).
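Concretely, with $m = 2$ the rate $n^{-2m/(2m+d)}$ is $n^{-4/5}$ for $d = 1$, $n^{-2/3}$ for $d = 2$, $n^{-4/9}$ for $d = 5$, and $n^{-1/3}$ for $d = 8$: each added dimension buys noticeably less from the same sample size.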

Canonical Example

Example

A geological surface from sparse measurements

Imagine $n = 80$ elevation measurements at irregularly placed survey points across a $10\,\mathrm{km} \times 10\,\mathrm{km}$ region. Fit a thin-plate spline with $m = 2$ to produce a smooth surface.

| $\lambda$ | $\mathrm{tr}(\boldsymbol{S}_\lambda)$ | Visual outcome |
| --- | --- | --- |
| very small | $\approx n$ | interpolation; passes through every survey point, wild oscillation between |
| GCV-optimal | $\approx 25$ | smooth surface; survey points slightly off the surface but visibly the right shape |
| very large | $3$ | best-fit affine plane; loses topography |

The GCV-optimal fit recovers the main ridge structure cleanly. The fitted surface satisfies $J(\hat{f}) \approx 0.3$ in physical units of inverse length squared, which is interpretable as "the surface is mostly flat with some moderate curvature near the ridge". The same data fit by ordinary kriging with a Matérn covariance gives a visually similar surface; the thin-plate spline is the limit of kriging with an improper "intrinsic stationary" prior.

Common Confusions

Watch Out

Thin-plate splines are not kernel ridge regression with a fixed kernel

The radial basis $\eta(r) = r^2 \log r$ is the Green's function of the differential operator, not a Mercer kernel. It is conditionally positive definite but not positive definite outright: the matrix $\boldsymbol{E}$ can have up to $\dim(\text{null space})$ negative eigenvalues. The representer theorem still applies because the null-space part is added separately; the result is not a clean kernel ridge regression, but the structure is analogous. ESL 2nd ed. p. 165 makes this distinction.
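A quick numerical illustration of the distinction (a numpy sketch reusing `tps_matrices` from the implementation notes above): $\boldsymbol{E}$ has negative eigenvalues, yet $\alpha^\top \boldsymbol{E} \alpha \geq 0$ whenever $\alpha$ satisfies the side conditions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 2))
E, T = tps_matrices(X)                      # from the implementation sketch above

print(np.linalg.eigvalsh(E)[:4])            # smallest eigenvalues: at least one is negative (trace of E is zero)

# Project random vectors onto {alpha : T^T alpha = 0} and check the quadratic form.
Q, _ = np.linalg.qr(T)                      # orthonormal basis of col(T)
vals = []
for _ in range(1000):
    a = rng.standard_normal(len(X))
    a -= Q @ (Q.T @ a)                      # enforce the side conditions
    vals.append(a @ E @ a)
print(min(vals))                            # nonnegative up to rounding error
```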

Watch Out

The bending-energy interpretation is for d = 2 specifically

"Thin-plate" is the d=2d = 2, m=2m = 2 case where the penalty equals the elastic-energy of a deflected metal plate. The generalization to other (d,m)(d, m) is the same machinery but the physical interpretation breaks down. Use "thin-plate spline" loosely for any radial basis built on a Green's function of Δm\Delta^m in Rd\mathbb{R}^d; the canonical case is d=2d = 2.

Watch Out

Tensor-product splines are a different choice in higher dimensions

Thin-plate splines are isotropic: rotation-invariant. Tensor-product splines build a basis as a product of one-dimensional B-spline bases along each coordinate. They are anisotropic and have $O(K^d)$ basis functions for $K$ knots per dimension. Thin-plate is the right choice when the data has no preferred direction; tensor-product is the right choice when the coordinates are heterogeneous (one is time, another a spatial coordinate, say) or when you want per-coordinate degrees of freedom.

Exercises

Exercise (Core)

Problem

Verify that $J(f) = 0$ for $f(x_1, x_2) = a + b x_1 + c x_2$. Hence confirm that the null space of $J$ in 2D with $m = 2$ has dimension $3$.

Exercise (Advanced)

Problem

Show that the side condition $\sum_i \alpha_i p(X_i) = 0$ for each polynomial $p$ in the null space follows from the requirement that the radial-basis part of $\hat{f}_\lambda$ has no polynomial component to absorb; equivalently, that the representer-theorem representation is unique modulo the null space.

Exercise (Research)

Problem

For $d = 2$, $m = 2$ with $n$ measurements on a regular grid of side $\sqrt{n}$, the linear system has a condition number that grows with $n$. Estimate the rate and propose a preconditioner.

References

Canonical:

  • Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. Chapman and Hall. Ch 7 "Thin Plate Splines in Two Dimensions". The textbook treatment with full derivations.
  • Wahba, G. (1990). Spline Models for Observational Data. SIAM. Ch 2 "More General Reproducing Kernel Hilbert Spaces", Ch 3 "Equivalence and Perpendicularity, or, What's So Special About Splines?" The RKHS / Bayesian view.
  • Hastie, Tibshirani, Friedman. The Elements of Statistical Learning, 2nd ed. Springer (2009). Ch 5 "Basis Expansions and Regularization", §5.7 "Multidimensional Splines" (pp. 162-167). Concise statistical-learning summary.

Foundational:

  • Duchon, J. (1977). "Splines Minimizing Rotation-Invariant Semi-Norms in Sobolev Spaces." In Constructive Theory of Functions of Several Variables, Lecture Notes in Mathematics 571, Springer, 85-100. The original construction and the proof of the representer theorem in the Beppo Levi setting.
  • Meinguet, J. (1979). "Multivariate Interpolation at Arbitrary Points Made Simple." Journal of Applied Mathematics and Physics (ZAMP) 30(2), 292-304. The Green's-function derivation of the radial basis.

Low-rank and computation:

  • Wood, S. N. (2003). "Thin Plate Regression Splines." Journal of the Royal Statistical Society B 65(1), 95-114. The reduced-rank approximation used in mgcv and most modern implementations.
  • Wendland, H. (2004). Scattered Data Approximation. Cambridge. Numerical analysis of radial basis function methods.

Bayesian / geostatistical connection:

  • Cressie, N. (1993). Statistics for Spatial Data. Wiley. Thin-plate splines as kriging with an intrinsic stationary prior.

Next Topics

  • Smoothing splines: the univariate predecessor; thin-plate is the $d \geq 2$ generalization.
  • B-splines: the alternative basis for tensor-product multidimensional splines.
  • Gaussian processes regression: the Bayesian counterpart; thin-plate splines are the posterior mean under an improper "bending energy" prior.
  • Generalized additive models: per-coordinate smoothers as an alternative to thin-plate when interactions are not the target.

Last reviewed: May 13, 2026
