
Optimization Function Classes

Thin-Plate Splines

Smoothing splines in two and higher dimensions. Penalize integrated squared second-derivative magnitude across the surface; the minimizer is a sum of radial basis functions plus a low-degree polynomial. Green-Silverman 1994 and Wahba 1990 are the canonical references.

Advanced · Tier 1 · Stable · Supporting · ~50 min
For: ML, Stats, Research

Why This Matters

Smoothing splines in one dimension penalize $\int (f'')^2$ and have a clean natural-cubic-spline minimizer. Generalizing to dimension $d \geq 2$ runs into a problem: a "second derivative" of a function on $\mathbb{R}^d$ is a matrix, and there is no single natural scalar to put inside the integral.

Thin-plate splines pick a specific scalar: the Frobenius norm of the Hessian, integrated over the input space,
$$J(f) = \int_{\mathbb{R}^d} \sum_{i, j} \left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right)^2 \, dx.$$
For $d = 2$ this is the "bending energy" of a thin metal plate deflecting under loads at the data points, hence the name. Duchon (1977) showed that the minimizer of $\sum_i (Y_i - f(X_i))^2 + \lambda J(f)$ over an appropriate Beppo Levi space is a finite sum of radial basis functions plus a low-degree polynomial. The representation has $n$ free parameters, just like the univariate smoothing spline.

ESL 2nd ed. §5.7 (pp. 162-167) introduces thin-plate splines as the canonical multidimensional smoother. Green and Silverman (1994) Ch 7 develops the full theory; Wahba (1990) Ch 2 gives the RKHS view.

Quick Version

| Object | Form |
| --- | --- |
| Penalty in $d = 2$ | $J(f) = \int \left((\partial_{xx} f)^2 + 2 (\partial_{xy} f)^2 + (\partial_{yy} f)^2\right) dx \, dy$ |
| Radial basis function | $\eta(r) = r^2 \log r$ for $d = 2$; $\eta(r) = r^{2m - d}$ for higher $d$ with appropriate $m$ |
| Solution | $\hat{f}(x) = \sum_{i=1}^n \alpha_i \, \eta(\lVert x - X_i \rVert) + \beta^\top \phi(x)$ |
| Side conditions | $\sum_i \alpha_i p(X_i) = 0$ for each polynomial $p$ in the null space |
| Linear system size | $n + (\text{null-space dimension})$ |
| Null space ($d = 2$, $m = 2$) | constants, $x_1$, $x_2$ (dimension 3) |
| Optimal $\lambda$ | $\propto n^{-2m/(2m+d)}$; MSE rate $n^{-2m/(2m+d)}$ |

The function $\eta$ is the fundamental solution of the biharmonic equation in $d = 2$ (the Green's function for $\Delta^2$). The basis function centered at each $X_i$ is the response of a thin plate to a point load at $X_i$.

Formal Setup

Definition

Bending Energy in 2D

For a function $f: \mathbb{R}^2 \to \mathbb{R}$ with square-integrable second derivatives,
$$J(f) = \int_{\mathbb{R}^2} \left[(\partial_{xx} f)^2 + 2 (\partial_{xy} f)^2 + (\partial_{yy} f)^2\right] dx \, dy.$$
$J$ is invariant under rotations and translations of the coordinate system: rotating the input rotates the Hessian without changing its Frobenius norm. $J(f) = 0$ if and only if $f$ is affine, namely $f(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.
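A quick symbolic check of the last claim, as a minimal sympy sketch (sympy is assumed tooling here, not part of the text): the integrand of $J$ vanishes identically for an affine function and is a positive constant for a simple quadratic.

```python
import sympy as sp

x, y, a, b, c = sp.symbols("x y a b c")
f = a + b * x + c * y                      # an arbitrary affine function

integrand = (sp.diff(f, x, 2) ** 2
             + 2 * sp.diff(f, x, y) ** 2
             + sp.diff(f, y, 2) ** 2)
print(sp.simplify(integrand))              # 0: affine functions have zero bending energy

g = x ** 2 + y ** 2                        # a non-affine example
integrand_g = (sp.diff(g, x, 2) ** 2
               + 2 * sp.diff(g, x, y) ** 2
               + sp.diff(g, y, 2) ** 2)
print(sp.simplify(integrand_g))            # 8: strictly positive curvature everywhere
```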

Definition

Thin-Plate Spline

Given $(X_1, Y_1), \ldots, (X_n, Y_n)$ with $X_i \in \mathbb{R}^2$ and $\lambda > 0$, the thin-plate spline is $\hat{f}_\lambda = \arg\min_{f} \sum_{i=1}^n (Y_i - f(X_i))^2 + \lambda J(f)$, the minimum taken over functions $f$ in the Beppo Levi space $BL_2(\mathbb{R}^2) = \{f : J(f) < \infty\}$, considered modulo the affine null space.

The Beppo Levi space is the natural domain: the semi-norm $J$ identifies functions that differ by an affine function, since $J$ does not see affine perturbations. The penalty therefore determines the minimizer only up to an affine term; the data-fit term pins that term down (provided the $X_i$ do not all lie on a single line), so $\hat{f}_\lambda$ is unique.

The Representer Theorem

Theorem

Thin-Plate Spline Representation (Duchon 1977)

Statement

The minimizer of $\sum_{i=1}^n (Y_i - f(X_i))^2 + \lambda J(f)$ over the appropriate Beppo Levi space has the form
$$\hat{f}_\lambda(x) = \sum_{i=1}^n \alpha_i \, \eta(\|x - X_i\|) + \sum_{j} \beta_j \, p_j(x),$$
where $\{p_j\}$ is a basis for the null space of $J$ (polynomials of degree $< m$) and $\eta$ is the fundamental solution of the iterated Laplacian $\Delta^m$:

  • $d = 2$, $m = 2$: $\eta(r) = r^2 \log r$
  • $d = 2$, $m = 3$: $\eta(r) = r^4 \log r$
  • $d = 3$, $m = 2$: $\eta(r) = r$ (with sign adjustments)
  • general $d$ and $m$ with $2m - d > 0$ and odd: $\eta(r) = r^{2m - d}$
  • general $d$ and $m$ with $2m - d$ positive and even: $\eta(r) = r^{2m - d} \log r$.

The coefficients $\alpha_i, \beta_j$ solve a linear system of size $n + \dim(\text{null space})$. The side conditions $\sum_i \alpha_i p_j(X_i) = 0$ for each $p_j$ in the null-space basis ensure that the radial-basis part carries no polynomial component of its own; any such component belongs to the $\beta_j$ term.

Intuition

The penalty $J$ has a null space (the polynomial part) and a positive part (everything else). On the positive part, $J$ defines an RKHS norm, and the kernel is the Green's function of the differential operator $\Delta^m$. The representer theorem in this RKHS then gives the radial basis expansion. The null-space polynomials add separately because the penalty does not penalize them.

The function $\eta(r) = r^2 \log r$ in 2D is the deflection of a thin plate under a point load: its curvature blows up logarithmically at the load point and the deflection grows roughly quadratically far from it. The smoothing-spline solution reads off, at each data point, the strength of the point load needed to produce the observed deflection.

Why It Matters

This is the unique generalization of the univariate cubic smoothing spline that respects rotation invariance and gives a tractable finite-dimensional solution. The representation extends to higher dimensions and to higher penalty orders $m$ without any new ideas. The only practical issue is that the linear system is dense (no banding), so the naive cost is $O(n^3)$. Low-rank approximations bring this down to $O(n k^2)$ for some moderate $k$.

Failure Mode

Three failure modes. (i) $2m \leq d$: the penalty has insufficient smoothness to produce a well-defined function; for $d = 4$ you need $m \geq 3$. (ii) Repeated $X_i$: the linear system is singular; pre-merge replicates. (iii) Inputs near a low-dimensional manifold: the radial basis matrix becomes near-singular and the coefficients $\alpha_i$ blow up. The numerical cure is either reduced-rank thin-plate splines (Wood, 2003) or explicit regularization on $\alpha$.

Optional Proof: Why $r^2 \log r$ is the right radial basis in 2D

Green and Silverman (1994) Ch 7 and Wahba (1990) Ch 2 work this out.

The biharmonic operator $\Delta^2 = \Delta \Delta$ in 2D acts on smooth functions. Its fundamental solution $\eta$ satisfies $\Delta^2 \eta = \delta_0$ (the Dirac mass at the origin) in the distributional sense. By rotation invariance, $\eta(x) = g(r)$ with $r = \|x\|$, and away from the origin the equation reduces to the homogeneous ODE $g''''(r) + (2/r) g'''(r) - (1/r^2) g''(r) + (1/r^3) g'(r) = 0$, with the point source entering through the behavior at $r = 0$.

Solving with the ansatz $g(r) = r^p$ gives the indicial equation $p^2 (p - 2)^2 = 0$, so the homogeneous solutions are $1, \log r, r^2, r^2 \log r$. Subtracting the homogeneous parts and matching the delta-function source at the origin gives $\eta(r) = \frac{1}{8\pi} r^2 \log r$ as the fundamental solution. The normalization constant gets absorbed into the coefficients in the representer theorem; the functional form $r^2 \log r$ is the structural point.
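A symbolic sanity check of this, again as a hedged sympy sketch: $r^2 \log r$ is biharmonic away from the origin, and its Laplacian has the logarithmic singularity the argument relies on.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True, positive=True)
r = sp.sqrt(x ** 2 + y ** 2)
eta = r ** 2 * sp.log(r)                       # candidate fundamental solution, unnormalized

def laplacian(f):
    return sp.diff(f, x, 2) + sp.diff(f, y, 2)

print(sp.simplify(laplacian(eta)))             # equals 4*log(r) + 4: log-singular at the load point
print(sp.simplify(laplacian(laplacian(eta))))  # 0: biharmonic away from the origin
```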

The penalty $J(f)$ can be written as a quadratic form in the $\alpha_i$ (after using the representer theorem and integrating by parts): $J\bigl(\sum_i \alpha_i \eta(\|\cdot - X_i\|)\bigr) = \alpha^\top \boldsymbol{E} \alpha$ where $E_{ij} = \eta(\|X_i - X_j\|)$. This basis-matrix-as-Gram-matrix identity is what makes the representer theorem operational.

Implementation Notes

The straightforward implementation:

  1. Build the $n \times n$ matrix $\boldsymbol{E}$ with $E_{ij} = \eta(\|X_i - X_j\|)$.
  2. Build the $n \times M$ null-space matrix $\boldsymbol{T}$ with rows $\phi(X_i)$, where $\phi$ is a basis for $\mathrm{null}(J)$.
  3. Solve the saddle-point system
$$\begin{pmatrix} \boldsymbol{E} + \lambda \boldsymbol{I} & \boldsymbol{T} \\ \boldsymbol{T}^\top & \boldsymbol{0} \end{pmatrix} \begin{pmatrix} \boldsymbol{\alpha} \\ \boldsymbol{\beta} \end{pmatrix} = \begin{pmatrix} \boldsymbol{Y} \\ \boldsymbol{0} \end{pmatrix}.$$

The cost is $O(n^3)$ for the dense linear solve.
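A minimal numpy sketch of these three steps for $d = 2$, $m = 2$; the helper names (`tps_matrices`, `fit_tps`, `predict_tps`) are illustrative, not any library's API, and no numerical refinements are attempted.

```python
import numpy as np

def eta_2d(r):
    # Thin-plate radial basis for d = 2, m = 2: r^2 log r, with eta(0) = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0, r ** 2 * np.log(r), 0.0)

def tps_matrices(X):
    # E_ij = eta(||X_i - X_j||); T has rows (1, x_1, x_2) spanning the affine null space.
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return eta_2d(r), np.column_stack([np.ones(len(X)), X])

def fit_tps(X, y, lam):
    # Solve the (n + 3) x (n + 3) saddle-point system for (alpha, beta).
    n = len(X)
    E, T = tps_matrices(X)
    M = T.shape[1]
    A = np.block([[E + lam * np.eye(n), T],
                  [T.T, np.zeros((M, M))]])
    sol = np.linalg.solve(A, np.concatenate([y, np.zeros(M)]))
    return sol[:n], sol[n:]                      # alpha, beta

def predict_tps(X_train, alpha, beta, X_new):
    r = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=-1)
    return eta_2d(r) @ alpha + np.column_stack([np.ones(len(X_new)), X_new]) @ beta

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(80, 2))
y = np.sin(X[:, 0]) + 0.5 * np.cos(X[:, 1]) + 0.1 * rng.standard_normal(80)
alpha, beta = fit_tps(X, y, lam=1.0)
print(predict_tps(X, alpha, beta, X[:5]))
```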

Low-rank approximation (Wood, 2003). Replace $\boldsymbol{E}$ with its rank-$k$ approximation $\boldsymbol{E} \approx \boldsymbol{U}_k \boldsymbol{D}_k \boldsymbol{U}_k^\top$ from the leading $k$ eigenvectors. The resulting "thin-plate regression spline" has $k$ parameters instead of $n$ and solves in $O(n k^2)$. This is the default in mgcv::s(x1, x2, bs = "tp") in R. For typical applications $k = 50$ to $100$ gives accuracy indistinguishable from the full thin-plate solution.
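A hedged sketch of the truncation idea, reusing `tps_matrices` from the sketch above: substitute $\alpha = \boldsymbol{U}_k \delta$ into the saddle-point system and project, giving a $(k + 3)$-dimensional solve. This captures the spirit of Wood's construction but is not the exact parameterization mgcv uses.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def fit_tps_lowrank(X, y, lam, k=50):
    # E ~ U_k D_k U_k^T from the k eigenpairs of largest magnitude; with
    # alpha = U_k @ delta the saddle system shrinks from (n + 3) to (k + 3) unknowns.
    E, T = tps_matrices(X)                 # from the direct sketch above
    d, U = eigsh(E, k=k, which="LM")       # Lanczos; avoids the full eigendecomposition
    M = T.shape[1]
    A = np.block([[np.diag(d) + lam * np.eye(k), U.T @ T],
                  [T.T @ U, np.zeros((M, M))]])
    rhs = np.concatenate([U.T @ y, np.zeros(M)])
    sol = np.linalg.solve(A, rhs)
    return U @ sol[:k], sol[k:]            # alpha (length n), beta
```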

Smoothing parameter. GCV from smoothing splines applies directly; the smoother matrix is dense rather than banded, but its trace can be computed via the eigenvalues of $\boldsymbol{E}$.
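One blunt way to get $\mathrm{tr}(\boldsymbol{S}_\lambda)$ and the GCV score with the pieces defined above, as an illustrative sketch only (it costs $O(n^3)$ per $\lambda$; a careful implementation works through the eigenvalues instead):

```python
import numpy as np

def gcv_tps(X, y, lambdas):
    # GCV(lambda) = n * RSS / (n - tr(S_lambda))^2.  S_lambda is obtained by
    # pushing the identity through the saddle-point solve, so each column of
    # S_lambda is the fit to a unit response vector.
    n = len(X)
    E, T = tps_matrices(X)                 # from the direct sketch above
    M = T.shape[1]
    scores = {}
    for lam in lambdas:
        A = np.block([[E + lam * np.eye(n), T],
                      [T.T, np.zeros((M, M))]])
        sol = np.linalg.solve(A, np.vstack([np.eye(n), np.zeros((M, n))]))
        S = E @ sol[:n] + T @ sol[n:]      # smoother matrix: y_hat = S @ y
        rss = np.sum((y - S @ y) ** 2)
        scores[lam] = n * rss / (n - np.trace(S)) ** 2
    return scores

# e.g. scores = gcv_tps(X, y, lambdas=[0.01, 0.1, 1.0, 10.0]); pick the minimizer.
```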

Higher Dimensions

For $d = 3$ with $m = 2$, the fundamental solution of $\Delta^2$ is $\eta(r) = r$ (linear in the radius). For $d = 3$, $m = 3$ it is $r^3$. For $d = 4$, $m = 3$ it is $r^2 \log r$. The general formula is

$$\eta(r) = \begin{cases} r^{2m - d} & \text{if } 2m - d \text{ is odd} \\ r^{2m - d} \log r & \text{if } 2m - d \text{ is a positive even integer} \end{cases}$$

with appropriate sign conventions. The null space is the polynomials of degree $< m$ in $d$ variables, of dimension $\binom{m + d - 1}{d}$.
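The case split, written out as a small helper (an illustrative sketch; the sign and normalization conventions the text alludes to are omitted, since they get absorbed into the coefficients anyway):

```python
import numpy as np

def eta_general(r, d, m):
    # Unnormalized thin-plate radial basis in R^d with penalty order m.
    # Requires 2m > d; the log factor appears exactly when 2m - d is even.
    p = 2 * m - d
    if p <= 0:
        raise ValueError("need 2m > d for a well-defined thin-plate penalty")
    r = np.asarray(r, dtype=float)
    if p % 2 == 0:                       # 2m - d a positive even integer
        with np.errstate(divide="ignore", invalid="ignore"):
            return np.where(r > 0, r ** p * np.log(r), 0.0)
    return r ** p                        # 2m - d odd

# d = 2, m = 2 recovers r^2 log r; d = 3, m = 2 gives r (up to sign conventions).
print(eta_general([0.5, 1.0, 2.0], d=2, m=2))
print(eta_general([0.5, 1.0, 2.0], d=3, m=2))
```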

For high $d$ the null-space dimension grows polynomially and the estimator inherits the curse of dimensionality: the MSE rate $O(n^{-2m/(2m+d)})$ degrades sharply with $d$. By $d = 5$ to $7$ the estimator is largely useless without further structure (additivity, sparsity, low intrinsic dimension).
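Concretely, with $m = 2$ the rate $n^{-2m/(2m+d)}$ is $n^{-4/5}$ for $d = 1$, $n^{-2/3}$ for $d = 2$, $n^{-4/9}$ for $d = 5$, and $n^{-1/3}$ for $d = 8$: each added dimension buys noticeably less from the same sample size.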

Canonical Example

Example

A geological surface from sparse measurements

Imagine $n = 80$ elevation measurements at irregularly placed survey points across a $10\,\mathrm{km} \times 10\,\mathrm{km}$ region. Fit a thin-plate spline with $m = 2$ to produce a smooth surface.

| $\lambda$ | $\mathrm{tr}(\boldsymbol{S}_\lambda)$ | Visual outcome |
| --- | --- | --- |
| very small | $\approx n$ | interpolation; passes through every survey point, wild oscillation between |
| GCV-optimal | $\approx 25$ | smooth surface; survey points slightly off the surface but visibly the right shape |
| very large | $3$ | best-fit affine plane; loses topography |

The GCV-optimal fit recovers the main ridge structure cleanly. The fitted surface satisfies $J(\hat{f}) \approx 0.3$ in physical units of inverse length squared, which is interpretable as "the surface is mostly flat with some moderate curvature near the ridge". The same data fit by ordinary kriging with a Matérn covariance gives a visually similar surface; the thin-plate spline is the limit of kriging with an improper "intrinsic stationary" prior.

Common Confusions

Watch Out

Thin-plate splines are not kernel ridge regression with a fixed kernel

The radial basis $\eta(r) = r^2 \log r$ is the Green's function of the differential operator, not a Mercer kernel. It is conditionally positive definite but not positive definite outright: the matrix $\boldsymbol{E}$ can have up to $\dim(\text{null space})$ negative eigenvalues. The representer theorem still applies because the null-space part is added separately; the result is not a clean kernel ridge regression, but the structure is analogous. ESL 2nd ed. p. 165 makes this distinction.
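A quick numerical illustration of the distinction (a numpy sketch reusing `tps_matrices` from the implementation notes above): $\boldsymbol{E}$ has negative eigenvalues, yet $\alpha^\top \boldsymbol{E} \alpha \geq 0$ whenever $\alpha$ satisfies the side conditions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 2))
E, T = tps_matrices(X)                      # from the implementation sketch above

print(np.linalg.eigvalsh(E)[:4])            # smallest eigenvalues: at least one is negative (trace of E is zero)

# Project random vectors onto {alpha : T^T alpha = 0} and check the quadratic form.
Q, _ = np.linalg.qr(T)                      # orthonormal basis of col(T)
vals = []
for _ in range(1000):
    a = rng.standard_normal(len(X))
    a -= Q @ (Q.T @ a)                      # enforce the side conditions
    vals.append(a @ E @ a)
print(min(vals))                            # nonnegative up to rounding error
```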

Watch Out

The bending-energy interpretation is for d = 2 specifically

"Thin-plate" is the d=2d = 2, m=2m = 2 case where the penalty equals the elastic-energy of a deflected metal plate. The generalization to other (d,m)(d, m) is the same machinery but the physical interpretation breaks down. Use "thin-plate spline" loosely for any radial basis built on a Green's function of Δm\Delta^m in Rd\mathbb{R}^d; the canonical case is d=2d = 2.

Watch Out

Tensor-product splines are a different choice in higher dimensions

Thin-plate splines are isotropic: rotation-invariant. Tensor-product splines build a basis as a product of one-dimensional B-spline bases along each coordinate. They are anisotropic and have $O(K^d)$ basis functions for $K$ knots per dimension. Thin-plate is the right choice when the data has no preferred direction; tensor-product is the right choice when the coordinates are heterogeneous (one is time, another a spatial coordinate, say) or when you want per-coordinate degrees of freedom.

Exercises

Exercise (Core)

Problem

Verify that $J(f) = 0$ for $f(x_1, x_2) = a + b x_1 + c x_2$. Hence confirm that the null space of $J$ in 2D with $m = 2$ has dimension $3$.

Exercise (Advanced)

Problem

Show that the side condition $\sum_i \alpha_i p(X_i) = 0$ for each polynomial $p$ in the null space follows from the requirement that the radial-basis part of $\hat{f}_\lambda$ has no polynomial component to absorb; equivalently, that the representer-theorem representation is unique modulo the null space.

Exercise (Research)

Problem

For $d = 2$, $m = 2$ with $n$ measurements on a regular grid of side $\sqrt{n}$, the linear system has a condition number that grows with $n$. Estimate the rate and propose a preconditioner.

References

Canonical:

  • Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized Linear Models. Chapman and Hall. Ch 7 "Thin Plate Splines in Two Dimensions". The textbook treatment with full derivations.
  • Wahba, G. (1990). Spline Models for Observational Data. SIAM. Ch 2 "More General Reproducing Kernel Hilbert Spaces", Ch 3 "Equivalence and Perpendicularity, or, What's So Special About Splines?" The RKHS / Bayesian view.
  • Hastie, Tibshirani, Friedman. The Elements of Statistical Learning, 2nd ed. Springer (2009). Ch 5 "Basis Expansions and Regularization", §5.7 "Multidimensional Splines" (pp. 162-167). Concise statistical-learning summary.

Foundational:

  • Duchon, J. (1977). "Splines Minimizing Rotation-Invariant Semi-Norms in Sobolev Spaces." In Constructive Theory of Functions of Several Variables, Lecture Notes in Mathematics 571, Springer, 85-100. The original construction and the proof of the representer theorem in the Beppo Levi setting.
  • Meinguet, J. (1979). "Multivariate Interpolation at Arbitrary Points Made Simple." Journal of Applied Mathematics and Physics (ZAMP) 30(2), 292-304. The Green's-function derivation of the radial basis.

Low-rank and computation:

  • Wood, S. N. (2003). "Thin Plate Regression Splines." Journal of the Royal Statistical Society B 65(1), 95-114. The reduced-rank approximation used in mgcv and most modern implementations.
  • Wendland, H. (2004). Scattered Data Approximation. Cambridge. Numerical analysis of radial basis function methods.

Bayesian / geostatistical connection:

  • Cressie, N. (1993). Statistics for Spatial Data. Wiley. Thin-plate splines as kriging with an intrinsic stationary prior.

Next Topics

  • Smoothing splines: the univariate predecessor; thin-plate is the $d \geq 2$ generalization.
  • B-splines: the alternative basis for tensor-product multidimensional splines.
  • Gaussian processes regression: the Bayesian counterpart; thin-plate splines are the posterior mean under an improper "bending energy" prior.
  • Generalized additive models: per-coordinate smoothers as an alternative to thin-plate when interactions are not the target.

Last reviewed: May 13, 2026
