
Algorithms Foundations

Fast Fourier Transform

The Cooley-Tukey FFT reduces the discrete Fourier transform from O(n^2) to O(n log n), enabling efficient convolution, spectral methods, and Fourier features for kernel approximation.


Why This Matters

The FFT is one of the most important algorithms in all of computational science. For ML specifically: convolutions in CNNs can be computed via FFT, spectral clustering uses eigenvalues of matrices constructed with Fourier-like tools, random Fourier features approximate kernel functions by sampling from the Fourier transform of the kernel, and some efficient attention mechanisms replace quadratic dot-product attention with FFT-based operations.

The FFT takes O(n^2) and makes it O(n \log n). Any time you see convolution, correlation, or spectral analysis, the FFT is doing the work underneath.

The Discrete Fourier Transform

Definition

Discrete Fourier Transform

Given a sequence x_0, x_1, \ldots, x_{n-1} \in \mathbb{C}, the DFT produces a sequence X_0, X_1, \ldots, X_{n-1} defined by:

X_k = \sum_{j=0}^{n-1} x_j \, e^{-2\pi i \, jk/n}, \quad k = 0, 1, \ldots, n-1

The inverse DFT recovers x from X:

x_j = \frac{1}{n} \sum_{k=0}^{n-1} X_k \, e^{2\pi i \, jk/n}

The DFT decomposes a signal into its frequency components. Each X_k measures how much the signal oscillates at frequency k/n. The naive computation of all n values of X_k requires n multiplications each, totaling O(n^2).
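The definition translates directly into code. Here is a minimal NumPy sketch (the function name naive_dft is my own), useful as a reference implementation to test any FFT against:

```python
import numpy as np

def naive_dft(x):
    """Direct O(n^2) DFT from the definition: X_k = sum_j x_j e^{-2pi i jk/n}."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    j = np.arange(n)
    k = j.reshape(-1, 1)                   # output indices as a column vector
    W = np.exp(-2j * np.pi * j * k / n)    # DFT matrix: W[k, j] = e^{-2pi i jk/n}
    return W @ x

X = naive_dft([1.0, 0.0, 1.0, 0.0])        # agrees with np.fft.fft
```

Building the full n-by-n matrix W makes the O(n^2) cost explicit: one multiplication per (j, k) pair.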

Definition

Roots of Unity

A primitive n-th root of unity is \omega_n = e^{-2\pi i/n}. The DFT can be written as X_k = \sum_{j=0}^{n-1} x_j \omega_n^{jk}. The key algebraic properties: \omega_n^n = 1 and \omega_n^{n/2} = -1 (when n is even). These symmetries are what make the FFT possible.
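These identities are easy to confirm numerically; a quick NumPy sanity check:

```python
import numpy as np

n = 8
w = np.exp(-2j * np.pi / n)            # omega_n, a primitive n-th root of unity
assert np.isclose(w ** n, 1)           # omega_n^n = 1: a full turn around the circle
assert np.isclose(w ** (n // 2), -1)   # omega_n^{n/2} = -1: half a turn
assert np.isclose(w ** 2, np.exp(-2j * np.pi / (n // 2)))  # omega_n^2 = omega_{n/2}
```

The last assertion is the identity the FFT derivation below leans on: squaring the root of unity for size n gives the root of unity for size n/2.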

The Cooley-Tukey FFT

The idea: split the DFT into two half-sized DFTs by separating even-indexed and odd-indexed terms.

Write x_j with even indices j = 2m and odd indices j = 2m + 1:

X_k = \sum_{m=0}^{n/2-1} x_{2m} \, \omega_n^{2mk} + \omega_n^k \sum_{m=0}^{n/2-1} x_{2m+1} \, \omega_n^{2mk}

Since \omega_n^2 = \omega_{n/2}, each sum is a DFT of size n/2. Let E_k be the DFT of the even-indexed elements and O_k the DFT of the odd-indexed elements. Then, for k = 0, \ldots, n/2 - 1:

X_k = E_k + \omega_n^k O_k, \quad X_{k+n/2} = E_k - \omega_n^k O_k

This is the butterfly operation. Two DFTs of size n/2 plus O(n) combining work.
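The recursion fits in a few lines. A sketch of the radix-2 algorithm in NumPy (fft_recursive is an illustrative name; the input length is assumed to be a power of 2):

```python
import numpy as np

def fft_recursive(x):
    """Radix-2 Cooley-Tukey FFT. Requires len(x) to be a power of 2."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x                           # DFT of a single point is itself
    E = fft_recursive(x[0::2])             # DFT of even-indexed elements
    O = fft_recursive(x[1::2])             # DFT of odd-indexed elements
    k = np.arange(n // 2)
    w = np.exp(-2j * np.pi * k / n)        # twiddle factors omega_n^k
    return np.concatenate([E + w * O,      # X_k           for 0 <= k < n/2
                           E - w * O])     # X_{k + n/2}   (same E, O reused)
```

Production FFTs (FFTW, numpy.fft) are iterative and work in place, but the butterfly structure is the same.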

Theorem

FFT Complexity

Statement

The Cooley-Tukey FFT computes the DFT of n complex numbers in O(n \log n) arithmetic operations.

Intuition

Each level of recursion halves the problem size and does O(n) work for the butterfly combines. There are \log_2 n levels, so the total work is O(n \log n).

Proof Sketch

The recurrence is T(n) = 2T(n/2) + O(n). By the Master theorem with a = 2, b = 2, d = 1: since \log_2 2 = 1 = d, we get T(n) = O(n \log n).

Why It Matters

Reducing O(n^2) to O(n \log n) is the difference between feasible and infeasible for large signals. For n = 10^6, this is a speedup factor of roughly 50,000. Without the FFT, digital signal processing, medical imaging, and spectral methods in ML would be impractical.

Failure Mode

The standard radix-2 Cooley-Tukey requires n to be a power of 2. For other sizes, you can pad with zeros (which changes the implicit periodicity) or use mixed-radix FFT variants. Numerical precision also degrades slightly for very large n due to accumulated floating-point error in the twiddle factors \omega_n^k.

The Convolution Theorem

This is the reason the FFT is so important for ML.

Definition

Circular Convolution

The circular convolution of two length-n sequences x and y is:

(x * y)_k = \sum_{j=0}^{n-1} x_j \, y_{(k-j) \bmod n}

Naive computation: O(n^2).
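For concreteness, a direct transcription of the definition (the name circular_conv is my own):

```python
import numpy as np

def circular_conv(x, y):
    """Direct circular convolution from the definition: O(n^2).

    Indices into y wrap around modulo n, which is what makes
    the convolution 'circular'.
    """
    n = len(x)
    return np.array([sum(x[j] * y[(k - j) % n] for j in range(n))
                     for k in range(n)])
```

The two nested loops (one explicit, one inside sum) are the O(n^2) cost that the convolution theorem eliminates.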

Theorem

Convolution Theorem

Statement

Let X = \text{DFT}(x) and Y = \text{DFT}(y). Then:

\text{DFT}(x * y) = X \odot Y

where \odot denotes pointwise multiplication. Equivalently, x * y = \text{IDFT}(X \odot Y).

Intuition

Convolution in the time domain becomes pointwise multiplication in the frequency domain. This converts an O(n^2) operation into three O(n \log n) FFTs plus O(n) pointwise multiplications.

Proof Sketch

Direct computation. Write out \text{DFT}(x * y)_k, substitute the convolution definition, swap the order of summation, reindex the inner sum using periodicity to obtain Y_k, and factor out X_k.
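Spelled out: substituting the convolution definition and swapping sums, with \omega_n^{tk} = \omega_n^{jk} \, \omega_n^{(t-j)k},

\text{DFT}(x * y)_k = \sum_{j=0}^{n-1} x_j \, \omega_n^{jk} \sum_{t=0}^{n-1} y_{(t-j) \bmod n} \, \omega_n^{(t-j)k}

Reindexing the inner sum with s = (t - j) \bmod n (valid because \omega_n^{nk} = 1, so the summand depends only on t - j modulo n) gives

\text{DFT}(x * y)_k = \Big( \sum_{j=0}^{n-1} x_j \, \omega_n^{jk} \Big) \Big( \sum_{s=0}^{n-1} y_s \, \omega_n^{sk} \Big) = X_k \, Y_k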

Why It Matters

This is the workhorse behind efficient convolution in signal processing and CNNs. Any convolution of a filter of length m with a signal of length n can be done in O(n \log n) instead of O(nm) by zero-padding to length n + m - 1 and using FFTs.

Failure Mode

The theorem applies to circular convolution. For linear convolution (which is what CNNs actually compute), you must zero-pad the inputs to avoid wraparound artifacts. Also, for small filter sizes m \ll n, direct convolution at O(nm) may beat FFT-based convolution at O(n \log n) due to FFT overhead.
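Putting the theorem and the padding caveat together, a sketch of linear convolution via FFT (fft_linear_conv is an illustrative name; np.fft.fft(a, size) zero-pads the input to the requested length):

```python
import numpy as np

def fft_linear_conv(x, y):
    """Linear convolution via the convolution theorem.

    Zero-padding both inputs to length n + m - 1 makes the circular
    convolution computed by the FFT equal to the linear one.
    """
    n, m = len(x), len(y)
    size = n + m - 1
    X = np.fft.fft(x, size)                # zero-pads x to `size`
    Y = np.fft.fft(y, size)
    return np.real(np.fft.ifft(X * Y))     # real inputs -> real result

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 0.5])
# matches np.convolve(x, y): [0., 1., 2.5, 4., 1.5]
```

Skipping the padding (using size = n) would wrap the tail of the result around to the front, which is exactly the wraparound artifact described above.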

Connections to ML

CNNs: Convolution layers apply filters to input feature maps. For large spatial dimensions, FFT-based convolution is faster than direct convolution. Libraries like cuDNN choose between direct and FFT implementations based on filter and input size.

Fourier features: The random Fourier features method (Rahimi and Recht, 2007) approximates a shift-invariant kernel k(x - y) by sampling from the Fourier transform of k. This relies on Bochner's theorem: a continuous shift-invariant positive definite kernel is the Fourier transform of a non-negative measure.
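A minimal sketch of this construction for the Gaussian (RBF) kernel, assuming the standard cosine feature map from Rahimi and Recht; the function name and parameters are illustrative:

```python
import numpy as np

def rff_features(X, n_features=500, gamma=1.0, seed=0):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2).

    By Bochner's theorem this kernel is the Fourier transform of a
    Gaussian measure, so we sample frequencies w ~ N(0, 2*gamma*I)
    and phases b ~ Uniform[0, 2pi].
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# With Z = rff_features(X), the inner product Z @ Z.T approximates the
# exact kernel matrix; the error shrinks like O(1/sqrt(n_features)).
```

The payoff: a kernel method on the explicit features Z costs O(n \cdot n_features) instead of the O(n^2) needed to form the full kernel matrix.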

Efficient attention: Several methods (FNet, for example) replace the self-attention mechanism with Fourier transforms along the sequence dimension, achieving O(n \log n) complexity instead of O(n^2) for sequence length n.

Spectral methods: Graph neural networks and spectral clustering use eigendecompositions and Fourier transforms on graphs. The graph Fourier transform generalizes the classical DFT to non-Euclidean domains.

Common Confusions

Watch Out

The FFT computes the DFT; it is not a different transform

The FFT is an algorithm for computing the DFT. The DFT is the mathematical transform. The FFT and naive DFT produce exactly the same output. The FFT just does it faster.

Watch Out

Circular vs linear convolution

The convolution theorem gives circular convolution, not linear. In CNNs and signal processing, we typically want linear convolution. You must zero-pad to at least length n + m - 1 before applying the FFT to get the correct linear convolution result.

Watch Out

FFT is not always faster for small filters

For a 3x3 convolution filter on a 224x224 image (common in CNNs), direct convolution is faster than FFT. The FFT advantage kicks in for larger filters or when computing many convolutions simultaneously.

Key Takeaways

  • DFT: transforms the time domain to the frequency domain. Naive cost: O(n^2)
  • FFT (Cooley-Tukey): computes the DFT in O(n \log n) by divide and conquer on even/odd indices
  • Convolution theorem: convolution becomes pointwise multiply in frequency domain
  • For ML: efficient convolution, random Fourier features, efficient attention, spectral methods
  • Practical consideration: FFT beats direct convolution only when filter or signal size is large enough

Exercises

ExerciseCore

Problem

Compute the DFT of the sequence x = (1, 0, 1, 0) by hand, using X_k = \sum_{j=0}^{3} x_j e^{-2\pi i \, jk/4} for k = 0, 1, 2, 3.

ExerciseAdvanced

Problem

You need to convolve a signal of length n = 2^{20} \approx 10^6 with a filter of length m = 2^{10} \approx 10^3. Compare the operation counts of direct convolution vs FFT-based convolution.

References

Canonical:

  • Cooley & Tukey, "An Algorithm for the Machine Calculation of Complex Fourier Series" (1965)
  • Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms (CLRS), Chapter 30

Current:

  • Oppenheim & Willsky, Signals and Systems, Chapter 8
  • Rahimi & Recht, "Random Features for Large-Scale Kernel Machines" (2007)

Next Topics

  • Efficient attention variants: some use FFT to avoid quadratic attention cost
  • Convolutional neural networks: where convolution meets deep learning

Last reviewed: April 2026
