Autoencoders
Encoder-decoder architectures for unsupervised representation learning: undercomplete bottlenecks, sparse and denoising variants, and the connection between linear autoencoders and PCA.
Why This Matters
Autoencoders are the simplest framework for unsupervised representation learning: compress data through a bottleneck, then reconstruct it, and the bottleneck representation captures the most important structure. This idea is foundational. It leads directly to variational autoencoders, which connect to probabilistic generative modeling, and to modern representation learning methods used in self-supervised learning.
Mental Model
Think of an autoencoder as a compression algorithm that learns what to compress. The encoder maps high-dimensional input to a low-dimensional code. The decoder maps the code back to the original space. Training minimizes reconstruction error. Whatever structure the code must capture in order to reconstruct well is the learned representation.
Formal Setup
Autoencoder
An autoencoder consists of two functions:
- Encoder f: ℝ^d → ℝ^k, mapping an input x to a latent code z = f(x)
- Decoder g: ℝ^k → ℝ^d, mapping the code back to a reconstruction x̂ = g(z)
Training minimizes the reconstruction loss:
L(x) = ||x − g(f(x))||²
The composition g ∘ f learns to approximate the identity function, but only through the information bottleneck of the code z.
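As a concrete sketch of this setup, the following NumPy snippet trains a linear autoencoder by gradient descent on synthetic data; the dimensions, noise level, and learning rate are illustrative choices, not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 10, 3                     # samples, input dim, code dim

# Toy data: points near a k-dimensional subspace of R^d, plus small noise
Q, _ = np.linalg.qr(rng.normal(size=(d, k)))          # orthonormal basis
X = rng.normal(size=(n, k)) @ Q.T + 0.05 * rng.normal(size=(n, d))
X -= X.mean(axis=0)                                   # center the data

W = 0.1 * rng.normal(size=(k, d))        # encoder: z = W x
U = 0.1 * rng.normal(size=(d, k))        # decoder: x_hat = U z

def loss(W, U):
    Xhat = (X @ W.T) @ U.T
    return np.mean(np.sum((X - Xhat) ** 2, axis=1))

initial = loss(W, U)
lr = 0.1
for _ in range(500):
    Z = X @ W.T                          # codes, shape (n, k)
    R = X - Z @ U.T                      # residuals, shape (n, d)
    dU = -2.0 / n * R.T @ Z              # gradient of the loss w.r.t. U
    dW = -2.0 / n * (R @ U).T @ X        # gradient of the loss w.r.t. W
    U -= lr * dU
    W -= lr * dW

final = loss(W, U)
print(initial, final)                    # reconstruction error drops sharply
```

The loop is ordinary full-batch gradient descent on L(W, U); any autodiff framework would compute the same gradients.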
Undercomplete Autoencoders
Undercomplete Autoencoder
An autoencoder is undercomplete when the code dimension is smaller than the input dimension (i.e., k < d). The bottleneck forces the encoder to learn a compressed representation that retains the most important information for reconstruction.
If k ≥ d and the encoder/decoder have enough capacity, the autoencoder can learn the identity function, which captures nothing useful about the data structure. The bottleneck is what makes autoencoders nontrivial.
The Linear Autoencoder = PCA Connection
Linear Autoencoders Recover PCA
Statement
If the encoder and decoder are both linear (no nonlinearity), then the optimal k-dimensional autoencoder (minimizing squared reconstruction error) recovers the subspace spanned by the top k principal components of the data. Specifically, the optimal encoder projects onto the span of the top k eigenvectors of the data covariance matrix C.
Intuition
PCA finds the k-dimensional subspace that preserves the most variance; equivalently, the one that minimizes reconstruction error under orthogonal projection. A linear autoencoder with a k-dimensional bottleneck is solving exactly the same optimization problem, just without the orthogonality constraint. The optimal solution finds the same subspace (though the encoder/decoder matrices may not individually be orthogonal).
Proof Sketch
Let the encoder be z = Wx with W ∈ ℝ^(k×d) and the decoder be x̂ = Uz with U ∈ ℝ^(d×k). For centered data, the reconstruction loss is L(W, U) = Σ_i ||x_i − UWx_i||². The optimal product UW is the rank-k projection onto the top k eigenvectors of the covariance matrix C, by the Eckart-Young theorem (the best rank-k approximation in Frobenius norm comes from the truncated SVD).
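The equivalence in the proof sketch is easy to check numerically on synthetic data: the best rank-k reconstruction from the truncated SVD coincides with projection onto the top-k eigenvectors of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 500, 8, 3

X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
X -= X.mean(axis=0)                             # center the data

# Optimal linear autoencoder via Eckart-Young: truncated SVD of X
U_svd, s, Vt = np.linalg.svd(X, full_matrices=False)
X_ae = U_svd[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # best rank-k approximation

# PCA: orthogonal projection onto the top-k eigenvectors of the covariance
C = X.T @ X / n
eigvecs = np.linalg.eigh(C)[1]                  # columns in ascending order
V = eigvecs[:, ::-1][:, :k]                     # top-k principal directions
X_pca = X @ V @ V.T

gap = np.max(np.abs(X_ae - X_pca))
print(gap)                                      # agrees to numerical precision
```

The right singular vectors of X are the eigenvectors of X^T X, so both routes project onto the same subspace; the projection matrix V V^T is invariant to sign flips of individual eigenvectors.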
Why It Matters
This theorem reveals that nonlinearity is what makes deep autoencoders more powerful than PCA. Without nonlinear activations, you are just doing PCA with extra steps. Nonlinear autoencoders can learn curved, nonlinear manifolds that PCA (restricted to linear subspaces) cannot capture.
Failure Mode
The equivalence is exact only for linear activations and squared loss. With nonlinear activations, the autoencoder can capture nonlinear structure but also has more complex loss landscapes with local minima.
Sparse Autoencoders
Sparse Autoencoder
A sparse autoencoder uses an overcomplete code (k > d) but adds a sparsity penalty on the activations:
L(x) = ||x − g(f(x))||² + λ ||f(x)||₁
The L1 penalty (or a KL divergence from a target activation level) encourages most code units to be zero for any given input. Each input activates only a small subset of features, and each feature specializes in a particular pattern.
Sparse autoencoders are useful for feature discovery: the learned features are often interpretable (edges, textures, object parts for images). They are also used in mechanistic interpretability to decompose neural network activations into interpretable directions.
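The two penalties mentioned above can be written out directly. This sketch computes the L1 version and the KL-divergence version (with a target mean activation ρ, as in classic sparse-autoencoder formulations); the batch sizes and coefficients are illustrative.

```python
import numpy as np

def l1_sparsity(Z, lam=1e-3):
    """L1 penalty lam * ||z||_1 on code activations Z, shape (n, k)."""
    return lam * np.abs(Z).sum(axis=1).mean()

def kl_sparsity(Z, rho=0.05, beta=1.0, eps=1e-8):
    """KL(rho || rho_hat_j) summed over code units, where rho_hat_j is the
    mean activation of unit j over the batch (activations in (0, 1))."""
    rho_hat = np.clip(Z.mean(axis=0), eps, 1 - eps)
    kl = rho * np.log(rho / rho_hat) \
        + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return beta * kl.sum()

rng = np.random.default_rng(2)
# Dense sigmoid codes (~half the units active) vs. codes with ~5% active
Z_dense = 1 / (1 + np.exp(-rng.normal(size=(64, 32))))
Z_sparse = Z_dense * (rng.random((64, 32)) < 0.05)

# Both penalties are larger for the dense code than for the sparse one
print(l1_sparsity(Z_dense), l1_sparsity(Z_sparse))
print(kl_sparsity(Z_dense), kl_sparsity(Z_sparse))
```

In training, either penalty is simply added to the reconstruction loss; the gradient then pushes most code units toward zero activation.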
Denoising Autoencoders
Denoising Autoencoder
A denoising autoencoder corrupts each input x to produce x̃ (e.g., by adding Gaussian noise or randomly masking features) and trains to reconstruct the clean original:
L(x) = ||x − g(f(x̃))||²
where x̃ = x + ε with ε ~ N(0, σ²I), or x̃ is x with a random subset of entries zeroed out.
The denoising objective prevents the autoencoder from learning the identity even without a bottleneck. To reconstruct clean data from noisy input, the model must learn the data manifold structure. It must know what "clean" looks like. This connects to score matching: the optimal denoising function is related to the gradient of the log data density.
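Both corruption schemes are easy to sketch, and the sketch also makes the "identity is no longer free" point concrete: if the model simply copies its (corrupted) input, the loss equals the corruption energy rather than zero. Noise level and mask rate below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def corrupt_gaussian(x, sigma=0.1):
    """x_tilde = x + eps, with eps ~ N(0, sigma^2 I)."""
    return x + sigma * rng.normal(size=x.shape)

def corrupt_mask(x, p=0.3):
    """Zero out each entry independently with probability p."""
    return x * (rng.random(x.shape) >= p)

X = rng.normal(size=(100, 20))
X_noisy = corrupt_gaussian(X)
X_masked = corrupt_mask(X)
masked_frac = np.mean(X_masked == 0)     # ~0.3 of the entries are zeroed

# The denoising objective compares the reconstruction of the *corrupted*
# input against the *clean* original: L = ||x - g(f(x_tilde))||^2.
# A model that copies its input incurs exactly the corruption energy:
identity_loss = np.mean(np.sum((X - X_noisy) ** 2, axis=1))
print(masked_frac, identity_loss)        # identity loss ~ 20 * sigma^2, not 0
```

A real denoising autoencoder would feed X_noisy (or X_masked) through the encoder/decoder and regress onto the clean X.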
Canonical Examples
Image reconstruction with convolutional autoencoder
For MNIST images (28 × 28 = 784 pixels), a convolutional encoder reduces spatial dimensions via strided convolutions down to a 32-dimensional code vector. The decoder uses transposed convolutions to upsample back to 28 × 28. Reconstruction error is typically MSE. The 32-dimensional code captures digit identity, stroke thickness, and orientation: the factors of variation that matter for reconstruction.
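The spatial bookkeeping in this example can be checked with the standard strided-convolution output formula; the kernel size, stride, padding, and channel count below are illustrative choices, not specified by the text.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output spatial size of a strided convolution:
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# A 28x28 MNIST image through two stride-2 convolutions: 28 -> 14 -> 7
h = conv_out(28)            # 14
h = conv_out(h)             # 7
channels = 8                # illustrative channel count
flat = channels * h * h     # flattened feature count: 8 * 7 * 7 = 392

# A final linear layer maps the flattened features down to the 32-dim code;
# the decoder reverses the path with transposed convolutions back to 28x28.
print(h, flat)
```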
Linear autoencoder on 2D data
Consider 1000 points in ℝ^10 lying near a 2D plane (with small noise). A linear autoencoder with k = 2 learns to project onto this plane, exactly what PCA does. The reconstruction error equals the variance in the 8 discarded dimensions. A nonlinear autoencoder with k = 2 could capture a curved manifold that PCA misses.
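This example is easy to verify numerically. The sketch below uses the PCA projection directly (which, per the theorem above, the optimal linear autoencoder matches) with an illustrative noise level σ = 0.1, so the expected error is roughly 8σ² = 0.08 per point.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k, sigma = 1000, 10, 2, 0.1

# 1000 points on a random 2D plane in R^10, plus isotropic noise
Q, _ = np.linalg.qr(rng.normal(size=(d, k)))    # orthonormal basis of plane
X = rng.normal(size=(n, k)) @ Q.T + sigma * rng.normal(size=(n, d))
X -= X.mean(axis=0)

# Optimal linear autoencoder with k = 2: PCA projection onto top-2 directions
C = X.T @ X / n
V = np.linalg.eigh(C)[1][:, ::-1][:, :k]        # top-2 eigenvectors
X_hat = X @ V @ V.T

mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(mse)     # close to the noise variance in the 8 discarded dims, 8*sigma^2
```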
Common Confusions
Overcomplete autoencoders are not useless
If k ≥ d with no regularization, the autoencoder can learn the identity (a trivial solution). But with sparsity, denoising, or other regularization, overcomplete autoencoders learn useful representations. The useful constraint is not dimensionality alone; it is the information bottleneck in a broader sense.
Autoencoders are not generative models by default
A standard autoencoder learns to reconstruct training data, but the latent space has no particular structure. Sampling a random point in the latent space and decoding it typically produces garbage. Making autoencoders generative requires imposing structure on the latent space. This is exactly what variational autoencoders do.
Summary
- Autoencoders learn representations by minimizing reconstruction error through a bottleneck
- Undercomplete (k < d): the bottleneck forces compression
- Linear autoencoder with squared loss = PCA (same subspace, possibly different basis)
- Sparse autoencoders: overcomplete code with an L1 penalty encourages interpretable features
- Denoising autoencoders: corrupt the input, reconstruct the clean original; this learns the data manifold without needing a bottleneck
- Standard autoencoders are not generative models; the latent space lacks structure for sampling
Exercises
Problem
A linear autoencoder with encoder z = Wx and decoder x̂ = Uz is trained on centered data. What is the rank of the reconstruction map x ↦ UWx, and how does this relate to PCA?
Problem
Why does a denoising autoencoder learn useful representations even without a bottleneck (when k ≥ d)? What prevents it from learning the identity?
References
Canonical:
- Goodfellow, Bengio, Courville, Deep Learning (2016), Chapter 14
- Vincent et al., "Extracting and Composing Robust Features with Denoising Autoencoders" (2008)
Current:
- Cunningham et al., "Sparse Autoencoders Find Highly Interpretable Features in Language Models" (2023)
- Bishop, Pattern Recognition and Machine Learning (2006), Chapters 1-14
- Murphy, Machine Learning: A Probabilistic Perspective (2012), Chapters 1-28
Next Topics
The natural next steps from autoencoders:
- Variational autoencoders: making the latent space generative
- Dimensionality reduction: PCA and nonlinear alternatives
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Feedforward Networks and Backpropagation (Layer 2)
- Differentiation in Rn (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Matrix Calculus (Layer 1)
- The Jacobian Matrix (Layer 0A)
- The Hessian Matrix (Layer 0A)
- Activation Functions (Layer 1)
- Convex Optimization Basics (Layer 1)
- Matrix Operations and Properties (Layer 0A)
- Vectors, Matrices, and Linear Maps (Layer 0A)
Builds on This
- Energy-Based Models (Layer 3)
- JEPA and Joint Embedding (Layer 4)
- Sparse Autoencoders for Interpretability (Layer 4)
- Variational Autoencoders (Layer 3)