

Out-of-Distribution Detection

Methods for detecting when test inputs differ from training data, where naive softmax confidence fails and principled alternatives based on energy, Mahalanobis distance, and typicality succeed.


Why This Matters

Deployed models encounter inputs they were not trained on. A medical imaging classifier trained on X-rays will still produce a confident prediction when given a photo of a cat. A fraud detection model trained on 2020 data will silently fail on novel fraud patterns in 2024. OOD detection is the problem of recognizing when a model's input falls outside its training distribution, so you can abstain or escalate rather than trust a meaningless prediction.

The Core Problem

Let $P_{\text{in}}$ be the training distribution and $P_{\text{out}}$ be any distribution not seen during training. Given a new input $x$, we want a scoring function $S(x)$ that is high for $x \sim P_{\text{in}}$ and low for $x \sim P_{\text{out}}$.

Definition

Out-of-Distribution Input

An input $x$ is out-of-distribution with respect to a model trained on data from $P_{\text{in}}$ if $x$ is drawn from some $P_{\text{out}} \neq P_{\text{in}}$. The boundary between "in" and "out" is task-dependent and requires a decision threshold on the scoring function.

Why Softmax Confidence Fails

The most naive OOD detector uses the maximum softmax probability (MSP): $S_{\text{MSP}}(x) = \max_k p(y = k \mid x)$. This fails badly.

Proposition

Softmax Overconfidence on OOD Inputs

Statement

For a neural network $f: \mathbb{R}^d \to \mathbb{R}^K$ with softmax output, there exist inputs $x$ far from the training distribution such that $\max_k \text{softmax}(f(x))_k$ is arbitrarily close to 1.

Intuition

Softmax normalizes logits to sum to 1. If one logit is much larger than the others, softmax assigns near-1 probability to that class regardless of whether the input is meaningful. Deep networks produce large logit norms for high-norm inputs, and OOD inputs often have unusual norms.

Proof Sketch

For any $\epsilon > 0$, consider inputs $x$ where $\|f(x)\| \to \infty$ with one logit dominating. Then $\text{softmax}(f(x))_k \to 1$ for the dominant logit $k$ as it grows. ReLU networks produce unbounded outputs on unbounded inputs, so such $x$ always exist outside the training support.
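
This can be checked numerically. Scaling a fixed logit vector, as a high-norm input would produce under a ReLU network, drives the maximum softmax probability toward 1; the logit values below are purely illustrative:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Illustrative logits for a 3-class model; scaling them up mimics
# the growing logit norms a ReLU network produces on high-norm inputs.
base_logits = np.array([2.0, 1.0, 0.5])
for scale in [1, 5, 25]:
    p = softmax(scale * base_logits)
    print(f"scale={scale:3d}  max softmax prob={p.max():.4f}")
```

At scale 1 the model is moderately confident; by scale 25 the maximum probability is numerically indistinguishable from 1, even though nothing about the input's validity has changed.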

Why It Matters

This means you cannot trust softmax probabilities as confidence scores for deployment safety. A model saying "95% cat" on a chest X-ray is not useful.

Failure Mode

Any OOD detection method that relies solely on softmax confidence will miss OOD inputs that happen to produce large logits in one class. This includes adversarial OOD examples constructed to maximize softmax confidence.

Detection Methods

ODIN: Temperature Scaling + Input Perturbation

ODIN (Liang et al., 2018) improves MSP with two tricks. First, divide the logits by a temperature $T > 1$ before the softmax to spread out the probability mass. Second, add a small perturbation to the input in the direction that increases the maximum softmax score:

$$\tilde{x} = x + \epsilon \cdot \text{sign}\left(\nabla_x \max_k \text{softmax}(f(x)/T)_k\right)$$

The score is then $S_{\text{ODIN}}(\tilde{x}) = \max_k \text{softmax}(f(\tilde{x})/T)_k$.

The perturbation amplifies the gap between in-distribution and OOD inputs: the score gradient tends to be larger in magnitude for in-distribution inputs, so the same-size step raises their scores more than it raises OOD scores.
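
Both tricks can be sketched on a toy linear classifier $f(x) = Wx$, for which the gradient of the max softmax score has a closed form, so no autodiff is needed. A real implementation would backpropagate through the network; all values here are illustrative:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def odin_score(x, W, T=1000.0, eps=0.01):
    """ODIN score for a toy linear model f(x) = W @ x.

    For this model the gradient of log max_k softmax(W x / T) with
    respect to x is (W[k] - p @ W) / T, so no autodiff is needed.
    """
    p = softmax(W @ x / T)
    k = int(np.argmax(p))
    grad = (W[k] - p @ W) / T
    # Step in the direction that increases the max softmax score.
    x_tilde = x + eps * np.sign(grad)
    return softmax(W @ x_tilde / T).max()

x = np.array([1.0, -0.5, 2.0])  # illustrative input
W = np.eye(3)                   # illustrative weights
print(odin_score(x, W, T=2.0, eps=0.01))
```

With `eps=0` this reduces to temperature-scaled MSP; the perturbation then nudges $x$ so that the dominant class becomes slightly more dominant.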

Energy-Based Detection

Definition

Energy Score

Given logits $f_k(x)$ for classes $k = 1, \ldots, K$, the energy score is:

$$E(x) = -\log \sum_{k=1}^{K} e^{f_k(x)}$$

Lower energy (more negative) indicates in-distribution. This is the negative log of the partition function of the Gibbs distribution induced by the logits.
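
A minimal NumPy sketch of the energy score, evaluated on illustrative logits for a strongly activated in-distribution input and a flat OOD input:

```python
import numpy as np

def energy_score(logits):
    """E(x) = -log sum_k exp(f_k(x)), computed stably via the max trick."""
    m = logits.max()
    return -(m + np.log(np.exp(logits - m).sum()))

# Illustrative logits: in-distribution inputs activate learned features
# strongly; OOD inputs tend to produce small, near-uniform logits.
id_logits = np.array([9.0, 2.0, 1.0])
ood_logits = np.array([1.1, 1.0, 0.9])

print(energy_score(id_logits))   # ≈ -9.0: low energy, in-distribution
print(energy_score(ood_logits))  # ≈ -2.1: higher energy, OOD
```

Because LogSumExp is dominated by the largest logit but still aggregates the rest, the energy tracks the overall logit scale rather than only the winning class.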

Proposition

Energy Score Separates In- and Out-Distribution

Statement

Under a Gibbs interpretation of the softmax classifier, the expected energy $\mathbb{E}_{x \sim P_{\text{in}}}[E(x)]$ is lower (more negative) than $\mathbb{E}_{x \sim P_{\text{out}}}[E(x)]$ when in-distribution inputs produce larger total logit magnitude than OOD inputs.

Intuition

Energy aggregates all logits via LogSumExp rather than taking only the max. In-distribution inputs activate learned features strongly, producing large logits across relevant classes. OOD inputs produce smaller or more uniform logits, yielding higher energy.

Proof Sketch

The energy $E(x) = -\text{LSE}(f(x))$ (negative LogSumExp of the logits) is a monotone decreasing function of the total logit scale. Cross-entropy training pushes in-distribution logits to be large and well-separated. OOD inputs, lacking the trained features, produce smaller logit norms on average.

Why It Matters

Energy scoring is a drop-in replacement for MSP that requires no retraining, no hyperparameters (unlike ODIN), and consistently outperforms MSP across benchmarks.

Failure Mode

Fails when OOD inputs happen to strongly activate learned features. For example, a model trained on CIFAR-10 may assign low energy to SVHN digits because digit-like features are present in both distributions.

Mahalanobis Distance in Feature Space

Fit a class-conditional Gaussian $\mathcal{N}(\mu_k, \Sigma)$ to the penultimate-layer features of the training data (tied covariance across classes). The OOD score for input $x$ with feature $z = \phi(x)$ is:

$$S_{\text{Maha}}(x) = -\min_k (z - \mu_k)^\top \Sigma^{-1} (z - \mu_k)$$

More negative values indicate OOD. This works because in-distribution features cluster near class means while OOD features fall in low-density regions of feature space.
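
A sketch with synthetic penultimate-layer features; the dimensions, cluster locations, and noise scales below are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, n = 8, 3, 500  # feature dim, classes, samples per class (illustrative)

# Synthetic "training" features: one Gaussian cluster per class.
mus = rng.normal(0.0, 5.0, size=(K, d))
feats = np.concatenate([rng.normal(mu, 1.0, size=(n, d)) for mu in mus])
labels = np.repeat(np.arange(K), n)

# Fit class means and a tied covariance across classes.
means = np.stack([feats[labels == k].mean(axis=0) for k in range(K)])
centered = feats - means[labels]
cov = centered.T @ centered / len(feats)
cov_inv = np.linalg.inv(cov)

def maha_score(z):
    """S(x) = -min_k (z - mu_k)^T Sigma^{-1} (z - mu_k); more negative -> more OOD."""
    diffs = means - z
    return -min(diff @ cov_inv @ diff for diff in diffs)

z_id = means[0] + rng.normal(0.0, 1.0, size=d)  # feature near a class mean
z_ood = rng.normal(0.0, 20.0, size=d)           # feature far from all means
print(maha_score(z_id), maha_score(z_ood))
```

The in-distribution feature scores close to 0 (small distance to its class mean), while the far-away feature scores much more negative.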

Typicality Test

Rather than asking "is this input likely?", ask "is this input typical?" A high-dimensional Gaussian concentrates on a thin shell, not at the mode. An input with very high or very low likelihood under $P_{\text{in}}$ is atypical and likely OOD.

The typicality score compares the log-likelihood of $x$ to the expected log-likelihood under $P_{\text{in}}$:

$$S_{\text{typ}}(x) = -\left|\log p(x) - \mathbb{E}_{x' \sim P_{\text{in}}}[\log p(x')]\right|$$

This catches a failure mode of pure likelihood: generative models can assign higher likelihood to OOD data than in-distribution data (e.g., a CIFAR-10 model assigns higher likelihood to SVHN).
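
For a $d$-dimensional standard Gaussian the score has a closed form, since $\mathbb{E}[\log p(x)] = -\tfrac{d}{2}(1 + \log 2\pi)$. A sketch showing that the mode has the highest likelihood of any point yet is flagged as atypical:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 500  # illustrative dimensionality

def log_p(x):
    """Log-density of the standard Gaussian N(0, I_d)."""
    return -0.5 * (x @ x) - 0.5 * d * np.log(2 * np.pi)

# E[log p(x)] for x ~ N(0, I_d), using E[||x||^2] = d.
expected_log_p = -0.5 * d * (1 + np.log(2 * np.pi))

def typicality_score(x):
    """S_typ(x) = -|log p(x) - E[log p]|; scores near 0 are typical."""
    return -abs(log_p(x) - expected_log_p)

x_sample = rng.normal(size=d)  # a genuine draw from the model
x_mode = np.zeros(d)           # the density's mode

print(log_p(x_mode) > log_p(x_sample))                        # True
print(typicality_score(x_mode) < typicality_score(x_sample))  # True
```

The mode maximizes likelihood, yet its typicality score is $-d/2 = -250$, far worse than a real sample's; a pure likelihood threshold would rank the mode as the most in-distribution point possible.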

Common Confusions

Watch Out

High likelihood does not mean in-distribution

Deep generative models (VAEs, flows) can assign higher likelihood to OOD data than to training data. This happens because likelihood conflates density with the volume of the typical set. In high dimensions, the typical set occupies a thin shell, and OOD inputs can fall in high-density but atypical regions.
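
A quick numerical check of the thin-shell claim: draws from a standard Gaussian in $d = 10{,}000$ dimensions have norms concentrated tightly around $\sqrt{d} = 100$, nowhere near the mode at the origin:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000
samples = rng.normal(size=(200, d))  # 200 draws from N(0, I_d)
norms = np.linalg.norm(samples, axis=1)

# Norms concentrate on a thin shell of radius ~ sqrt(d) = 100;
# the highest-density point (the origin, norm 0) is nowhere near any sample.
print(f"mean norm = {norms.mean():.2f}, std = {norms.std():.3f}")
```

Every one of the 200 samples lands within a sliver of the shell, which is why density at a point says little about whether real data ever looks like that point.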

Watch Out

OOD detection is not anomaly detection

Anomaly detection finds unusual points within the training distribution. OOD detection finds points outside the training distribution entirely. A rare but valid medical image is an anomaly; a photo of food is OOD. The methods and assumptions differ.

Watch Out

No free lunch for OOD detection

Every OOD detector makes assumptions about what OOD data looks like. A detector calibrated for far-OOD (random noise) may fail on near-OOD (a closely related but different dataset). You must evaluate against the specific OOD scenarios relevant to your deployment.

Key Takeaways

  • Softmax confidence is not a reliable OOD detector; overconfidence on OOD inputs is the norm, not the exception
  • Energy scoring uses all logits via LogSumExp and consistently beats MSP with zero additional cost
  • ODIN adds temperature scaling and input perturbation for better separation
  • Mahalanobis distance exploits the geometry of learned feature space
  • Typicality tests address the failure of raw likelihood in high dimensions
  • No single method works for all OOD types; evaluate on your specific deployment scenario

Exercises

ExerciseCore

Problem

A softmax classifier outputs probabilities $[0.97, 0.02, 0.01]$ on an input. Can you conclude the input is in-distribution? Explain why or why not, and describe what additional information you would need.

ExerciseAdvanced

Problem

Given class-conditional means $\mu_1, \ldots, \mu_K$ and shared covariance $\Sigma$ in a $d$-dimensional feature space, derive the computational cost of computing the Mahalanobis OOD score for a single input. How does this scale with $K$ and $d$?

References

Canonical:

  • Hendrycks & Gimpel, "A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks" (ICLR 2017)
  • Liang et al., "Enhancing the Reliability of Out-of-Distribution Image Detection in Neural Networks" (ODIN, ICLR 2018)

Current:

  • Liu et al., "Energy-based Out-of-Distribution Detection" (NeurIPS 2020)
  • Lee et al., "A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks" (Mahalanobis, NeurIPS 2018)
  • Nalisnick et al., "Do Deep Generative Models Know What They Don't Know?" (ICLR 2019), Section 3


Last reviewed: April 2026
