AI Safety
Out-of-Distribution Detection
Methods for detecting when test inputs differ from training data, where naive softmax confidence fails and principled alternatives based on energy, Mahalanobis distance, and typicality succeed.
Why This Matters
Deployed models encounter inputs they were not trained on. A medical imaging classifier trained on X-rays will still produce a confident prediction when given a photo of a cat. A fraud detection model trained on 2020 data will silently fail on novel fraud patterns in 2024. OOD detection is the problem of recognizing when a model's input falls outside its training distribution, so you can abstain or escalate rather than trust a meaningless prediction.
The Core Problem
Let $p_{\text{in}}$ be the training distribution and $p_{\text{out}}$ be any distribution not seen during training. Given a new input $x$, we want a scoring function $s(x)$ such that $s(x)$ is high for $x \sim p_{\text{in}}$ and low for $x \sim p_{\text{out}}$.
Out-of-Distribution Input
An input $x$ is out-of-distribution with respect to a model trained on data from $p_{\text{in}}$ if $x$ is drawn from some $p_{\text{out}}$ where $p_{\text{out}} \neq p_{\text{in}}$. The boundary between "in" and "out" is task-dependent and requires a decision threshold on the scoring function.
Why Softmax Confidence Fails
The most naive OOD detector uses the maximum softmax probability (MSP): $s_{\text{MSP}}(x) = \max_k \mathrm{softmax}(f(x))_k$. This fails badly.
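The baseline is easy to state in code. The sketch below (function names are illustrative, not from any library) computes the MSP score from raw logits:

```python
import numpy as np

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_score(logits):
    """Maximum softmax probability: the naive OOD score."""
    return softmax(logits).max(axis=-1)

# A peaked logit vector versus a perfectly flat one
print(msp_score(np.array([8.0, 1.0, 1.0])))  # ≈ 0.998
print(msp_score(np.array([1.0, 1.0, 1.0])))  # ≈ 0.333 (maximally uncertain)
```

Note that the score only sees the logits, not where the input came from; this is exactly the weakness the next theorem makes precise.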
Softmax Overconfidence on OOD Inputs
Statement
For a neural network $f$ with softmax output, there exist inputs $x$ arbitrarily far from the training distribution such that $\max_k \mathrm{softmax}(f(x))_k$ is arbitrarily close to 1.
Intuition
Softmax normalizes logits to sum to 1. If one logit is much larger than the others, softmax assigns near-1 probability to that class regardless of whether the input is meaningful. Deep networks produce large logit norms for high-norm inputs, and OOD inputs often have unusual norms.
Proof Sketch
For any $\epsilon > 0$, consider inputs $x$ whose logits $f(x)$ have one coordinate dominating the rest. Then $\max_k \mathrm{softmax}(f(x))_k \to 1$ as the gap to the dominant logit grows. ReLU networks produce unbounded outputs on unbounded inputs, so such $x$ always exist outside the training support.
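The mechanism in the proof sketch can be checked directly: scaling any logit vector with a unique argmax drives the MSP toward 1, regardless of what input produced those logits. A minimal demonstration:

```python
import numpy as np

def msp(logits):
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return p.max()

# Multiplying the logits by a growing scale alpha saturates the softmax:
# the MSP climbs toward 1 even though the "input" is meaningless.
base = np.array([2.0, 1.0, 0.5])
for alpha in [1, 5, 25]:
    print(alpha, msp(alpha * base))
```

At scale 25 the gap between the top two logits is 25 units, so the runner-up class receives probability on the order of $e^{-25}$.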
Why It Matters
This means you cannot trust softmax probabilities as confidence scores for deployment safety. A model saying "95% cat" on a chest X-ray is not useful.
Failure Mode
Any OOD detection method that relies solely on softmax confidence will miss OOD inputs that happen to produce large logits in one class. This includes adversarial OOD examples constructed to maximize softmax confidence.
Detection Methods
ODIN: Temperature Scaling + Input Perturbation
ODIN (Liang et al., 2018) improves MSP with two tricks. First, divide the logits by a temperature $T$ before the softmax to spread out the probability mass. Second, add a small perturbation to the input in the direction that increases the maximum softmax score:

$\tilde{x} = x - \epsilon\,\mathrm{sign}\!\left(-\nabla_x \log S_{\hat{y}}(x; T)\right)$

where $S_k(x; T) = \mathrm{softmax}(f(x)/T)_k$ and $\hat{y} = \arg\max_k S_k(x; T)$. The score is then $s_{\text{ODIN}}(x) = \max_k S_k(\tilde{x}; T)$.
The perturbation amplifies the gap between in-distribution and OOD inputs because in-distribution inputs respond more coherently to gradient-based perturbation.
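ODIN's perturbation requires a gradient through the network. As a minimal sketch under a strong simplifying assumption, the toy below uses a linear classifier $f(x) = Wx$, where the gradient of the log-softmax score is available in closed form; the function name and the specific $\epsilon$ and $T$ values are illustrative choices, not the paper's official code:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def odin_score(x, W, T=1000.0, eps=0.0014):
    """ODIN score for a toy linear classifier f(x) = W @ x.

    For this linear model, grad_x log S_yhat(x; T) equals
    (1/T) * W.T @ (onehot(yhat) - p), so no autograd is needed.
    For a real network you would backpropagate instead.
    """
    p = softmax(W @ x / T)
    yhat = int(np.argmax(p))
    onehot = np.zeros_like(p)
    onehot[yhat] = 1.0
    grad = (W.T @ (onehot - p)) / T       # gradient of the log-softmax score
    x_pert = x - eps * np.sign(-grad)     # nudge input toward a higher score
    return softmax(W @ x_pert / T).max()
```

The in-distribution/OOD gap comes from comparing this score across inputs; $\epsilon$ and $T$ are hyperparameters that ODIN tunes on a validation set, which is its main practical cost relative to energy scoring.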
Energy-Based Detection
Energy Score
Given logits $f_k(x)$ for classes $k = 1, \dots, K$ and temperature $T$, the energy score is:

$E(x; T) = -T \log \sum_{k=1}^{K} e^{f_k(x)/T}$
Lower energy (more negative) indicates in-distribution. This is the negative log of the partition function of the Gibbs distribution induced by the logits.
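The energy score is a one-line LogSumExp over the logits. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def energy_score(logits, T=1.0):
    """E(x; T) = -T * log sum_k exp(f_k(x) / T). Lower = more in-distribution."""
    z = np.asarray(logits) / T
    m = z.max(axis=-1)  # stabilize the LogSumExp
    return -T * (m + np.log(np.exp(z - m[..., None]).sum(axis=-1)))

# Large, well-separated logits (in-distribution-like) give lower energy
# than small, flat logits (OOD-like).
print(energy_score(np.array([10.0, 2.0, 1.0])))  # ≈ -10.0
print(energy_score(np.array([0.5, 0.4, 0.3])))   # ≈ -1.5
```

Because LogSumExp is dominated by the largest logit but still accumulates the others, the energy uses strictly more information than the MSP's single max.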
Energy Score Separates In- and Out-Distribution
Statement
Under a Gibbs interpretation of the softmax classifier, the expected energy $\mathbb{E}_{x \sim p_{\text{in}}}[E(x)]$ is lower (more negative) than $\mathbb{E}_{x \sim p_{\text{out}}}[E(x)]$ when in-distribution inputs produce larger total logit magnitude than OOD inputs.
Intuition
Energy aggregates all logits via LogSumExp rather than taking only the max. In-distribution inputs activate learned features strongly, producing large logits across relevant classes. OOD inputs produce smaller or more uniform logits, yielding higher energy.
Proof Sketch
The energy is a monotone decreasing function of the total logit scale. Cross-entropy training pushes in-distribution logits to be large and well-separated. OOD inputs, lacking the trained features, produce smaller logit norms on average.
Why It Matters
Energy scoring is a drop-in replacement for MSP that requires no retraining, no hyperparameters (unlike ODIN), and consistently outperforms MSP across benchmarks.
Failure Mode
Fails when OOD inputs happen to strongly activate learned features. For example, a model trained on CIFAR-10 may assign low energy to SVHN digits because digit-like features are present in both distributions.
Mahalanobis Distance in Feature Space
Fit a class-conditional Gaussian $\mathcal{N}(\mu_k, \Sigma)$ to the penultimate-layer features of the training data (tied covariance $\Sigma$ across classes). The OOD score for input $x$ with feature vector $z = h(x)$ is:

$s(x) = -\min_{k}\,(z - \mu_k)^\top \Sigma^{-1} (z - \mu_k)$
More negative values indicate OOD. This works because in-distribution features cluster near class means while OOD features fall in low-density regions of feature space.
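The fit-then-score pipeline can be sketched in a few lines of NumPy (function names and the small ridge term added before inversion are illustrative choices, not from the original paper's code):

```python
import numpy as np

def fit_gaussians(features, labels, num_classes):
    """Class means and a single tied covariance over penultimate features."""
    d = features.shape[1]
    means = np.stack([features[labels == k].mean(axis=0)
                      for k in range(num_classes)])
    centered = features - means[labels]            # subtract each row's class mean
    cov = centered.T @ centered / len(features)    # tied MLE covariance
    # Small ridge keeps the inverse well-conditioned (an assumption, not canonical)
    return means, np.linalg.inv(cov + 1e-6 * np.eye(d))

def mahalanobis_score(z, means, cov_inv):
    """Negative squared Mahalanobis distance to the nearest class mean.

    More negative => farther from every class cluster => more likely OOD.
    """
    diffs = z - means                              # shape (K, d)
    d2 = np.einsum('kd,de,ke->k', diffs, cov_inv, diffs)
    return -d2.min()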
Typicality Test
Rather than asking "is this input likely?", ask "is this input typical?" A high-dimensional Gaussian concentrates on a thin shell, not at the mode. An input with very high or very low likelihood under the model $p_\theta$ is atypical and likely OOD.
The typicality score compares the log-likelihood of $x$ to the expected log-likelihood under $p_\theta$:

$s(x) = -\left|\log p_\theta(x) - \mathbb{E}_{x' \sim p_\theta}\!\left[\log p_\theta(x')\right]\right|$
This catches a failure mode of pure likelihood: generative models can assign higher likelihood to OOD data than in-distribution data (e.g., a CIFAR-10 model assigns higher likelihood to SVHN).
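The shell-versus-mode phenomenon is easy to verify for a standard Gaussian, where the expected log-likelihood has a closed form; this sketch (names are illustrative) shows that the mode, despite having the highest density of any point, is maximally atypical:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000

# Standard d-dimensional Gaussian: log p(x) = -||x||^2/2 - (d/2) log(2*pi),
# and E[log p(x)] = -(d/2) * (1 + log(2*pi)) since E[||x||^2] = d.
def log_density(x):
    return -0.5 * (x @ x) - 0.5 * d * np.log(2 * np.pi)

expected_ll = -0.5 * d * (1 + np.log(2 * np.pi))

def typicality_score(x):
    """Smaller |log p(x) - E[log p]| means more typical."""
    return abs(log_density(x) - expected_ll)

typical = rng.standard_normal(d)   # lands near the radius-sqrt(d) shell
mode = np.zeros(d)                 # the density maximum, yet atypical
print(typicality_score(typical))   # small: log-likelihood near its expectation
print(typicality_score(mode))      # ≈ 500 (= d/2): maximally atypical
```

A sample from the model itself scores as typical, while the single highest-density point fails the test badly; this is the one-dimensional intuition ("likely = near the peak") breaking in high dimensions.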
Common Confusions
High likelihood does not mean in-distribution
Deep generative models (VAEs, flows) can assign higher likelihood to OOD data than to training data. This happens because likelihood conflates density with the volume of the typical set. In high dimensions, the typical set occupies a thin shell, and OOD inputs can fall in high-density but atypical regions.
OOD detection is not anomaly detection
Anomaly detection finds unusual points within the training distribution. OOD detection finds points outside the training distribution entirely. A rare but valid medical image is an anomaly; a photo of food is OOD. The methods and assumptions differ.
No free lunch for OOD detection
Every OOD detector makes assumptions about what OOD data looks like. A detector calibrated for far-OOD (random noise) may fail on near-OOD (a closely related but different dataset). You must evaluate against the specific OOD scenarios relevant to your deployment.
Key Takeaways
- Softmax confidence is not a reliable OOD detector; overconfidence on OOD inputs is the norm, not the exception
- Energy scoring uses all logits via LogSumExp and consistently beats MSP with zero additional cost
- ODIN adds temperature scaling and input perturbation for better separation
- Mahalanobis distance exploits the geometry of learned feature space
- Typicality tests address the failure of raw likelihood in high dimensions
- No single method works for all OOD types; evaluate on your specific deployment scenario
Exercises
Problem
A softmax classifier outputs a maximum probability close to 1 on an input. Can you conclude the input is in-distribution? Explain why or why not, and describe what additional information you would need.
Problem
Given $K$ class-conditional means $\mu_1, \dots, \mu_K$ and a shared covariance $\Sigma$ in a $d$-dimensional feature space, derive the computational cost of computing the Mahalanobis OOD score for a single input. How does this scale with $d$ and $K$?
References
Canonical:
- Hendrycks & Gimpel, "A Baseline for Detecting Misclassified and OOD Examples" (2017)
- Liang et al., "Enhancing The Reliability of OOD Image Detection" (ODIN, 2018)
Current:
- Liu et al., "Energy-based OOD Detection" (NeurIPS 2020)
- Lee et al., "A Simple Unified Framework for Detecting OOD Samples" (Mahalanobis, 2018)
- Nalisnick et al., "Do Deep Generative Models Know What They Don't Know?" (2019), Section 3
Next Topics
- Mechanistic interpretability: understanding what features a model uses
- Hallucination theory: when models confidently produce wrong outputs
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Calibration and Uncertainty Quantification (Layer 3)
- Logistic Regression (Layer 1)
- Maximum Likelihood Estimation (Layer 0B)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Differentiation in $\mathbb{R}^n$ (Layer 0A)
- Convex Optimization Basics (Layer 1)
- Matrix Operations and Properties (Layer 0A)