Applied Math
Signal Detection Theory
The mathematical framework for binary decisions under noise. ROC curves, d-prime, likelihood ratios, the Neyman-Pearson lemma connection, and why SDT is the foundation of both psychophysics and ML classification evaluation.
Why This Matters
Every binary classifier faces the same problem: given a noisy observation, decide whether it came from the "signal" class or the "noise" class. Signal detection theory (SDT) provides the complete mathematical framework for this decision. ROC curves, AUC, sensitivity ($d'$), and the likelihood ratio test all originate here. SDT was developed in the 1950s as part of detection theory for radar operators deciding whether a blip on the screen was an enemy aircraft or noise. The same mathematics now governs medical diagnosis, spam filtering, and every ML classification metric in confusion matrices. Understanding SDT clarifies why the ROC curve works, what AUC actually measures, and when precision-recall is preferable.
The Basic Model
An observer receives a scalar observation $x$ drawn from one of two distributions:
- Noise alone ($N$, signal absent): $x \sim f_0(x)$
- Signal plus noise ($S$, signal present): $x \sim f_1(x)$
The observer sets a criterion $c$ and responds "signal present" if $x > c$, "signal absent" otherwise.
The Four Outcomes
Given a binary decision and ground truth, there are four possible outcomes:
| | Signal present ($S$) | Signal absent ($N$) |
|---|---|---|
| Respond "yes" | Hit (true positive) | False alarm (false positive) |
| Respond "no" | Miss (false negative) | Correct rejection (true negative) |
The hit rate is $H = P(x > c \mid S)$. The false alarm rate is $F = P(x > c \mid N)$. These are the fundamental operating characteristics.
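The four outcomes and the two rates can be tallied directly from a simulation. A minimal sketch, assuming equal priors, $\mu_N = 0$, $\mu_S = 1$, $\sigma = 1$, and a criterion of $c = 0.5$ (all illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
# Ground truth: signal present on roughly half the trials (assumed equal priors).
present = rng.random(n) < 0.5
# Observation x: drawn from N(1, 1) when signal is present, N(0, 1) otherwise.
x = np.where(present, rng.normal(1.0, 1.0, n), rng.normal(0.0, 1.0, n))
say_yes = x > 0.5                     # decision rule: respond "yes" iff x > c

hits = np.sum(say_yes & present)
misses = np.sum(~say_yes & present)
false_alarms = np.sum(say_yes & ~present)
correct_rejections = np.sum(~say_yes & ~present)

hit_rate = hits / (hits + misses)                              # H = P("yes" | S)
fa_rate = false_alarms / (false_alarms + correct_rejections)   # F = P("yes" | N)
```

Note that $H$ and $F$ condition on the ground truth (row sums of the table's columns), which is what makes them comparable across datasets with different base rates.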
Sensitivity: $d'$ (d-prime)
d-prime
When both distributions are Gaussian with equal variance $\sigma^2$:
$$x \mid N \sim \mathcal{N}(\mu_N, \sigma^2), \qquad x \mid S \sim \mathcal{N}(\mu_S, \sigma^2),$$
the sensitivity is the standardized distance between the means:
$$d' = \frac{\mu_S - \mu_N}{\sigma}.$$
$d' = 0$ means the signal and noise distributions are identical (chance performance). Larger $d'$ means the signal is more discriminable from noise. $d'$ is independent of the criterion $c$, measuring the observer's ability to discriminate regardless of bias.
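In practice $d'$ is estimated from the observed rates as $d' = z(H) - z(F)$, where $z = \Phi^{-1}$. A minimal sketch, with the hit and false alarm rates ($H = 0.84$, $F = 0.30$) as assumed example measurements:

```python
from scipy.stats import norm

# Assumed example: hit and false alarm rates measured at some criterion.
H, F = 0.84, 0.30

z = norm.ppf                  # inverse standard normal CDF, Phi^{-1}
d_prime = z(H) - z(F)         # standardized separation of the two distributions

# Chance performance (H == F) gives d' = 0 regardless of where the criterion sits.
d_chance = z(0.6) - z(0.6)
```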
Criterion and Bias
The criterion $c$ is the threshold on the observation axis. The observer responds "signal" when $x > c$. The criterion determines the tradeoff between hits and false alarms:
- Liberal criterion (low $c$): high hit rate, high false alarm rate
- Conservative criterion (high $c$): low hit rate, low false alarm rate
The bias $\beta$ is defined as the likelihood ratio at the criterion:
$$\beta = \frac{f_1(c)}{f_0(c)}.$$
An unbiased observer sets $\beta = 1$ (criterion at the intersection of the two distributions). $\beta > 1$ is conservative, $\beta < 1$ is liberal.
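The bias can be evaluated numerically for any criterion placement. A minimal sketch, assuming example parameters $\mu_N = 0$, $\mu_S = 1.5$, $\sigma = 1$ (not from the text):

```python
from scipy.stats import norm

mu_N, mu_S, sigma = 0.0, 1.5, 1.0       # assumed example parameters

def beta(c):
    # Bias: likelihood ratio f1(c) / f0(c) evaluated at the criterion.
    return norm.pdf(c, mu_S, sigma) / norm.pdf(c, mu_N, sigma)

b_mid = beta((mu_N + mu_S) / 2)          # = 1: unbiased (densities intersect here)
b_conservative = beta(2.5)               # > 1: criterion pushed above the midpoint
b_liberal = beta(0.0)                    # < 1: criterion pulled below the midpoint
```

For equal-variance Gaussians the two densities cross exactly at the midpoint of the means, which is why $\beta = 1$ lands there.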
The Likelihood Ratio
Likelihood Ratio
The likelihood ratio for observation $x$ is:
$$\Lambda(x) = \frac{f_1(x)}{f_0(x)}.$$
This is the ratio of the probability of observing $x$ under signal-present versus signal-absent. The likelihood ratio is a sufficient statistic for the binary decision: all information relevant to the decision is captured by $\Lambda(x)$.
For the equal-variance Gaussian case:
$$\log \Lambda(x) = \frac{\mu_S - \mu_N}{\sigma^2}\, x - \frac{\mu_S^2 - \mu_N^2}{2\sigma^2}.$$
Since $\Lambda(x)$ is a monotone increasing function of $x$ in this case, thresholding $x$ is equivalent to thresholding $\Lambda(x)$. In general (non-Gaussian, unequal variance), the optimal decision rule thresholds $\Lambda(x)$, not $x$ directly.
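The linearity (and hence monotonicity) of the Gaussian log-likelihood ratio can be checked numerically. A minimal sketch, with $\mu_N = 0$, $\mu_S = 1.5$, $\sigma = 1$ as assumed example parameters:

```python
import numpy as np
from scipy.stats import norm

mu_N, mu_S, sigma = 0.0, 1.5, 1.0       # assumed example parameters

xs = np.linspace(-3.0, 5.0, 9)
# log Lambda(x) computed directly from the two class-conditional densities.
log_lr = norm.logpdf(xs, mu_S, sigma) - norm.logpdf(xs, mu_N, sigma)

# Closed form: log Lambda(x) is linear in x, so thresholding x = thresholding Lambda.
slope = (mu_S - mu_N) / sigma**2
intercept = -(mu_S**2 - mu_N**2) / (2 * sigma**2)
```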
The Neyman-Pearson Lemma
Neyman-Pearson Lemma
Statement
Among all decision rules with false alarm rate at most $\alpha$, the rule that maximizes the hit rate (power) is the likelihood ratio test: respond "signal" if
$$\Lambda(x) = \frac{f_1(x)}{f_0(x)} > \eta,$$
where $\eta$ is chosen so that $P(\Lambda(x) > \eta \mid N) = \alpha$.
No other test with the same false alarm rate can achieve a higher hit rate.
Intuition
The likelihood ratio ranks observations by how much more likely they are under $S$ than under $N$. If you can only afford a false alarm rate of $\alpha$, you should spend your "budget" on the observations most indicative of $S$. The likelihood ratio test does exactly this: it rejects the noise hypothesis precisely for the observations with the strongest evidence for $S$.
Proof Sketch
Let $\phi^*$ be the likelihood ratio test with threshold $\eta$ achieving false alarm rate exactly $\alpha$, and let $\phi$ be any other test with false alarm rate at most $\alpha$. Consider the difference in power:
$$\int \big(\phi^*(x) - \phi(x)\big)\, f_1(x)\, dx.$$
On the region where $\phi^*(x) = 1$ (i.e., $f_1(x) > \eta f_0(x)$), we have $\phi^*(x) - \phi(x) \ge 0$, so $\big(\phi^*(x) - \phi(x)\big)\big(f_1(x) - \eta f_0(x)\big) \ge 0$. On the region where $\phi^*(x) = 0$, we have $f_1(x) \le \eta f_0(x)$ and $\phi^*(x) - \phi(x) \le 0$, so again $\big(\phi^*(x) - \phi(x)\big)\big(f_1(x) - \eta f_0(x)\big) \ge 0$. Integrating:
$$\int \big(\phi^* - \phi\big)\, f_1\, dx \;\ge\; \eta \int \big(\phi^* - \phi\big)\, f_0\, dx \;\ge\; 0.$$
The last inequality holds because $\phi$ has false alarm rate at most $\alpha$, so $\int \phi\, f_0\, dx \le \alpha = \int \phi^*\, f_0\, dx$. Therefore the power of $\phi^*$ is at least the power of $\phi$.
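The lemma's claim can be illustrated numerically by pitting the likelihood ratio test against a different rule with the same false alarm rate. A minimal sketch, assuming an equal-variance Gaussian example with $\mu_N = 0$, $\mu_S = 1.5$, $\sigma = 1$, and $\alpha = 0.10$ (all illustrative choices); the competing rule responds "signal" when $|x|$ is large, which wastes part of the false-alarm budget on the lower tail:

```python
from scipy.stats import norm

mu_N, mu_S, sigma = 0.0, 1.5, 1.0    # assumed equal-variance Gaussian example
alpha = 0.10                          # false alarm budget

# LRT: the likelihood ratio is monotone in x here, so it is a one-sided threshold.
c_lrt = norm.ppf(1 - alpha, loc=mu_N, scale=sigma)    # P(X_N > c_lrt) = alpha
power_lrt = 1 - norm.cdf(c_lrt, loc=mu_S, scale=sigma)

# Competing rule with the same false alarm rate: respond "signal" when |x| > c2.
c2 = norm.ppf(1 - alpha / 2, loc=mu_N, scale=sigma)   # each tail gets alpha/2
fa_other = (1 - norm.cdf(c2, mu_N, sigma)) + norm.cdf(-c2, mu_N, sigma)
power_other = (1 - norm.cdf(c2, mu_S, sigma)) + norm.cdf(-c2, mu_S, sigma)
```

Both rules spend the same false alarm rate, but the one-sided LRT achieves strictly higher power, as the lemma guarantees.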
Why It Matters
The Neyman-Pearson lemma is the theoretical foundation of ROC analysis. Each point on the ROC curve corresponds to a specific threshold on the likelihood ratio. The ROC curve traces out the optimal tradeoff between hit rate and false alarm rate. Any decision rule that does not use the likelihood ratio is suboptimal: it achieves a point below the ROC curve. In ML, a classifier's predicted probability (when well-calibrated) approximates the likelihood ratio, and thresholding it traces the ROC curve.
Failure Mode
The lemma requires known, fully specified distributions and (simple hypotheses). When the distributions have unknown parameters (composite hypotheses), the likelihood ratio test is no longer uniformly most powerful. In ML, the true class-conditional distributions are unknown, so ROC curves are estimated empirically. The lemma guarantees optimality in the idealized setting; in practice, the quality of the ROC depends on how well the classifier's scores approximate the true likelihood ratio.
ROC Curves
Receiver Operating Characteristic (ROC) Curve
The ROC curve plots the hit rate $H$ (true positive rate) against the false alarm rate $F$ (false positive rate) as the criterion $c$ varies from $+\infty$ to $-\infty$:
- $x$-axis: false alarm rate $F$
- $y$-axis: hit rate $H$
A perfect discriminator has an ROC curve passing through $(0, 1)$ (zero false alarms, 100% hit rate). A random guesser lies on the diagonal from $(0, 0)$ to $(1, 1)$.
For the equal-variance Gaussian model, the ROC curve has a closed-form parameterization. Let $\Phi$ denote the standard normal CDF and $\Phi^{-1}$ its inverse. If the false alarm rate is $F$, then:
$$H = \Phi\!\big(d' + \Phi^{-1}(F)\big).$$
The ROC curve bows toward the upper-left corner. The larger $d'$, the more the curve bows, indicating better discrimination.
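Evaluating the binormal parameterization on a grid and integrating numerically also recovers the Gaussian AUC formula $\Phi(d'/\sqrt{2})$ stated below. A minimal sketch, assuming $d' = 1$ as an example value:

```python
import numpy as np
from scipy.stats import norm

d_prime = 1.0                                 # assumed sensitivity for the example

# Sweep the false alarm rate and apply H = Phi(d' + Phi^{-1}(F)).
F = np.linspace(1e-6, 1 - 1e-6, 10_001)
H = norm.cdf(d_prime + norm.ppf(F))

# Trapezoidal area under the curve vs. the closed form Phi(d' / sqrt(2)).
auc_numeric = np.sum((H[1:] + H[:-1]) / 2 * np.diff(F))
auc_closed = norm.cdf(d_prime / np.sqrt(2))
```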
Area Under the ROC Curve (AUC)
The AUC is the integral of the ROC curve over $F \in [0, 1]$:
$$\mathrm{AUC} = \int_0^1 H(F)\, dF.$$
AUC has a probabilistic interpretation: it equals the probability that a randomly chosen signal observation scores higher than a randomly chosen noise observation:
$$\mathrm{AUC} = P(X_S > X_N).$$
For the equal-variance Gaussian model: $\mathrm{AUC} = \Phi\!\big(d'/\sqrt{2}\big)$.
AUC as Concordance Probability
Statement
Let $X_S \sim f_1$ and $X_N \sim f_0$ be independent draws from the signal and noise distributions respectively. Then:
$$\mathrm{AUC} = P(X_S > X_N).$$
This equals the Wilcoxon-Mann-Whitney statistic normalized to $[0, 1]$.
Intuition
AUC measures how well the scoring function separates the two classes. If every signal observation scores higher than every noise observation, $\mathrm{AUC} = 1$. If scores are completely random with respect to class, $\mathrm{AUC} = 0.5$. AUC is the probability that the classifier correctly ranks a random signal-noise pair.
Proof Sketch
Write $\mathrm{AUC} = \int_0^1 H\, dF$. By change of variables with $F(c) = P(X_N > c)$ (so $dF = -f_0(c)\, dc$):
$$\mathrm{AUC} = \int_{-\infty}^{\infty} P(X_S > c)\, f_0(c)\, dc = P(X_S > X_N).$$
The equivalence to the Mann-Whitney statistic follows from the fact that the fraction of concordant signal-noise pairs is the empirical estimate of $P(X_S > X_N)$ computed over all pairs.
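The pairwise (Mann-Whitney) form of the AUC is a one-liner over all signal-noise pairs. A minimal sketch with assumed toy scores, purely for illustration:

```python
import numpy as np

def auc_concordance(pos_scores, neg_scores):
    # AUC as the fraction of concordant (positive, negative) pairs; ties count 1/2.
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

# Assumed toy scores: 20 of the 25 pairs are concordant, so AUC = 0.8.
auc = auc_concordance([0.9, 0.8, 0.7, 0.55, 0.3],
                      [0.6, 0.5, 0.4, 0.35, 0.2])
```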
Why It Matters
This interpretation explains why AUC is threshold-independent: it averages over all possible operating points. In ML, AUC is the standard metric when the cost of false positives versus false negatives is unknown or when different deployment scenarios require different thresholds. AUC of 0.5 means the model is no better than random; AUC of 1.0 means perfect separation.
Failure Mode
AUC weights all thresholds equally, including regions of very high false alarm rate that are irrelevant in practice. If you care only about performance at low false positive rates (common in medical screening, fraud detection), the partial AUC over the relevant FPR range is more informative. Precision-recall curves are preferred when the negative class vastly outnumbers the positive class, because AUC can be misleadingly high when most predictions are correct simply by predicting the majority class.
Origins: Psychophysics
SDT was formalized by Green and Swets (1966) for psychophysics experiments. A classic example: an observer listens to intervals of noise and must decide whether a faint tone was present. The observer's internal representation is a noisy scalar, and measures perceptual sensitivity independent of the observer's willingness to say "yes." Before SDT, psychophysics conflated sensitivity with bias. Two observers with the same perceptual ability but different response biases (one cautious, one trigger-happy) would appear to have different detection thresholds. SDT separated these two factors cleanly.
Connection to ML Classification
Modern ML evaluation is a direct descendant of SDT:
- A classifier's predicted probability or score plays the role of the observation
- The ROC curve is constructed by sweeping the classification threshold
- AUC measures overall discrimination ability, analogous to a nonparametric measure of $d'$
- Precision and recall are SDT concepts restricted to the positive predictions
- The Neyman-Pearson lemma explains why the likelihood ratio (or a monotone transform of it, like calibrated probabilities) is the optimal scoring function
The key insight: a well-calibrated classifier with $P(S \mid x)$ as its score function is computing the posterior, which is a monotone function of the likelihood ratio when the prior is fixed. Thresholding this posterior at different values traces the ROC curve.
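The monotone link between posterior and likelihood ratio is just Bayes' rule in odds form: posterior odds = likelihood ratio times prior odds. A minimal sketch, assuming an equal-variance Gaussian example with $\mu_N = 0$, $\mu_S = 1.5$, and a fixed prior $P(S) = 0.2$ (illustrative values):

```python
import numpy as np
from scipy.stats import norm

mu_N, mu_S, prior_S = 0.0, 1.5, 0.2     # assumed example; fixed prior P(S)

xs = np.linspace(-3.0, 5.0, 101)
lr = norm.pdf(xs, mu_S, 1.0) / norm.pdf(xs, mu_N, 1.0)   # likelihood ratio
prior_odds = prior_S / (1.0 - prior_S)

# Bayes' rule in odds form: posterior odds = Lambda(x) * prior odds.
posterior = lr * prior_odds / (lr * prior_odds + 1.0)    # P(S | x)
```

Because the posterior increases whenever the likelihood ratio does, sweeping a threshold on either one visits the same sequence of (F, H) operating points.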
Common Confusions
d-prime requires equal-variance Gaussian assumption
$d'$ is computed as $d' = z(H) - z(F)$ only when both distributions are Gaussian with the same variance. If the variances differ, the ROC curve on normal-normal axes (zROC) is a straight line with slope $\sigma_N / \sigma_S$, and a single $d'$ no longer fully characterizes performance. In ML, the equal-variance assumption rarely holds, so AUC (which is nonparametric) is preferred over $d'$.
High AUC does not mean the classifier is useful at your operating point
AUC averages over all thresholds. A classifier with AUC = 0.95 might perform poorly at the specific false positive rate your application requires. Always examine the ROC curve (or precision-recall curve) at the operating point relevant to your deployment, not just the aggregate AUC.
ROC curves can be misleading with class imbalance
With extreme class imbalance (e.g., 1% positive, 99% negative), a classifier can achieve high AUC by ranking well without achieving useful precision. A model that assigns slightly higher scores to the rare positive class achieves good AUC, but when you threshold to get high recall, precision may be very low. Use precision-recall curves for imbalanced problems.
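The imbalance failure can be made concrete with closed-form SDT quantities. A minimal sketch, assuming $d' = 2$ (a strong ranker, AUC above 0.9) but only 1% positives, with $\mu_N = 0$, $\mu_S = d'$, $\sigma = 1$ (illustrative parameters):

```python
import numpy as np
from scipy.stats import norm

d_prime, prior_S = 2.0, 0.01     # assumed: good ranking, but only 1% positives

auc = norm.cdf(d_prime / np.sqrt(2))          # high AUC: looks excellent

# Operating point chosen for 90% recall (hit rate).
c = d_prime - norm.ppf(0.90)
recall = 1 - norm.cdf(c - d_prime)            # = 0.90 by construction
fa_rate = 1 - norm.cdf(c)
# Precision mixes in the base rate, so rare positives drag it down hard.
precision = prior_S * recall / (prior_S * recall + (1 - prior_S) * fa_rate)
```

Despite an AUC above 0.9, precision at 90% recall lands in the low single digits: almost every alarm is false, exactly the situation a precision-recall curve exposes and an ROC curve hides.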
Exercises
Problem
Two Gaussian distributions have means $\mu_N$ and $\mu_S > \mu_N$ and common standard deviation $\sigma$. Compute $d'$. If the criterion is set at $c = (\mu_N + \mu_S)/2$ (midpoint), compute the hit rate and false alarm rate in terms of $d'$.
Problem
A binary classifier assigns scores to 5 positive and 5 negative examples. Using the concordance interpretation, compute the AUC as the fraction of the 25 positive-negative pairs in which the positive example outscores the negative one (count ties as 1/2).
Problem
Show that for the equal-variance Gaussian model, $\mathrm{AUC} = \Phi\!\big(d'/\sqrt{2}\big)$. Start from the concordance definition $\mathrm{AUC} = P(X_S > X_N)$, where $X_S \sim \mathcal{N}(\mu_S, \sigma^2)$ and $X_N \sim \mathcal{N}(\mu_N, \sigma^2)$.
Problem
An observer in a psychophysics experiment has sensitivity $d'$ and sets a criterion with bias $\beta > 1$ (conservative). For the equal-variance Gaussian model with $\mu_N = 0$ and $\sigma = 1$, find the criterion location $c$ in terms of $d'$ and $\beta$, then express the hit rate and false alarm rate.
References
Canonical:
- Green & Swets, Signal Detection Theory and Psychophysics (1966), Chapters 1-4
- Macmillan & Creelman, Detection Theory: A User's Guide (2nd ed., 2004), Chapters 1-3
Current:
- Fawcett, "An Introduction to ROC Analysis" (Pattern Recognition Letters, 2006)
- Saito & Rehmsmeier, "The Precision-Recall Plot Is More Informative than the ROC Plot" (PLOS ONE, 2015)
- Hand, "Measuring Classifier Performance: A Coherent Alternative to the Area Under the ROC Curve" (Machine Learning, 2009)
- Wickens, Elementary Signal Detection Theory (2002), Chapters 2-5
Next Topics
- Confusion matrices and classification metrics: the full taxonomy of classification evaluation metrics, directly built on SDT concepts
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Hypothesis Testing for ML (Layer 2)