AI Safety
Ethics and Fairness in ML
Fairness definitions (demographic parity, equalized odds, calibration), the impossibility theorem showing they cannot all hold simultaneously, bias sources, and mitigation strategies at each stage of the pipeline.
Why This Matters
ML systems make consequential decisions: who gets a loan, who gets bail, who gets hired, what medical treatment is recommended. These decisions often rest on classifier outputs that must be evaluated with care. If these systems discriminate on the basis of protected attributes (race, gender, age), they cause real harm at scale.
Fairness in ML is both a technical and a social problem. The technical challenge is precise: different mathematical definitions of "fairness" are provably incompatible. You cannot satisfy all of them simultaneously except in trivial cases. Choosing which definition to satisfy is a value judgment, not a mathematical one.
Fairness Definitions
Let $\hat{Y}$ be the model prediction, $Y$ be the true outcome, and $A$ be a protected attribute (e.g., $A \in \{0, 1\}$ for two groups).
Demographic Parity
A classifier satisfies demographic parity if the prediction is independent of the protected attribute:

$$P(\hat{Y} = 1 \mid A = 0) = P(\hat{Y} = 1 \mid A = 1)$$

Both groups receive positive predictions at equal rates.
Equalized Odds
A classifier satisfies equalized odds if the prediction is independent of the protected attribute conditional on the true outcome:

$$P(\hat{Y} = 1 \mid A = 0, Y = y) = P(\hat{Y} = 1 \mid A = 1, Y = y) \quad \text{for } y \in \{0, 1\}$$

Both groups have equal true positive rates and equal false positive rates.
Calibration
A classifier satisfies calibration (or predictive parity) if the predicted probability means the same thing for both groups. For a score $S$:

$$P(Y = 1 \mid S = s, A = 0) = P(Y = 1 \mid S = s, A = 1) \quad \text{for all } s$$

When the model says "70% chance," it should mean 70% for both groups.
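All three criteria can be estimated directly from predictions, labels, and group membership. A minimal sketch in pure Python; the data and the `group_rates` helper are illustrative, not from a library:

```python
def group_rates(yhat, y, a, grp):
    """Selection rate, TPR, FPR, and PPV for one group.

    Selection-rate gaps diagnose demographic parity, TPR/FPR gaps diagnose
    equalized odds, and PPV gaps are a coarse proxy for calibration when
    predictions are binary.
    """
    idx = [i for i in range(len(a)) if a[i] == grp]
    pos = [i for i in idx if y[i] == 1]          # actual positives
    neg = [i for i in idx if y[i] == 0]          # actual negatives
    pred_pos = [i for i in idx if yhat[i] == 1]  # predicted positives
    sel = len(pred_pos) / len(idx)
    tpr = sum(yhat[i] for i in pos) / len(pos)
    fpr = sum(yhat[i] for i in neg) / len(neg)
    ppv = sum(y[i] for i in pred_pos) / len(pred_pos)
    return sel, tpr, fpr, ppv

# Tiny synthetic example with two groups.
yhat = [1, 1, 0, 0, 1, 0, 0, 0]
y    = [1, 0, 0, 0, 1, 0, 0, 0]
a    = [0, 0, 0, 0, 1, 1, 1, 1]
for grp in (0, 1):
    sel, tpr, fpr, ppv = group_rates(yhat, y, a, grp)
    print(f"group {grp}: selection={sel:.2f} TPR={tpr:.2f} FPR={fpr:.2f} PPV={ppv:.2f}")
```

On this toy sample, group 0 is selected at twice the rate of group 1 and has a higher false positive rate, so demographic parity and equalized odds both fail.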
The Impossibility Theorem
Impossibility of Simultaneous Fairness
Statement
If the base rates differ between groups ($P(Y = 1 \mid A = 0) \neq P(Y = 1 \mid A = 1)$) and the classifier is imperfect (makes errors), then it is impossible to simultaneously satisfy demographic parity, equalized odds, and calibration.
More precisely, Chouldechova (2017) showed: for an imperfect binary classifier, if $P(Y = 1 \mid A = 0) \neq P(Y = 1 \mid A = 1)$, then the classifier cannot simultaneously have equal false positive rates, equal false negative rates, and equal positive predictive values across both groups.
Intuition
If group A has a 30% base rate and group B has a 10% base rate, a calibrated classifier must predict higher scores for group A on average (to be accurate). But demographic parity requires equal prediction rates. These two requirements directly conflict when base rates differ.
Proof Sketch
Write out the relationship between PPV (positive predictive value), FPR (false positive rate), FNR (false negative rate), and prevalence $p = P(Y = 1)$:

$$\mathrm{FPR} = \frac{p}{1 - p} \cdot \frac{1 - \mathrm{PPV}}{\mathrm{PPV}} \cdot (1 - \mathrm{FNR})$$

If $p_0 \neq p_1$ (different base rates) and we require $\mathrm{FPR}_0 = \mathrm{FPR}_1$ and $\mathrm{FNR}_0 = \mathrm{FNR}_1$ (equalized odds), then substituting the two prevalences into the identity forces $\mathrm{PPV}_0 \neq \mathrm{PPV}_1$ (calibration fails). Similarly for other combinations.
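The conflict can be checked numerically: hold the error rates fixed across groups (equalized odds) and compute the PPV implied by each base rate. A small sketch; the specific rates are illustrative:

```python
def ppv_from(p, fpr, fnr):
    """PPV implied by prevalence p, FPR, and FNR:
    PPV = p * TPR / (p * TPR + (1 - p) * FPR), with TPR = 1 - FNR."""
    tpr = 1.0 - fnr
    return p * tpr / (p * tpr + (1.0 - p) * fpr)

# Same FPR and FNR for both groups (equalized odds holds) ...
fpr, fnr = 0.2, 0.3
# ... but different base rates force different PPVs, so predictive parity fails.
ppv_a = ppv_from(0.30, fpr, fnr)  # group A, base rate 30%
ppv_b = ppv_from(0.10, fpr, fnr)  # group B, base rate 10%
print(ppv_a, ppv_b)
```

With these numbers the PPVs come out to 0.6 and 0.28: identical error rates, yet a positive prediction means something quite different for each group.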
Why It Matters
This theorem means there is no "fair classifier" in the absolute sense. Every deployed system must choose which fairness criterion to prioritize, and this choice has ethical and political implications that mathematics alone cannot resolve. Anyone claiming their system is "fair" without specifying which definition of fairness is either confused or misleading.
Failure Mode
The impossibility vanishes in two degenerate cases: (1) the base rates are equal ($p_0 = p_1$), in which case all definitions can be simultaneously satisfied, or (2) the classifier is perfect (zero errors), in which case all definitions are trivially satisfied. Neither case holds in practice.
Sources of Bias
Historical bias. The training data reflects past discriminatory decisions. If historical hiring data shows men were hired more often, a model trained on this data will perpetuate the pattern, even if the underlying qualifications are equal.
Measurement bias. The proxy variable does not measure the concept of interest equally across groups. Using "number of arrests" as a proxy for "criminality" introduces bias because arrest rates differ by race even after controlling for behavior.
Label bias. The labels themselves are biased. In medical diagnosis, certain conditions are systematically underdiagnosed in specific populations, making the training labels less accurate for those groups.
Aggregation bias. A single model is fit to a population where subgroups have different relationships between features and outcomes. A model predicting diabetes risk that does not account for different risk profiles by ethnicity will be less accurate for minority groups.
Mitigation Strategies
Preprocessing: Fix the Data
Rebalancing. Resample or reweight training data so that both groups have equal representation and equal base rates. This directly targets demographic parity but may reduce overall accuracy.
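One concrete reweighting scheme is Kamiran and Calders' "reweighing": give each example the weight that makes the protected attribute and the label independent under the weighted empirical distribution. A pure-Python sketch with illustrative data:

```python
from collections import Counter

def reweigh(a, y):
    """Instance weights w(a, y) = P(a) * P(y) / P(a, y), which make the
    protected attribute and the label independent under the weighted data."""
    n = len(a)
    pa, py, pay = Counter(a), Counter(y), Counter(zip(a, y))
    return [(pa[ai] / n) * (py[yi] / n) / (pay[(ai, yi)] / n)
            for ai, yi in zip(a, y)]

# Group 0 has base rate 3/4, group 1 has base rate 1/4.
a = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 1, 1, 0, 1, 0, 0, 0]
w = reweigh(a, y)
print(w)
```

After reweighting, both groups have a weighted base rate of 1/2, which is exactly what this preprocessing step targets; training on the weighted data then pushes the model toward demographic parity.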
Fair representation learning. Learn a data representation $Z$ that retains predictive information about the outcome $Y$ but removes information about the protected attribute $A$. Formally, minimize $I(Z; A)$ while maximizing $I(Z; Y)$, where $I(\cdot\,;\cdot)$ is mutual information (from information theory). This is a constrained optimization problem.
In-Processing: Constrained Optimization
Add fairness constraints directly to the training objective, e.g. for demographic parity:

$$\min_\theta \; \mathcal{L}(\theta) \quad \text{subject to} \quad \bigl| P(\hat{Y} = 1 \mid A = 0) - P(\hat{Y} = 1 \mid A = 1) \bigr| \le \epsilon$$

In practice the hard constraint is often relaxed into a penalty term with weight $\lambda$.
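A common relaxation replaces the constraint with a penalty: minimize the logistic loss plus $\lambda$ times the gap in mean predicted score between groups. A self-contained sketch; the data, penalty weight, and finite-difference optimizer are all illustrative, not a production recipe:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def scores(w, X):
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x in X]

def penalized_loss(w, X, y, a, lam):
    """Logistic loss + lam * |mean score of group 0 - mean score of group 1|,
    a soft demographic-parity penalty."""
    p = scores(w, X)
    eps = 1e-12
    nll = -sum(yi * math.log(pi + eps) + (1 - yi) * math.log(1 - pi + eps)
               for yi, pi in zip(y, p)) / len(y)
    m0 = sum(pi for pi, ai in zip(p, a) if ai == 0) / a.count(0)
    m1 = sum(pi for pi, ai in zip(p, a) if ai == 1) / a.count(1)
    return nll + lam * abs(m0 - m1)

def train(X, y, a, lam, steps=300, lr=0.5, h=1e-5):
    """Finite-difference gradient descent on the penalized loss (illustration only)."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = []
        for j in range(len(w)):
            wp, wm = w[:], w[:]
            wp[j] += h
            wm[j] -= h
            grad.append((penalized_loss(wp, X, y, a, lam)
                         - penalized_loss(wm, X, y, a, lam)) / (2 * h))
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def dp_gap(w, X, a):
    """Gap in mean predicted score between groups (surrogate for the DP gap)."""
    p = scores(w, X)
    m0 = sum(pi for pi, ai in zip(p, a) if ai == 0) / a.count(0)
    m1 = sum(pi for pi, ai in zip(p, a) if ai == 1) / a.count(1)
    return abs(m0 - m1)

# Toy data where the second feature separates both the labels and the groups,
# so the unconstrained fit produces a large demographic-parity gap.
X = [[1.0, 1.0], [1.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
y = [1, 1, 0, 0]
a = [0, 0, 1, 1]
print(dp_gap(train(X, y, a, lam=0.0), X, a))  # large gap
print(dp_gap(train(X, y, a, lam=5.0), X, a))  # much smaller gap
```

Raising $\lambda$ trades accuracy for parity, which is exactly the tension the constrained formulation makes explicit.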
Equalized Odds via Post-Processing
Statement
Given a score function $S(X)$ and group membership $A$, the equalized-odds-optimal classifier can be obtained by choosing group-specific thresholds and randomization parameters. The optimal derived classifier $\hat{Y}$ solves:

$$\min_{\hat{Y}} \; P(\hat{Y} \neq Y) \quad \text{subject to} \quad (\mathrm{FPR}_0, \mathrm{TPR}_0) = (\mathrm{FPR}_1, \mathrm{TPR}_1)$$
This is a linear program over the ROC curves of each group.
Intuition
Each group has its own ROC curve (TPR vs FPR as the threshold varies). Equalized odds requires both groups to operate at the same (TPR, FPR) point. The optimal choice is the point on the intersection of feasible regions that minimizes overall error. Randomization may be needed to achieve exact equality.
Proof Sketch
The ROC curve for each group defines a set of achievable (FPR, TPR) pairs. By randomizing between two thresholds, any point on the line segment between two achievable points is also achievable. The constraint requires both groups to be at the same (FPR, TPR) point. This is a linear feasibility problem. Minimizing error subject to this constraint is a linear program.
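The randomization step can be made concrete: a classifier that flips a biased coin between two thresholds operates at the convex combination of their (FPR, TPR) points. A toy sketch with made-up operating points:

```python
def mixed_rates(point_low, point_high, alpha):
    """(FPR, TPR) of a randomized classifier that uses the 'high' threshold
    with probability alpha and the 'low' threshold otherwise. The result is
    the alpha-weighted point on the ROC segment between the two endpoints."""
    return tuple((1 - alpha) * lo + alpha * hi
                 for lo, hi in zip(point_low, point_high))

# Two achievable operating points for one group, as (FPR, TPR).
low  = (0.40, 0.90)   # lenient threshold
high = (0.10, 0.60)   # strict threshold
# Suppose equalized odds requires this group to sit at FPR = 0.25:
alpha = (0.40 - 0.25) / (0.40 - 0.10)
print(mixed_rates(low, high, alpha))  # lands at FPR 0.25, TPR 0.75
```

This is why each group's feasible region is the convex hull under its ROC curve, and why matching both groups' operating points reduces to a linear program.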
Why It Matters
This shows that equalized odds can always be achieved by post-processing, without retraining the model. The cost is reduced accuracy (operating at a suboptimal threshold for at least one group). This quantifies the accuracy cost of fairness.
Failure Mode
Post-processing requires knowing group membership at prediction time, which may not be available or may be legally prohibited. The accuracy cost can be substantial when group score distributions are very different. Randomized classifiers may be unacceptable in high-stakes decisions.
Post-Processing: Adjust Thresholds
After training a score function $S$, set a different classification threshold for each group to satisfy the desired fairness criterion. For demographic parity: choose $t_0, t_1$ such that $P(S \ge t_0 \mid A = 0) = P(S \ge t_1 \mid A = 1)$.
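Picking the group-specific threshold from held-out scores is a short search. A sketch; `dp_threshold` and the score lists are hypothetical:

```python
def dp_threshold(scores_0, scores_1, t0):
    """Given threshold t0 for group 0, return the threshold for group 1 whose
    selection rate most closely matches group 0's (demographic parity)."""
    target = sum(s >= t0 for s in scores_0) / len(scores_0)
    candidates = sorted(set(scores_1))  # only observed scores change the rate
    return min(candidates,
               key=lambda t: abs(sum(s >= t for s in scores_1) / len(scores_1)
                                 - target))

s0 = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.1, 0.0]  # held-out scores, group 0
s1 = [0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.0]  # held-out scores, group 1
t1 = dp_threshold(s0, s1, t0=0.65)
print(t1)  # group 1 threshold matching group 0's 3/8 selection rate
```

With finite samples the match is only approximate in general; randomizing between adjacent thresholds closes any remaining gap.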
Common Confusions
Removing the protected attribute does not remove bias
Dropping the $A$ column from the feature set does not make the model fair. Other features (zip code, name, occupation) can be correlated with $A$ and serve as proxies. This is called "redundant encoding." A model can effectively reconstruct $A$ from correlated features.
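A toy illustration of redundant encoding; the feature values are made up:

```python
# The protected attribute A is dropped from the features, but a correlated
# proxy (here, a hypothetical zip-code feature) remains.
zip_code = [94110, 94110, 94110, 10451, 10451, 10451]
a        = [0, 0, 0, 1, 1, 1]  # protected attribute, NOT given to the model

# A trivial "model" that only looks at the proxy recovers A perfectly.
guess = [0 if z == 94110 else 1 for z in zip_code]
accuracy = sum(g == ai for g, ai in zip(guess, a)) / len(a)
print(accuracy)  # 1.0
```

Real proxies are noisier, but the point stands: independence from $A$ must be measured, not assumed from the absence of the column.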
Fairness is not accuracy
A model can be highly accurate overall while being systematically wrong for a minority group. Overall accuracy weights every example equally, so the majority group dominates the objective: if group A is 90% of the data, a model optimized for overall accuracy will tolerate poor performance on group B. Accuracy is a population-level metric that can hide group-level disparities.
Equal base rates do not mean equal treatment is fair
Even if $P(Y = 1 \mid A = 0) = P(Y = 1 \mid A = 1)$, the model may have different error rates for each group because features have different predictive power across groups. Equalized odds is a stronger requirement than equal base rates.
Canonical Examples
COMPAS recidivism prediction
The COMPAS system assigns risk scores for criminal recidivism. ProPublica (2016) showed it had higher false positive rates for Black defendants than white defendants (equalized odds violated). Northpointe (the vendor) responded that the scores were calibrated: a score of 7 meant roughly the same recidivism probability for both groups. The impossibility theorem explains why both observations can be true simultaneously. With different base rates, calibration and equal error rates cannot both hold.
Summary
- Three main fairness definitions: demographic parity, equalized odds, calibration
- The impossibility theorem: with different base rates, you cannot satisfy all three
- Bias enters through data (historical, measurement, label, aggregation)
- Mitigation at three stages: preprocessing (fix data), in-processing (constrained training), post-processing (adjust thresholds)
- Removing the protected attribute does not ensure fairness due to proxy features
- Choosing which fairness criterion to optimize is a value judgment, not a technical one
Exercises
Problem
Group A has base rate $p_A$ and Group B has base rate $p_B$, with $p_A \neq p_B$. A classifier achieves demographic parity by predicting $\hat{Y} = 1$ for 30% of each group. Is this classifier calibrated? Why or why not?
Problem
You have a trained risk score and need to produce a binary classifier satisfying equalized odds. Group 0 has ROC curve with AUC 0.90 and Group 1 has ROC curve with AUC 0.75. Explain why achieving equalized odds will be more costly (in terms of overall accuracy) than if both groups had AUC 0.90.
References
Canonical:
- Chouldechova, "Fair Prediction with Disparate Impact" (2017), Sections 1-3
- Hardt, Price, Srebro, "Equality of Opportunity in Supervised Learning" (2016)
Current:
- Barocas, Hardt, Narayanan, Fairness and Machine Learning (2023), Chapters 1-4
- Mehrabi et al., "A Survey on Bias and Fairness in Machine Learning" (2021)
- Corbett-Davies & Goel, "The Measure and Mismeasure of Fairness" (2018)
Next Topics
- Fairness intersects with all deployment decisions in ML systems
Last reviewed: April 2026