Methodology
Feature Importance and Interpretability
Methods for attributing model predictions to input features: permutation importance, SHAP values, LIME, partial dependence, and why none of these imply causality.
Why This Matters
You built a model that predicts well. Now someone asks: "which features matter?" This question arises in every applied ML project, from regulatory compliance (finance, healthcare) to debugging models to scientific discovery. The methods here answer variants of this question, but each defines "importance" differently. Confusing these definitions leads to incorrect conclusions about what drives model behavior and, worse, incorrect causal claims.
Permutation Importance
Permutation Importance
The permutation importance of feature $j$ is the decrease in model performance when the values of feature $j$ are randomly shuffled across the dataset, breaking the association between feature $j$ and the target:

$$\mathrm{PI}_j = \mathrm{Score}(f, X, y) - \mathrm{Score}(f, X^{(\pi_j)}, y)$$

where $X^{(\pi_j)}$ is $X$ with column $j$ permuted and Score is a performance metric (accuracy, $R^2$, etc.).
Permutation importance has a clean interpretation: it measures how much the model relies on the marginal distribution of feature for its predictions. It works for any model and any metric. Compute it on the test set, not the training set, to avoid reflecting memorization.
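The definition translates directly into code. A minimal sketch (the function name, `score_fn` convention, and `n_repeats` default are illustrative choices, not a fixed API):

```python
import numpy as np

def permutation_importance(model, X, y, score_fn, n_repeats=10, seed=0):
    """Permutation importance: mean drop in score when one column is shuffled.

    model    -- fitted estimator with a .predict method
    X, y     -- held-out test data (compute on test, not train)
    score_fn -- score_fn(y_true, y_pred), higher is better
    """
    rng = np.random.default_rng(seed)
    baseline = score_fn(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the feature-target link
            drops.append(baseline - score_fn(y, model.predict(Xp)))
        importances[j] = np.mean(drops)
    return importances
```

Averaging over `n_repeats` shuffles reduces the variance introduced by any single random permutation.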
SHAP Values
Shapley Value
For a prediction $f(x)$ with features $\{1, \dots, d\}$, the Shapley value of feature $j$ is:

$$\phi_j = \sum_{S \subseteq \{1,\dots,d\} \setminus \{j\}} \frac{|S|!\,(d - |S| - 1)!}{d!} \left[ v(S \cup \{j\}) - v(S) \right]$$

where $v(S) = \mathbb{E}[f(x) \mid x_S]$ is the expected model output when only the features in $S$ are observed and the remaining features are marginalized out. The Shapley value attributes the prediction to each feature by averaging the marginal contribution of that feature across all possible orderings.
Shapley Value Uniqueness
Statement
The Shapley value is the unique attribution method satisfying:
- Efficiency: $\sum_{j=1}^{d} \phi_j = f(x) - \mathbb{E}[f(x)]$
- Symmetry: if features $i$ and $j$ contribute equally to all coalitions, then $\phi_i = \phi_j$
- Linearity: for combined games $v = v_1 + v_2$, $\phi_j(v) = \phi_j(v_1) + \phi_j(v_2)$
- Null player: if feature $j$ never changes any coalition value, then $\phi_j = 0$
No other attribution method satisfies all four axioms simultaneously.
Intuition
Shapley values decompose the total prediction (minus baseline) into additive contributions, one per feature, in the only way that is fair, consistent, and complete. "Fair" means symmetric features get equal credit; "complete" means all credit is allocated.
Proof Sketch
Existence: the formula defines values satisfying all four axioms (verify each directly). Uniqueness: suppose two solutions $\phi$ and $\psi$ both satisfy the axioms. By linearity, it suffices to prove uniqueness on the unanimity games $u_T$, defined by $u_T(S) = 1$ if $T \subseteq S$, else $0$. For a unanimity game, the null-player axiom forces $\phi_j(u_T) = 0$ for $j \notin T$, while symmetry and efficiency force $\phi_j(u_T) = 1/|T|$ for $j \in T$. Since any game is a linear combination of unanimity games, linearity extends uniqueness to all games.
Why It Matters
SHAP (SHapley Additive exPlanations) uses Shapley values for ML model explanations. The uniqueness theorem means that if you accept the four axioms as reasonable fairness requirements, there is exactly one way to attribute predictions. This gives SHAP a theoretical grounding that permutation importance and LIME lack.
Failure Mode
Computing exact Shapley values requires $2^d$ coalition evaluations. For models with hundreds of features, this is intractable. Practical SHAP implementations use approximations: KernelSHAP (sampling-based) or TreeSHAP (exact for tree models, polynomial time per prediction). These approximations can introduce errors, and KernelSHAP in particular may not converge with insufficient samples.
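For small $d$ the exponential sum is tractable, and brute-forcing it makes the definition concrete. A sketch, using the common convention of marginalizing absent features by averaging over a background dataset (one of several conventions; interventional and conditional expectations differ):

```python
from itertools import combinations
from math import factorial
import numpy as np

def shapley_values(predict, x, background):
    """Exact Shapley values by enumerating all 2^d coalitions.

    predict    -- function mapping an (n, d) array to (n,) predictions
    x          -- the instance to explain, shape (d,)
    background -- background data used to marginalize absent features
    """
    d = len(x)

    def v(S):
        # Features in coalition S take x's values; the rest keep background values.
        Z = background.copy()
        for j in S:
            Z[:, j] = x[j]
        return predict(Z).mean()

    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(d - size - 1) / factorial(d)
                phi[j] += weight * (v(S + (j,)) - v(S))
    return phi
```

For an additive model this recovers $\phi_j = \beta_j(x_j - \bar{x}_j)$ and satisfies efficiency exactly; for $d$ beyond a few dozen features the loop is hopeless, which is the point of the approximations above.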
LIME
LIME Local Fidelity
Statement
LIME finds an interpretable model $g \in G$ that approximates $f$ locally around $x$:

$$g^* = \arg\min_{g \in G} \; \sum_{z \in Z} \pi_x(z)\,\big(f(z) - g(z)\big)^2 + \Omega(g)$$

where $\pi_x$ is a kernel weighting proximity to $x$, $Z$ is a set of perturbed samples around $x$, and $\Omega(g)$ is a complexity penalty. For linear $g$, the coefficients serve as local feature importances.
Intuition
LIME asks: "near this specific input, which features does the model rely on?" It answers by fitting a simple linear model to the model's behavior in a local neighborhood. The linear coefficients tell you the local importance of each feature for this particular prediction.
Proof Sketch
The objective is weighted least squares with regularization. For linear $g$ and a ridge penalty $\Omega$, it has a closed-form solution: the ridge regression coefficients computed from the kernel-weighted design matrix of perturbed samples.
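A minimal tabular sketch of that closed form, assuming Gaussian perturbations, an RBF proximity kernel, and an $\ell_2$ penalty (all hypothetical choices; `kernel_width`, the perturbation scale, and `alpha` are exactly the hyperparameters the method leaves open):

```python
import numpy as np

def lime_explain(predict, x, n_samples=2000, kernel_width=0.75, alpha=1.0, seed=0):
    """Local linear explanation: kernel-weighted ridge fit to f near x.

    predict -- black-box function mapping an (n, d) array to (n,) outputs
    x       -- instance to explain, shape (d,)
    Returns the local linear coefficients, one per feature.
    """
    rng = np.random.default_rng(seed)
    d = len(x)
    Z = x + rng.normal(scale=0.5, size=(n_samples, d))   # perturb around x
    y = predict(Z)
    dist2 = np.sum((Z - x) ** 2, axis=1)
    w = np.exp(-dist2 / kernel_width ** 2)               # proximity kernel pi_x
    A = np.hstack([np.ones((n_samples, 1)), Z])          # intercept + features
    # Closed-form weighted ridge: (A' W A + alpha I)^{-1} A' W y
    AtW = A.T * w
    coef = np.linalg.solve(AtW @ A + alpha * np.eye(d + 1), AtW @ y)
    return coef[1:]                                      # drop the intercept
```

On a model that is already linear near $x$, the recovered coefficients match the true local slopes; changing `kernel_width` changes which neighborhood "local" means.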
Why It Matters
LIME is model-agnostic and produces per-instance explanations. Unlike global methods (permutation importance), it captures local behavior: a feature might be important for one prediction and irrelevant for another.
Failure Mode
LIME depends heavily on the perturbation distribution and kernel width. For tabular data, perturbing features independently ignores feature correlations, producing unrealistic samples. For images, the superpixel segmentation determines what "features" LIME can identify. Different kernel widths produce different explanations for the same input. There is no principled way to choose these hyperparameters.
Partial Dependence Plots
A partial dependence plot (PDP) shows the marginal effect of a feature $x_j$ on the model output, averaging over all other features:

$$\mathrm{PD}_j(v) = \mathbb{E}_{x_{-j}}\big[f(v, x_{-j})\big] \approx \frac{1}{n} \sum_{i=1}^{n} f\big(v, x_{-j}^{(i)}\big)$$

PDPs are simple and global but assume feature independence. If $x_j$ is correlated with other features, the PDP evaluates the model at unrealistic feature combinations (e.g., height = 180 cm and age = 3).
Individual Conditional Expectation (ICE) plots show the same curve per instance rather than averaged, revealing heterogeneity in feature effects.
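The PDP average and its per-instance ICE curves come out of the same loop; a sketch (function name and return convention are illustrative):

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """ICE curves and their average (the PDP) for feature j.

    predict -- function mapping an (n, d) array to (n,) predictions
    X       -- dataset whose other columns are averaged over
    grid    -- values of feature j at which to evaluate the model
    Returns (ice, pdp): ice has shape (len(grid), n), pdp shape (len(grid),).
    """
    ice = np.empty((len(grid), X.shape[0]))
    for g, value in enumerate(grid):
        Xg = X.copy()
        Xg[:, j] = value          # set feature j for every instance
        ice[g] = predict(Xg)
    return ice, ice.mean(axis=1)  # the PDP is the average ICE curve
```

Plotting the rows of `ice` individually reveals the heterogeneity that the averaged `pdp` curve hides.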
Built-in Importance
Some models provide importance measures directly:
- Tree impurity importance: total decrease in Gini impurity or entropy from splits on feature $j$. Biased toward high-cardinality features.
- Coefficient magnitude: in linear models, $|\beta_j|$ (after feature standardization). Only meaningful when features have comparable scales.
- Attention weights: in transformers, often misinterpreted as importance. Attention is a computational mechanism, not an explanation of which inputs caused the output. Jain and Wallace (2019) showed attention frequently does not correlate with gradient-based importance.
The Causal Trap
Feature importance is not causal importance
All methods on this page answer: "what features does the model use?" None answer: "what features cause the outcome?" Permutation importance tells you the model relies on feature $j$. It does not tell you that intervening on feature $j$ would change the outcome in the real world. A model predicting hospital mortality might rely on "has palliative care order" because that feature correlates with severity, not because palliative care causes death. Causal claims require causal methodology (randomized experiments, instrumental variables, do-calculus), not feature importance methods.
Global importance can hide local behavior
A feature with low global permutation importance might be the most important feature for a specific subgroup. Average importance masks heterogeneity. SHAP and LIME provide per-instance attributions that can reveal this, but aggregating them back to global summaries loses the same information.
Correlated features split importance unpredictably
If features $x_i$ and $x_j$ are highly correlated, permutation importance will underestimate both (shuffling one does not remove the information because the other is still intact). SHAP distributes importance between them according to the Shapley axioms, but the split may not match intuition. Neither method tells you which correlated feature is "truly important" because that is a causal question.
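The underestimation is easy to reproduce. A sketch with two hypothetical hand-built models that make identical predictions on a dataset containing two perfect copies of the same feature:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
X = np.column_stack([x, x])          # the same signal stored in two columns
y = 2.0 * x

# Two models with identical predictions on unshuffled data:
lone  = lambda Z: 2.0 * Z[:, 0]      # relies entirely on one copy
split = lambda Z: Z[:, 0] + Z[:, 1]  # spreads its reliance across both copies

def perm_drop(predict, j):
    """Increase in MSE when column j is shuffled."""
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    mse = lambda p: float(np.mean((y - p) ** 2))
    return mse(predict(Xp)) - mse(predict(X))

# perm_drop(lone, 0) is roughly four times perm_drop(split, 0), even though
# both models carry exactly the same signal: the intact copy cushions the shuffle.
```

Which of the two behaviors a fitted model exhibits depends on the training procedure, which is why the split is unpredictable in practice.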
Key Takeaways
- Permutation importance: global, model-agnostic, measures performance degradation when a feature's association with the target is broken
- SHAP: per-instance, grounded in Shapley axioms (the unique fair attribution), but exponentially expensive to compute exactly
- LIME: per-instance, local linear approximation, but sensitive to perturbation distribution and kernel width
- PDPs: global marginal effect, assumes feature independence
- None of these methods imply causality. Feature importance != causal importance.
Exercises
Problem
You compute permutation importance for a random forest with two perfectly correlated features ($r = 1$): temperature in Celsius and temperature in Fahrenheit. Both receive low importance scores. Explain why, and suggest an approach to address this.
Problem
Prove that for a linear model $f(x) = \beta_0 + \sum_j \beta_j x_j$ with independent features, the SHAP value of feature $j$ for input $x$ is $\phi_j = \beta_j \big(x_j - \mathbb{E}[x_j]\big)$. Verify the efficiency axiom.
References
Canonical:
- Shapley, "A Value for N-Person Games" (1953)
- Ribeiro et al., "Why Should I Trust You? Explaining the Predictions of Any Classifier" (LIME, 2016), Section 3
Current:
- Lundberg & Lee, "A Unified Approach to Interpreting Model Predictions" (SHAP, 2017), Sections 2-3
- Molnar, Interpretable Machine Learning (2022), Chapters 5-9
- Jain & Wallace, "Attention is not Explanation" (2019)
- Hastie, Tibshirani, Friedman, The Elements of Statistical Learning (2009), Chapters 7-8
Next Topics
- Mechanistic interpretability: understanding model internals (circuits, features) rather than post-hoc attribution
- Cross-validation theory: validating that your model and its explanations generalize
Last reviewed: April 2026
Prerequisites
Foundations this topic depends on.
- Random Forests (Layer 2)
- Decision Trees and Ensembles (Layer 2)
- Empirical Risk Minimization (Layer 2)
- Concentration Inequalities (Layer 1)
- Common Probability Distributions (Layer 0A)
- Sets, Functions, and Relations (Layer 0A)
- Basic Logic and Proof Techniques (Layer 0A)
- Expectation, Variance, Covariance, and Moments (Layer 0A)
- Bias-Variance Tradeoff (Layer 2)
- Bootstrap Methods (Layer 2)
- Gradient Boosting (Layer 2)
- Gradient Descent Variants (Layer 1)
- Convex Optimization Basics (Layer 1)
- Differentiation in Rn (Layer 0A)
- Matrix Operations and Properties (Layer 0A)