Disclaimer: Beta. Content is under active construction and has not been peer-reviewed. Report errors on GitHub.

Methodology

Feature Importance and Interpretability

Methods for attributing model predictions to input features: permutation importance, SHAP values, LIME, partial dependence, and why none of these imply causality.

Core · Tier 2 · ~50 min

Why This Matters

You built a model that predicts well. Now someone asks: "which features matter?" This question arises in every applied ML project, from regulatory compliance (finance, healthcare) to debugging models to scientific discovery. The methods here answer variants of this question, but each defines "importance" differently. Confusing these definitions leads to incorrect conclusions about what drives model behavior and, worse, incorrect causal claims.

Permutation Importance

Definition

Permutation Importance

The permutation importance of feature $j$ is the decrease in model performance when the values of feature $j$ are randomly shuffled across the dataset, breaking the association between feature $j$ and the target:

$$\text{PI}_j = \text{Score}(f, X, y) - \text{Score}(f, X^{(\pi_j)}, y)$$

where $X^{(\pi_j)}$ is $X$ with column $j$ permuted and Score is a performance metric (accuracy, $R^2$, etc.).

Permutation importance has a clean interpretation: permuting preserves the marginal distribution of feature $j$ but destroys its joint relationship with the target and the other features, so the score drop measures how much the model relies on that association. It works for any model and any metric. Compute it on the test set, not the training set, so the result does not reflect memorization.
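The definition above can be sketched in a few lines of plain Python. Everything here (the toy linear "model", the $R^2$ scorer, and the helper names) is illustrative, not from any particular library:

```python
import random

def model(row):
    # Toy model: relies on feature 0 heavily and feature 1 weakly.
    return 3.0 * row[0] + 0.5 * row[1]

def r2_score(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def permutation_importance(X, y, j, score=r2_score):
    baseline = score(y, [model(row) for row in X])
    col = [row[j] for row in X]
    random.shuffle(col)                        # break the feature-target association
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
    permuted = score(y, [model(row) for row in X_perm])
    return baseline - permuted                 # PI_j = Score - Score after permuting j

random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(500)]
y = [model(row) for row in X]                  # noiseless targets, for clarity

pi0 = permutation_importance(X, y, 0)
pi1 = permutation_importance(X, y, 1)
print(pi0 > pi1)  # the heavily-used feature drops the score more
```

In practice you would use a held-out test set and average over several shuffles, since a single permutation is noisy.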

SHAP Values

Definition

Shapley Value

For a prediction $f(x)$ with features $\{1, \ldots, d\}$, the Shapley value of feature $j$ is:

ϕj(x)=S{1,,d}{j}S!(dS1)!d![v(S{j})v(S)]\phi_j(x) = \sum_{S \subseteq \{1,\ldots,d\} \setminus \{j\}} \frac{|S|!(d - |S| - 1)!}{d!} \left[v(S \cup \{j\}) - v(S)\right]

where $v(S)$ is the expected model output when only the features in $S$ are observed and the remaining features are marginalized out. The Shapley value attributes the prediction to each feature by averaging that feature's marginal contribution across all possible orderings.
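For small $d$ the formula can be evaluated exactly by brute force over all coalitions. The sketch below uses an illustrative toy value function $v$ (two features with individual contributions and a synergy term), not a real model:

```python
from itertools import combinations
from math import factorial

def shapley_values(d, v):
    """Exact Shapley values for players 0..d-1, given coalition value v(S)."""
    phis = []
    for j in range(d):
        others = [k for k in range(d) if k != j]
        phi = 0.0
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                # Weight |S|!(d-|S|-1)!/d! from the definition above.
                weight = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                phi += weight * (v(frozenset(S) | {j}) - v(frozenset(S)))
        phis.append(phi)
    return phis

# Toy game: feature 0 contributes 2, feature 1 contributes 1,
# plus a synergy of 0.5 when both are present.
def v(S):
    total = 0.0
    if 0 in S:
        total += 2.0
    if 1 in S:
        total += 1.0
    if 0 in S and 1 in S:
        total += 0.5
    return total

phi = shapley_values(2, v)
print(phi)                                            # [2.25, 1.25]: synergy split equally
assert abs(sum(phi) - v(frozenset({0, 1}))) < 1e-9    # efficiency axiom holds
```

Note the loop over all subsets is exactly the $2^d$ blow-up discussed under Failure Mode below.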

Theorem

Shapley Value Uniqueness

Statement

The Shapley value is the unique attribution method satisfying:

  1. Efficiency: $\sum_{j=1}^d \phi_j(x) = f(x) - \mathbb{E}[f(X)]$
  2. Symmetry: if features $j$ and $k$ contribute equally to all coalitions, then $\phi_j = \phi_k$
  3. Linearity: for combined games $v = v_1 + v_2$, $\phi_j(v) = \phi_j(v_1) + \phi_j(v_2)$
  4. Null player: if feature $j$ never changes any coalition value, $\phi_j = 0$

No other attribution method satisfies all four axioms simultaneously.

Intuition

Shapley values decompose the total prediction (minus baseline) into additive contributions, one per feature, in the only way that is fair, consistent, and complete. "Fair" means symmetric features get equal credit; "complete" means all credit is allocated.

Proof Sketch

Existence: the formula above defines values that satisfy all four axioms (verify each directly). Uniqueness: suppose two attributions $\phi$ and $\phi'$ both satisfy the axioms. By linearity, it suffices to prove uniqueness on the unanimity games $v_S(T) = 1$ if $S \subseteq T$ and $v_S(T) = 0$ otherwise. For a unanimity game, the null-player axiom forces $\phi_j = 0$ for $j \notin S$, and then symmetry and efficiency force $\phi_j = 1/|S|$ for $j \in S$. Since every game is a linear combination of unanimity games, linearity extends uniqueness to all games.

Why It Matters

SHAP (SHapley Additive exPlanations) uses Shapley values for ML model explanations. The uniqueness theorem means that if you accept the four axioms as reasonable fairness requirements, there is exactly one way to attribute predictions. This gives SHAP a theoretical grounding that permutation importance and LIME lack.

Failure Mode

Computing exact Shapley values requires $2^d$ coalition evaluations. For models with hundreds of features, this is intractable. Practical SHAP implementations use approximations: KernelSHAP (sampling-based) or TreeSHAP (exact for tree models, $O(TLD^2)$ per prediction). These approximations can introduce errors, and KernelSHAP in particular may not converge with too few samples.

LIME

Proposition

LIME Local Fidelity

Statement

LIME finds an interpretable model $g$ that approximates $f$ locally:

$$g^* = \arg\min_{g \in G} \sum_{z' \in \mathcal{Z}} \pi_x(z') \left(f(z') - g(z')\right)^2 + \Omega(g)$$

where $\pi_x(z')$ is a kernel weighting proximity to $x$, $\mathcal{Z}$ is a set of perturbed samples around $x$, and $\Omega(g)$ is a complexity penalty. For linear $g$, the coefficients serve as local feature importances.

Intuition

LIME asks: "near this specific input, which features does the model rely on?" It answers by fitting a simple linear model to the model's behavior in a local neighborhood. The linear coefficients tell you the local importance of each feature for this particular prediction.

Proof Sketch

The objective is weighted least squares with regularization. For linear $g$, it has a closed-form solution: the ridge regression coefficients computed from the kernel-weighted design matrix of perturbed samples.
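A minimal one-dimensional sketch of this idea: perturb around $x$, weight samples by an exponential kernel, and solve the weighted least squares in closed form for a single slope. The kernel width and the small ridge-style stabilizer are illustrative choices, not defaults of any LIME implementation:

```python
import math
import random

def f(z):
    return z ** 2            # black-box model: nonlinear, so only locally linear

def local_linear_explanation(x, n_samples=1000, width=0.5, ridge=1e-6):
    random.seed(1)
    zs = [x + random.gauss(0, 1) for _ in range(n_samples)]
    ws = [math.exp(-((z - x) ** 2) / width ** 2) for z in zs]
    # Weighted least squares for g(z) = a + b*(z - x), via the normal equations;
    # `ridge` is a tiny stabilizer on the denominator, in the spirit of ridge regression.
    W = sum(ws)
    Sx = sum(w * (z - x) for w, z in zip(ws, zs))
    Sxx = sum(w * (z - x) ** 2 for w, z in zip(ws, zs))
    Sy = sum(w * f(z) for w, z in zip(ws, zs))
    Sxy = sum(w * (z - x) * f(z) for w, z in zip(ws, zs))
    b = (W * Sxy - Sx * Sy) / (W * Sxx - Sx ** 2 + ridge)
    return b                 # local slope = LIME-style importance of the feature

slope = local_linear_explanation(x=3.0)
print(slope)                 # close to f'(3) = 6 for a narrow kernel
```

Rerunning with a wider kernel gives a different slope for the same input, which is exactly the sensitivity discussed under Failure Mode below.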

Why It Matters

LIME is model-agnostic and produces per-instance explanations. Unlike global methods (permutation importance), it captures local behavior: a feature might be important for one prediction and irrelevant for another.

Failure Mode

LIME depends heavily on the perturbation distribution and kernel width. For tabular data, perturbing features independently ignores feature correlations, producing unrealistic samples. For images, the superpixel segmentation determines what "features" LIME can identify. Different kernel widths produce different explanations for the same input. There is no principled way to choose these hyperparameters.

Partial Dependence Plots

A partial dependence plot (PDP) shows the marginal effect of a feature $x_j$ on the model output, averaging over all other features:

$$\hat{f}_j(x_j) = \frac{1}{n}\sum_{i=1}^{n} f(x_j, x_{-j}^{(i)})$$

PDPs are simple and global but assume feature independence. If $x_j$ is correlated with other features, the PDP evaluates the model at unrealistic feature combinations (e.g., height = 180 cm and age = 3).

Individual Conditional Expectation (ICE) plots show the same curve per instance rather than averaged, revealing heterogeneity in feature effects.
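Both curves are straightforward to compute for a toy two-feature model (all names here are illustrative), and the relationship between them is direct: the PDP is the pointwise average of the ICE curves.

```python
import random

def model(x0, x1):
    return x0 ** 2 + 3.0 * x1      # toy model with a quadratic effect in feature 0

random.seed(2)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(1000)]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]

# PDP: for each grid value of x0, average predictions over the observed x1 values.
pdp = [sum(model(g, x1) for _, x1 in data) / len(data) for g in grid]

# ICE: one curve per instance, revealing heterogeneity the average hides.
ice = [[model(g, x1) for g in grid] for _, x1 in data]

# Each PDP point is roughly g^2 here, since the 3*x1 term averages to ~0.
print([round(v, 1) for v in pdp])
```

The individual ICE curves are all parallel in this toy case (the model is additive); crossing or fanning ICE curves would signal interactions that the PDP averages away.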

Built-in Importance

Some models provide importance measures directly:

  • Tree impurity importance: total decrease in Gini impurity or entropy from splits on feature $j$. Biased toward high-cardinality features.
  • Coefficient magnitude: in linear models, $|\beta_j|$ (after feature standardization). Only meaningful when features have comparable scales.
  • Attention weights: in transformers, often misinterpreted as importance. Attention is a computational mechanism, not an explanation of which inputs caused the output. Jain and Wallace (2019) showed attention frequently does not correlate with gradient-based importance.
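The scale-dependence caveat for coefficient magnitude can be seen with a toy example (all numbers illustrative): expressing the same feature in different units rescales $|\beta_j|$ arbitrarily, while coefficient times feature standard deviation is unit-free.

```python
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]       # toy feature in meters
beta_m = 10.0                                # toy linear model: y = 10 * height_m

heights_mm = [h * 1000 for h in heights_m]   # same feature in millimeters
beta_mm = beta_m / 1000                      # identical model, rescaled units

# Raw magnitudes disagree by 1000x for the same relationship...
print(beta_m, beta_mm)                       # 10.0 0.01

def std(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# ...but |beta| * std(feature), i.e. the standardized coefficient, agrees.
imp_m = abs(beta_m) * std(heights_m)
imp_mm = abs(beta_mm) * std(heights_mm)
print(abs(imp_m - imp_mm) < 1e-9)            # True
```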

The Causal Trap

Watch Out

Feature importance is not causal importance

All methods on this page answer: "what features does the model use?" None answer: "what features cause the outcome?" Permutation importance tells you the model relies on feature $j$. It does not tell you that intervening on feature $j$ would change the outcome in the real world. A model predicting hospital mortality might rely on "has palliative care order" because that feature correlates with severity, not because palliative care causes death. Causal claims require causal methodology (randomized experiments, instrumental variables, do-calculus), not feature importance methods.

Watch Out

Global importance can hide local behavior

A feature with low global permutation importance might be the most important feature for a specific subgroup. Average importance masks heterogeneity. SHAP and LIME provide per-instance attributions that can reveal this, but aggregating them back to global summaries loses the same information.

Watch Out

Correlated features split importance unpredictably

If features $j$ and $k$ are highly correlated, permutation importance will underestimate both (shuffling one does not remove the information, because the other is still intact). SHAP distributes importance between them according to the Shapley axioms, but the split may not match intuition. Neither method tells you which correlated feature is "truly important," because that is a causal question.
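This failure is easy to reproduce with a toy model that averages two exact copies of one signal (an extreme $\rho = 1$ case; all names illustrative): permuting either copy alone barely hurts, while permuting both removes the signal entirely.

```python
import random

def model(row):
    return 0.5 * row[0] + 0.5 * row[1]    # uses the shared signal through both copies

random.seed(3)
signal = [random.gauss(0, 1) for _ in range(2000)]
X = [[s, s] for s in signal]              # feature 1 is an exact copy of feature 0
y = signal[:]                             # the target is the signal itself

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def perm_increase_in_mse(j):
    base = mse(y, [model(r) for r in X])
    col = [r[j] for r in X]
    random.shuffle(col)
    Xp = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
    return mse(y, [model(r) for r in Xp]) - base

# Sum of the two single-feature importances...
both = perm_increase_in_mse(0) + perm_increase_in_mse(1)

# ...versus permuting BOTH copies together, which actually removes the signal.
perm = signal[:]
random.shuffle(perm)
removed = mse(y, [model([p, p]) for p in perm])

print(both < removed)   # each feature alone understates the shared signal
```

Grouped permutation (shuffling correlated features together) is one practical mitigation, at the cost of only getting group-level importances.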

Key Takeaways

  • Permutation importance: global, model-agnostic, measures performance degradation when a feature's association with the target is broken
  • SHAP: per-instance, grounded in Shapley axioms (the unique fair attribution), but exponentially expensive to compute exactly
  • LIME: per-instance, local linear approximation, but sensitive to perturbation distribution and kernel width
  • PDPs: global marginal effect, assumes feature independence
  • None of these methods imply causality. Feature importance != causal importance.

Exercises

ExerciseCore

Problem

You compute permutation importance for a random forest with two highly correlated features ($\rho = 0.95$): temperature in Celsius and temperature in Fahrenheit. Both receive low importance scores. Explain why, and suggest an approach to address this.

ExerciseAdvanced

Problem

Prove that for a linear model $f(x) = \beta_0 + \sum_j \beta_j x_j$ with independent features, the SHAP value of feature $j$ for input $x$ is $\phi_j(x) = \beta_j(x_j - \mathbb{E}[X_j])$. Verify the efficiency axiom.

References

Canonical:

  • Shapley, "A Value for N-Person Games" (1953)
  • Ribeiro et al., "Why Should I Trust You? Explaining the Predictions of Any Classifier" (LIME, 2016), Section 3

Current:

  • Lundberg & Lee, "A Unified Approach to Interpreting Model Predictions" (SHAP, 2017), Sections 2-3
  • Molnar, Interpretable Machine Learning (2022), Chapters 5-9
  • Jain & Wallace, "Attention is not Explanation" (2019)
  • Hastie, Tibshirani, Friedman, The Elements of Statistical Learning (2009), Chapters 7-8


Last reviewed: April 2026
