AI Model Interpretability: SHAP, LIME, and Integrated Gradients for XAI

Explaining black-box ML models for compliance, debugging, and stakeholder communication

Explainability is often required for regulatory compliance in domains such as credit and is essential for debugging ML models.

SHAP (SHapley Additive exPlanations): game theory-based feature attribution. Each feature receives a SHAP value representing its average marginal contribution to the prediction, measured relative to a baseline expected output. Global: feature importance as mean absolute SHAP values. Local: waterfall plot explaining individual predictions. Example, assuming model is a fitted tree-based model and features lists the column names of X_test:

    import shap
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_test)
    shap.summary_plot(shap_values, X_test, feature_names=features)

LIME (Local Interpretable Model-agnostic Explanations): fits a local linear approximation around each prediction using perturbed samples weighted by their proximity to the input. It needs only the model's prediction function, so it works with any model type, including neural networks.

Integrated Gradients: gradient-based attribution for differentiable models such as neural networks. It accumulates gradients along a straight-line path from a baseline to the input, and the resulting attributions sum to the difference between the model's output at the input and at the baseline. This makes it more theoretically grounded than raw gradients.

Attention visualization: for transformer models, visualize attention weights to see which tokens the model attends to. Use BertViz for interactive visualization. Attention weights are a useful diagnostic but are not guaranteed to be faithful attributions.

When to use each: SHAP for tabular data (fast TreeSHAP for tree models, slower KernelSHAP for any model). LIME for model-agnostic explanations of tabular, text, or image data. Integrated Gradients for neural networks.

Regulatory use: SHAP values for individual credit decision explanations (EU AI Act; GDPR provisions on automated decision-making, often described as a right to explanation). Feature importance for compliance audits.

Limitation: all explanation methods approximate the true model behavior. For highly complex models they can disagree with one another, and sampling-based methods such as LIME and KernelSHAP can give different results across runs.
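To make the Integrated Gradients description above concrete, here is a minimal pure-Python sketch. It is not taken from any library: the logistic model, its weights, the all-zero baseline, and the function names are hypothetical choices made for illustration. In practice a framework with automatic differentiation would supply the gradients. The sketch approximates the path integral with a midpoint sum and checks that the attributions add up to the change in model output between baseline and input.

```python
import math

# Hypothetical toy model: logistic regression with made-up weights.
WEIGHTS = [1.5, -2.0, 0.5]
BIAS = 0.1


def predict(x):
    """Model output f(x): a probability in (0, 1)."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x):
    """Analytic gradient of f with respect to each input feature."""
    p = predict(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, steps=200):
    """Average the gradient along the straight line from baseline to x
    (midpoint Riemann sum), then scale by (x - baseline) per feature."""
    delta = [xi - bi for xi, bi in zip(x, baseline)]
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * d for bi, d in zip(baseline, delta)]
        for i, g in enumerate(gradient(point)):
            avg_grad[i] += g / steps
    return [d * g for d, g in zip(delta, avg_grad)]


x = [1.0, 0.5, 2.0]           # input to explain (illustrative values)
baseline = [0.0, 0.0, 0.0]    # assumed "no signal" reference input
attributions = integrated_gradients(x, baseline)

# Completeness check: attributions should sum to f(x) - f(baseline).
completeness_gap = abs(sum(attributions) - (predict(x) - predict(baseline)))
print("attributions:", attributions)
print("completeness gap:", completeness_gap)
```

The choice of baseline matters: attributions are always relative to it, so a different reference input (for example, a dataset mean instead of zeros) would generally yield different values for the same prediction.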
