ml · 9 min read · 15 February 2024

Explainable AI in Practice

When model explanations actually matter and when they don't — a practitioner's guide to SHAP, LIME, attention visualization, and the hard questions about trust and accountability in machine learning systems.

Explainable AI has become a compliance checkbox in many organizations: “we need explanations for our model decisions.” But most of the time, the people asking for explanations don’t actually use them, the explanations don’t represent how the model actually makes decisions, and the technical machinery is applied without a clear question being answered.

This is a practitioner’s take: when model explainability matters, what the tools actually do, and where the hard problems are.

When Explanations Actually Matter

Not every model needs to be explainable. A recommendation engine for playlist selection doesn’t require per-decision justification. A fraud detection model that affects whether a transaction is blocked or a customer account is suspended does.

The situations where explainability has real value:

Regulatory compliance: In finance, healthcare, and credit, there are regulatory requirements for model explainability. In the EU, GDPR is widely read as giving individuals a right to an explanation of automated decisions that significantly affect them. In US credit, ECOA requires adverse action notices: customers must be told the principal reasons they were denied credit.

Model debugging: Understanding which features drive predictions helps you find where the model is wrong in interpretable terms. If a credit model is denying applications primarily based on zip code, that’s a fairness problem you can diagnose through feature importance.

Stakeholder trust: Domain experts (doctors, risk managers, experienced buyers) are more likely to act on model outputs they can interrogate. If a supply chain planner can see that the model is recommending a large order because of an upcoming holiday and high recent sales velocity, they can apply their own judgment to whether that reasoning is sound.

Anomaly detection: When a model produces an unusual prediction, explanation tools help determine whether the unusual prediction is because the input is genuinely unusual (correct) or because the model is extrapolating outside its training distribution (incorrect).

The Main Tools

SHAP (SHapley Additive exPlanations)

SHAP is the most principled approach to local feature attribution. It computes how much each feature contributed to the difference between the actual prediction and the baseline prediction (usually the mean prediction).

The Shapley values come from cooperative game theory: each feature “player” gets credited for its marginal contribution to the prediction, averaged over all possible orderings of features entering the model.
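To make the game-theoretic definition concrete, here is a brute-force sketch of exact Shapley attribution for one prediction. It is exponential in the number of features, so it is purely illustrative; the names model, x, and X_background are assumptions, and "removing" a feature is approximated by replacing it with the background mean.

from itertools import permutations
from math import factorial
import numpy as np

def shapley_values(model, x, X_background):
    # x: 1-D numpy array for one row; X_background: 2-D numpy array.
    n = len(x)
    baseline = X_background.mean(axis=0)

    def value(coalition):
        # Prediction with only the coalition's features taken from x;
        # everything else is replaced by the background mean.
        z = baseline.copy()
        z[coalition] = x[coalition]
        return model.predict(z.reshape(1, -1))[0]

    phi = np.zeros(n)
    for order in permutations(range(n)):
        present = []
        prev = value(present)
        for i in order:
            present.append(i)
            cur = value(present)
            phi[i] += cur - prev  # marginal contribution of feature i
            prev = cur
    return phi / factorial(n)  # average over all orderings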

What SHAP gives you: signed, per-feature attributions for a single prediction that sum exactly to the prediction minus the baseline, plus global feature importance by aggregating those attributions across a dataset.

Where it's used: tree ensembles, where fast exact algorithms exist (TreeExplainer), and, via slower approximations, essentially any other model (KernelSHAP for black boxes, DeepSHAP for neural networks).

import shap

# Fast, exact Shapley values for tree ensembles (XGBoost, LightGBM, sklearn).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)  # classifiers return one array per class
shap.summary_plot(shap_values, X_test)  # global feature importance
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0])  # single prediction

Limitations: For complex non-linear models (deep neural networks), exact SHAP values are computationally intractable. Approximate methods (KernelSHAP, DeepSHAP) exist but are slower and less precise. SHAP values reflect the model’s logic, not necessarily the actual causal structure of the world.
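As a rough illustration of the model-agnostic path, a KernelSHAP call looks like the sketch below; model, X_train, and X_test are assumptions, and in practice you keep the background sample and the batch of rows small because each explanation requires many model evaluations.

import shap

# Model-agnostic approximation for non-tree models. Stochastic and much
# slower than TreeExplainer; keep the background sample and batch small.
background = shap.sample(X_train, 100)  # baseline distribution
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X_test.iloc[:10])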

LIME (Local Interpretable Model-Agnostic Explanations)

LIME works differently: it fits a locally-accurate simple model (linear regression or decision tree) around a specific prediction, and uses that simple model as the explanation.

The procedure: sample perturbations of the input, run them through the black-box model to get predictions, weight samples by their proximity to the original input, and fit a linear model on (perturbations, predictions). The linear model’s coefficients are the explanation.
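A minimal tabular sketch with the lime package, assuming a fitted classifier model with predict_proba and pandas frames X_train and X_test:

import numpy as np
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=np.asarray(X_train),
    feature_names=list(X_train.columns),
    mode="classification",
)

# Perturb around one instance, fit a locally weighted linear model, and
# report the top coefficients as the explanation.
exp = explainer.explain_instance(
    np.asarray(X_test.iloc[0]),
    model.predict_proba,
    num_features=5,
)
print(exp.as_list())  # [(feature condition, weight), ...]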

When LIME is useful: Model-agnostic — works for any model, including deep learning. For text and image inputs, LIME can highlight which tokens or superpixels drove the prediction.

Limitations: The explanation is inherently local and approximate. Different random seeds for perturbation sampling produce different explanations. For complex models in high-dimensional spaces, the locally linear approximation may be poor.

Attention Visualization (for Transformers)

Transformer-based models use attention mechanisms that produce attention weights — scores indicating how much each input token “attended to” every other token. These are often presented as explanations: “the model focused on these words.”
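For reference, extracting these weights from a Hugging Face transformer looks roughly like the following; the model name and input sentence are just examples.

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tok("The claim was denied due to missing documents.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer.
attn = out.attentions[-1][0].mean(dim=0)  # last layer, averaged over heads
tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0])
print(tokens, attn.shape)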

The problem: attention is not explanation. Attention weights are internal computational states; they do not faithfully represent why the model produced an output. High attention to a word doesn't mean the word caused the prediction; the model may attend strongly to a word precisely because it's ruling it out.

For more faithful attribution in transformer models, use gradient-based methods such as integrated gradients, or SHAP variants built for deep networks (more computationally expensive, but closer to what actually drove the output).

Feature Importance (from Trees)

Gradient boosted trees and random forests naturally produce feature importance scores: either split-based (how often a feature was used to split, weighted by gain) or permutation-based (how much does performance drop when this feature is randomly shuffled?).

Split-based importance is fast but biased toward high-cardinality features and features that appear early in trees. Permutation importance is slower but less biased: it measures the actual drop in predictive performance on held-out data, though strongly correlated features can share or hide credit.
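A quick comparison sketch with scikit-learn, assuming a fitted tree ensemble model and held-out frames X_val, y_val:

from sklearn.inspection import permutation_importance

split_based = model.feature_importances_  # gain-weighted split counts, fast

perm = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
perm_based = perm.importances_mean  # mean score drop when each feature is shuffled

for name, s, p in zip(X_val.columns, split_based, perm_based):
    print(f"{name:>24s}  split={s:.3f}  perm={p:.3f}")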

Practical guidance: Use SHAP-based global feature importance for tree models rather than built-in importance — it’s more consistent and accounts for interaction effects.

The Hard Problems

Faithfulness vs. Interpretability

An explanation is faithful if it accurately describes how the model actually makes its decision. An explanation is interpretable if a human can understand it.

These are often in tension. The fully faithful explanation of a gradient boosted model is the model itself, all 1,000 trees of it: completely faithful, and completely uninterpretable. A SHAP summary plot is interpretable, but it is a simplification, and the simplification may be misleading.

Explanation ≠ Causation

SHAP tells you which features pushed the prediction up or down according to the model’s learned associations. It does not tell you which features causally influence the outcome. A model trained on biased historical data will produce SHAP explanations that faithfully reflect the model’s reasoning, even if that reasoning relies on spurious correlations.

If the goal is causal understanding — which interventions would change the outcome — you need causal inference methods, not SHAP.

The Gaming Problem

In adversarial settings (loan applications, fraud detection), publicizing which features the model uses for its decisions enables gaming. A loan applicant who knows the model heavily weights job stability will present job history in a favorable light. The explanation that builds trust also creates an attack surface.

What Good Explainability Practice Looks Like

Define the explanation consumer first. A compliance officer needs a different explanation than a risk manager or a model developer. Build for the specific consumer, not explanations in general.

Use explanations to debug, not to trust blindly. SHAP and LIME are most valuable for understanding where and why a model fails. An explanation that says “the model denied this loan primarily because of zip code” is a red flag to investigate, not a justification to accept.

Compare explanations to domain knowledge. Do the features the model relies on match what domain experts say matters? If not, investigate. The model may have learned a spurious correlation that will fail out-of-sample.

Track explanation stability. Good explanations should be consistent for similar inputs. If the same input produces very different explanations across model versions, something is wrong with the model’s learned representations.
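One lightweight way to operationalize this is sketched below: rank-correlate per-row SHAP attributions across model versions. Names like model_v1 and X_sample are assumptions, and the sketch treats single-output (regression-style) attributions for simplicity.

import numpy as np
import shap
from scipy.stats import spearmanr

sv_old = shap.TreeExplainer(model_v1).shap_values(X_sample)
sv_new = shap.TreeExplainer(model_v2).shap_values(X_sample)

# Per-row rank correlation of attributions; low values flag predictions
# whose explanations shifted between versions.
rho = [spearmanr(a, b)[0] for a, b in zip(sv_old, sv_new)]
print(f"median attribution rank-correlation: {np.median(rho):.2f}")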

explainability shap interpretability ml-engineering fairness

Let's collaborate!

Whether you need a quantitative researcher, a machine learning systems builder, or a technical advisor, I'm available for select consulting engagements.

Get in Touch →