Getting Started with SHAP Values

What are SHAP Values?

SHAP (SHapley Additive exPlanations) values are a way to explain the output of any machine learning model. They are based on Shapley values from cooperative game theory, which measure each player's contribution to the final outcome of a game. In machine learning, the features play the role of the players: each feature is assigned an importance value representing its contribution to the difference between the model's output for a given input and a baseline (average) output.
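To make the game-theoretic idea concrete, here is a minimal sketch that computes exact Shapley values for a tiny hypothetical model by enumerating every coalition of features. It assumes that a feature "absent" from a coalition is simply replaced by a baseline value. The SHAP library's explainers handle absent features differently (for example, using background data), so treat this as an illustration of the definition rather than of SHAP's implementation. The `model`, the input `x`, and the `baseline` are made up for the example.

```python
from itertools import combinations
from math import factorial


def model(z):
    # Hypothetical toy model: a linear term plus an interaction term.
    return 3 * z[0] + 2 * z[1] * z[2]


def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Assumption: a feature missing from a coalition takes its baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


x = [1, 2, 3]
baseline = [0, 0, 0]
phi = shapley_values(model, x, baseline)

print([round(p, 6) for p in phi])  # [3.0, 6.0, 6.0]
# Additivity: contributions sum to f(x) - f(baseline)
print(abs(sum(phi) - (model(x) - model(baseline))) < 1e-9)  # True
```

The linear term's full effect goes to feature 0. The interaction's effect (12) is split evenly between features 1 and 2. The three values add up to the gap between the prediction (15) and the baseline output (0).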

SHAP values can be used to explain both individual predictions and the overall behavior of a model. To explain an individual prediction, we can plot the SHAP value of each feature for that prediction. The plot shows how much each feature pushes the prediction above or below the baseline. To explain the model as a whole, we can plot the SHAP values of each feature across many data points (in this tutorial, the test set). Averaging the absolute SHAP values per feature then shows which features are most important to the model's predictions.
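The difference between a local and a global explanation comes down to which slice of the SHAP values you look at. In this sketch, the SHAP matrix and the feature names are hypothetical placeholders. A single row explains one prediction. Averaging the absolute values down each column gives a global importance ranking, which is the statistic SHAP's bar-style summary plots are based on.

```python
# Hypothetical SHAP values: one row per prediction, one column per feature.
feature_names = ["feature_a", "feature_b", "feature_c"]
shap_matrix = [
    [0.30, -0.05, 0.02],
    [-0.25, 0.10, -0.01],
    [0.40, -0.08, 0.03],
]

# Local explanation: the contributions behind a single prediction (row 0).
local = dict(zip(feature_names, shap_matrix[0]))

# Global importance: mean absolute SHAP value of each feature over all rows.
n_rows = len(shap_matrix)
global_importance = {
    name: sum(abs(row[j]) for row in shap_matrix) / n_rows
    for j, name in enumerate(feature_names)
}
ranking = sorted(global_importance, key=global_importance.get, reverse=True)

print(local)
print(ranking)  # ['feature_a', 'feature_b', 'feature_c']
```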

SHAP values can also help surface potential biases in a model. To do this, we can compare the SHAP values of each feature across different groups of data points. If a feature's SHAP values differ systematically between groups, the model may be treating those groups differently, which is worth investigating. A difference on its own does not prove that the model is biased against a group.
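A simple version of this group comparison is to average a feature's SHAP values within each group. The SHAP values and the group labels "A" and "B" below are hypothetical. In practice, the group label would come from a column of the data, and a gap like this is a signal to investigate, not a verdict.

```python
from collections import defaultdict

# Hypothetical SHAP values of one feature for six predictions,
# plus a hypothetical group label for each of those data points.
shap_for_feature = [0.20, 0.15, 0.25, -0.10, -0.05, -0.15]
groups = ["A", "A", "A", "B", "B", "B"]

# Accumulate the sum and count of SHAP values per group.
totals = defaultdict(float)
counts = defaultdict(int)
for value, group in zip(shap_for_feature, groups):
    totals[group] += value
    counts[group] += 1

mean_by_group = {g: totals[g] / counts[g] for g in totals}
gap = mean_by_group["A"] - mean_by_group["B"]

print({g: round(m, 3) for g, m in mean_by_group.items()}, round(gap, 3))
# {'A': 0.2, 'B': -0.1} 0.3
```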

SHAP values are a powerful tool for explaining the output of machine learning models. They explain individual predictions, summarize a model's overall behavior, and can help surface potential biases.

Installing SHAP

```python
%pip install shap -q
```

Loading the data

In this section, we will use the Mobile Price Classification dataset from Kaggle to build and analyze a multiclass classification model. The target column, price_range, has four price classes (0 to 3).

```python
import pandas as pd

# Load the Kaggle training file (assumed to be in the working directory)
mobile = pd.read_csv("train.csv")
mobile.head()
```

Preparing the data

```python
from sklearn.model_selection import train_test_split

# Features: every column except the target; target: the price class
X = mobile.drop("price_range", axis=1)
y = mobile["price_range"]

# Train and test split (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
```

Training and evaluating the model

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Model fitting
rf = RandomForestClassifier()
rf.fit(X_train, y_train)

# Prediction
y_pred = rf.predict(X_test)

# Model evaluation: classification_report expects (y_true, y_pred)
print(classification_report(y_test, y_pred))
```
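The argument order of `classification_report` matters because precision and recall swap roles when `y_true` and `y_pred` are swapped. This plain-Python sketch computes both metrics for one class to show the effect. It uses made-up labels and a hypothetical helper, `precision_recall`.

```python
def precision_recall(y_true, y_pred, label):
    # Precision: of the rows predicted as `label`, the share that truly are `label`.
    # Recall: of the rows that truly are `label`, the share predicted as `label`.
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)
    actual = sum(t == label for t in y_true)
    return tp / predicted, tp / actual


# Hypothetical true and predicted classes
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 1, 2]

print(precision_recall(y_true, y_pred, label=1))  # (0.75, 1.0)
print(precision_recall(y_pred, y_true, label=1))  # (1.0, 0.75) -- arguments swapped
```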

Calculating SHAP Values

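```python
import numpy as np
import shap

# Load the JavaScript used by SHAP's interactive plots in notebooks
shap.initjs()

# Calculate SHAP values with the tree-specific explainer
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_test)

# For multiclass models, older SHAP releases return a list with one
# (samples, features) array per class; newer releases may return a single
# (samples, features, classes) array. Convert to the list form used below.
if isinstance(shap_values, np.ndarray) and shap_values.ndim == 3:
    shap_values = [shap_values[:, :, k] for k in range(shap_values.shape[2])]
```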
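For a multiclass model, `TreeExplainer` produces one set of SHAP values per class. Each set explains that class's model output, which for a scikit-learn random forest is the predicted class probability. This is why the plots below index `shap_values` and `explainer.expected_value` by class. The sketch below illustrates that structure on a hypothetical three-class softmax model, using the coalition-enumeration definition of Shapley values with a zero baseline. The zero baseline is an assumption, not how `TreeExplainer` handles absent features. For each class, the base value plus that class's SHAP values recovers the class probability. Because the probabilities always sum to 1, each feature's contributions cancel out across the classes.

```python
from itertools import combinations
from math import exp, factorial


def predict_proba(z):
    # Hypothetical 3-class model: softmax over simple linear scores.
    scores = [z[0], z[1], 0.0]
    exps = [exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shapley(f, x, baseline):
    # Exact Shapley values; absent features take their baseline value.
    n = len(x)

    def value(coalition):
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi.append(sum(
            factorial(size) * factorial(n - size - 1) / factorial(n)
            * (value(set(s) | {i}) - value(set(s)))
            for size in range(n)
            for s in combinations(others, size)
        ))
    return phi


x, baseline = [1.0, 2.0], [0.0, 0.0]
n_classes = 3

# One list of per-feature SHAP values per class (the "list form").
per_class = [
    shapley(lambda z, k=k: predict_proba(z)[k], x, baseline)
    for k in range(n_classes)
]
base_values = predict_proba(baseline)  # one base value per class

proba = predict_proba(x)
for k in range(n_classes):
    # Additivity per class: base value + sum of contributions vs. probability
    print(k, round(base_values[k] + sum(per_class[k]), 6), round(proba[k], 6))

# Each feature's SHAP values cancel across classes (up to rounding error).
for j in range(len(x)):
    print(j, abs(sum(per_class[k][j] for k in range(n_classes))) < 1e-12)
```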

Summary Plot

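```python
# Summarize the effects of features across all classes: with one SHAP array
# per class, this draws a bar chart of mean |SHAP value| per feature and class
shap.summary_plot(shap_values, X_test)
```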

```python
# Beeswarm plot of the SHAP values for class 0 only
shap.summary_plot(shap_values[0], X_test)
```

Dependence Plot

```python
# How battery_power affects the class-0 output, colored by ram to show interaction
shap.dependence_plot("battery_power", shap_values[0], X_test, interaction_index="ram")
```

Force Plot

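```python
# Force plot for test row 12, explaining the class-0 output
shap.plots.force(explainer.expected_value[0], shap_values[0][12, :], X_test.iloc[12, :], matplotlib=True)
```

```python
# The same row, explaining the class-1 output
shap.plots.force(explainer.expected_value[1], shap_values[1][12, :], X_test.iloc[12, :], matplotlib=True)
```

```python
# True price class of test row 12, for comparison with the plots above
y_test.iloc[12]
```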

Decision Plot

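```python
# Decision plot for test row 12, class 1; pass the column names explicitly
shap.decision_plot(explainer.expected_value[1], shap_values[1][12, :], feature_names=X_test.columns.tolist())
```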
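A decision plot draws the cumulative sum of one prediction's SHAP values. It starts from the base value at the bottom and adds features from least to most important until it reaches the model's output at the top. This sketch reproduces that path in plain Python. The base value, feature names, and SHAP values are hypothetical. Ranking by absolute SHAP value for a single prediction is a simplification of the plot's default importance ordering.

```python
# Hypothetical base value and SHAP values for one prediction of one class.
base_value = 0.25
contributions = {"feature_a": 0.30, "feature_b": -0.10, "feature_c": 0.05}

# Order features from least to most important (by |SHAP|), matching a
# decision plot read from bottom to top, then accumulate from the base value.
ordered = sorted(contributions, key=lambda name: abs(contributions[name]))
path = [base_value]
for name in ordered:
    path.append(path[-1] + contributions[name])

print(ordered)  # ['feature_c', 'feature_b', 'feature_a']
print(round(path[-1], 6))  # 0.5, the model output for this class
```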
