The most impactful data is the data that makes sense to a user, especially if that data is being generated or augmented by machine learning. Incorporating Explainable AI (XAI) into your App enables users to better understand the context around the information being processed and communicated. Specifically within the Virtualitics AI Platform (VAIP), the XAI module can be used to generate instance explanations for machine learning models.
The module currently works only with tabular datasets and requires a trained machine learning model along with its training dataset. You can then explain specific instances from a test dataset to understand each feature's contribution to the model's prediction.
The explanations are rendered as visualizations that not only show users the most important features of the model but also help them verify that the model is paying attention to the right factors.
Additionally, feature importances that don't make sense to a user who is knowledgeable about the problem may shed light on issues with how the model makes its predictions.
Using the XAI Model
To use the XAI Model:
- Create the dataset you want to explain using the Virtualitics AI Platform's Dataset asset. For the explainer to work, the Dataset asset must specify which features are categorical. Provide any additional information the Dataset asset needs to convert the categorical columns to other encodings (for example, one-hot).
- Import the Explainer asset from predict_backend.ml.xai (also exposed through virtualitics_sdk, as shown below) and initialize it.
- Call the Explainer's explain method on a select few instances, or use smart instance selection to automatically pick instances for you.
import pandas as pd
from xgboost import XGBRegressor
from virtualitics_sdk import Card, Dataset, Explainer, ...
...
# Create explainer data asset on a one-hot encoded dataset created by pd.get_dummies
explain_data = Dataset(
    dataset=modeling_data[ohe_features],  # modeling_data is the whole pd.DataFrame, ohe_features is a list of the one-hot encoded columns
    label="example",
    name="explainer dataset",
    categorical_cols=categorical_cols,  # categorical_cols is a list of original categorical feature names
    predict_cols=ohe_features,  # predict_cols is the list of input features for the model
    encoding="one-hot"  # encoding is the format of the data: "one-hot"
)
# Train model
xgb = XGBRegressor(learning_rate=0.1, n_jobs=-1, n_estimators=100, max_depth=5, eval_metric='mae')
xgb.fit(modeling_data[ohe_features], modeling_data[target])
# Create explainer for ensemble model
explainer = Explainer(
    model=xgb, training_data=explain_data, output_names=target, mode='regression',
    label="example", name="xgb explainer", use_shap=True
)
# Specify instance sets to explain and create Cards
low = actual_test_samples.loc[actual_test_samples['Customer ID'] == 'L4646119']
high = actual_test_samples.loc[actual_test_samples['Customer ID'] == 'L9257872']
explain_instances = pd.concat([low, high])
titles = ["Low Default Probability", "High Default Probability"]
plots = explainer.explain(
    data=explain_instances[ohe_features],  # explain_instances is the dataset to explain, ohe_features is a list of one-hot encoded features
    method='manual',  # 'manual' means the entire dataset passed in will be explained (not using smart instance selection)
    titles=titles,  # List of titles for the explanation plots
    expected_title="Average Probability of Default in next 30 days.",  # Title for the expected value in the plot
    predicted_title="Predicted Probability of Default in next 30 days.",  # Title for the predicted value in the plot
    return_as="plots"  # The output format is "plots"
)
plots_card = Card(
    title="Explainer Plots Card",
    content=plots
)
page.add_card_to_section(plots_card, "Explainer Plots Section")
What to Expect
See the example image below:
Additional Details
- The Explainer asset can also accept Model assets rather than the model objects themselves (see the sketch after this list).
- Within the Explainer's explain function, in addition to the manual mode where specific instances are passed in, a smart mode allows the function to automatically pick interesting instances.
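Below is a minimal sketch of both options, building on the variables from the example above. It assumes the Model asset is constructed with the same model/label/name pattern as the Dataset asset, and that smart instance selection is requested through the same method parameter used in the manual example; the exact values (such as the 'smart' string) are assumptions, so check the Explainer reference in your SDK version.
# Sketch: wrap the trained model in a Model asset and hand that to the Explainer.
# The Model import and constructor arguments below are assumptions modeled on the Dataset asset above.
from virtualitics_sdk import Model
xgb_model = Model(model=xgb, label="example", name="xgb model")
explainer = Explainer(
    model=xgb_model,  # a Model asset instead of the raw XGBRegressor
    training_data=explain_data, output_names=target, mode='regression',
    label="example", name="xgb model explainer", use_shap=True
)

# Sketch: let the explainer pick interesting instances automatically.
# The 'smart' value for method is an assumption; see the explain() reference for the supported modes.
plots = explainer.explain(
    data=actual_test_samples[ohe_features],  # pass the full test set and let smart selection choose instances
    method='smart',
    expected_title="Average Probability of Default in next 30 days.",
    predicted_title="Predicted Probability of Default in next 30 days.",
    return_as="plots"
)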