When is this applicable?
The XAI module generates instance-level explanations for machine learning models. It currently works only with tabular datasets and requires a trained machine learning model together with its training dataset. With these in place, the Flow writer can explain specific instances from a test dataset to understand how each feature contributes to the model’s prediction. The resulting visualizations show users which features matter most to the model and help them evaluate whether the model is paying attention to the right factors. Additionally, feature importances that don’t make sense to a user who is knowledgeable about the problem may shed light on issues with the model’s prediction method.
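To make "feature contributions" concrete, here is a minimal sketch using a toy linear model. It is not the XAI module's implementation; the feature names, weights, and data are all hypothetical. For a linear model with independent features, the per-feature contribution `weight * (value - training mean)` coincides with the SHAP value, and the contributions add up to the difference between the instance's prediction and the average prediction.

```python
# Toy linear model: prediction = intercept + sum(weight_i * x_i).
# Every name and number below is illustrative, not part of the XAI module.
weights = {"income": -0.5, "debt": 1.0, "age": -0.25}
intercept = 2.0

training_rows = [
    {"income": 4.0, "debt": 1.0, "age": 2.0},
    {"income": 2.0, "debt": 3.0, "age": 4.0},
    {"income": 6.0, "debt": 2.0, "age": 6.0},
]


def predict(row):
    """Score one row with the toy linear model."""
    return intercept + sum(weights[f] * row[f] for f in weights)


# Baseline: the average prediction over the training data.
feature_means = {
    f: sum(r[f] for r in training_rows) / len(training_rows) for f in weights
}
expected_value = sum(predict(r) for r in training_rows) / len(training_rows)


def explain(row):
    """Per-feature contribution relative to the training average."""
    return {f: weights[f] * (row[f] - feature_means[f]) for f in weights}


instance = {"income": 2.0, "debt": 4.0, "age": 4.0}
prediction = predict(instance)
contributions = explain(instance)

# The explanation is additive: baseline + contributions == prediction.
print("expected:", expected_value, "predicted:", prediction)
for feature, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature:>7}: {value:+.2f}")
```

Reading the output the way the paragraph above suggests: a feature with a large contribution that a domain expert would consider irrelevant is a hint that the model may be relying on the wrong signal.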
- Create the dataset you want to explain using the Virtualitics AI Platform’s Dataset Asset. For the explainer to work, the Dataset Asset must specify which features are categorical. Also provide any additional information the Dataset Asset needs to convert the categorical columns to other encodings. See the documentation of the Dataset Asset for more details.
- Import the Explainer Asset from predict_backend.ml.xai and initialize it.
- Call the Explainer’s explain method on a few selected instances, or use smart instance selection to pick instances for you automatically.
    import pandas as pd
    from xgboost import XGBRegressor
    from predict_backend.ml.xai import Explainer
    # Dataset and Dashboard must also be imported from the Virtualitics AI Platform SDK.

    # Create explainer data asset on a one-hot encoded dataset created by pd.get_dummies
    # modeling_data is the whole pd.DataFrame
    # ohe_features is a list of the one-hot encoded columns
    # categorical_cols is a list of column names which correspond to categorical features
    #   i.e. if the only categorical feature was "Animal", which was transformed by pd.get_dummies
    #   into the columns "Animal_cat", "Animal_dog", then categorical_cols = ["Animal"]
    # predict_cols is the list of input features for the model
    # encoding is a string representing the encoding of the data.
    #   Options are "one_hot", "verbose", or "ordinal"
    explain_data = Dataset(modeling_data[ohe_features], label="example",
                           name="explainer dataset",
                           categorical_cols=categorical_cols,
                           predict_cols=ohe_features,
                           encoding="one_hot")

    # Train model
    xgb = XGBRegressor(learning_rate=0.1, n_jobs=-1, n_estimators=100,
                       max_depth=5, eval_metric='mae')
    xgb.fit(modeling_data[ohe_features], modeling_data[target])

    # Create explainer for ensemble model
    explainer = Explainer(model=xgb, training_data=explain_data,
                          output_names=target, mode='regression',
                          label="example", name="xgb explainer", use_shap=True)

    # Specify instance sets to explain and create Cards
    low = actual_test_samples.loc[actual_test_samples['Customer ID'] == 'L4646119']   # finance
    high = actual_test_samples.loc[actual_test_samples['Customer ID'] == 'L9257872']  # finance
    explain_instances = pd.concat([low, high])
    titles = ["Low Default Probability", "High Default Probability"]
    plots = explainer.explain(explain_instances[ohe_features], method='manual',
                              titles=titles,
                              expected_title="Average Probability of Default in next 30 days.",
                              predicted_title="Predicted Probability of Default in next 30 days.",
                              return_as="plots")
    xai_plots_dash = Dashboard(plots)
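The example above takes `modeling_data`, `ohe_features`, and `categorical_cols` as given. The following sketch shows one way these inputs might be prepared with `pd.get_dummies`. The toy data and the `Price` target column are hypothetical, and the snippet uses only pandas; it does not touch the Dataset or Explainer Assets.

```python
import pandas as pd

# Hypothetical raw data with one categorical feature ("Animal").
raw = pd.DataFrame({
    "Animal": ["cat", "dog", "cat"],
    "Weight": [4.2, 11.0, 3.8],
    "Price": [120.0, 300.0, 95.0],
})

target = "Price"
# Names of the ORIGINAL categorical features, before encoding.
categorical_cols = ["Animal"]

# One-hot encode: "Animal" becomes "Animal_cat" and "Animal_dog".
modeling_data = pd.get_dummies(raw, columns=categorical_cols)

# Model inputs: every encoded column except the target.
ohe_features = [col for col in modeling_data.columns if col != target]

print(ohe_features)
```

Note that `categorical_cols` keeps the pre-encoding name (`"Animal"`), while `ohe_features` lists the post-encoding columns, matching the comments in the example above.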
What to Expect (Validation)
- The Explainer Asset can also use Model Assets rather than the models themselves.
- Within the Explainer’s explain function, in addition to the manual mode where specific instances are passed in, a smart mode allows the function to automatically pick interesting instances.