This document is a guided walkthrough of the “Predictive Modeling” tutorial Flow. In this Flow we demonstrate how machine learning can be effectively implemented and then enhanced within the Virtualitics AI Platform. In this tutorial we’re using house pricing data to train a model that identifies “flippable” properties. We then use various Page Elements to present the output of the model in ways that provide explainability and insights for informed decision-making.
We recommend opening this Flow in the Virtualitics AI Platform and progressing through it as you follow this walkthrough. Only a subset of the code used to build the Flow appears in this walkthrough; the full script can be found here: predictive_modeling.py
The data for this walkthrough can be found here: house_prices.csv and was sourced from Kaggle.
If this is your first introduction to the Virtualitics AI Platform Flows, also consider the “Hello World!” tutorial Flow. It is our most beginner-oriented tutorial Flow and primarily focuses on the core aspects of Flow development.
Constructing the Flow
When creating Flow scripts, we recommend organizing your file in the following order:
- Imports
- Step Definitions
- Flow and Pages Creation
This sequential organization can be seen in the full script for this Flow provided in the section above. However, for the purpose of this walkthrough, we will go through each Step and its corresponding Page together as it will make for easier reading.
Imports
We begin every flow by importing our key classes, which include Card, Section, Page, and Step. We also import the various Elements we’ll be using.
# Import predict flow classes
from predict_backend.flow.flow import Flow
from predict_backend.flow.step import Step, StepType
from predict_backend.page import Page, PageType, Section, Card
from predict_backend.store.store_interface import StoreInterface
from predict_backend.utils.asset import AssetType
from predict_backend.validation.dataset import Dataset
You will also need to import:
- Element and plot types from the Virtualitics SDK
- Packages you’ll use for data processing, such as NumPy and Pandas
Create Flow Object
Here, we create our actual Flow object and can also assign an image to be associated with it on the Virtualitics AI Platform homepage. See this article for more guidance on setting Flow tile images.
# Instantiate flow and assign image
flow_description = "New to writing flows in the Virtualitics AI Platform? Check out this Predictive Modeling tutorial flow."
flow_image_link = "https://predict-tutorials.s3-us-gov-west-1.amazonaws.com/predictive_modeling_tile.jpeg"
predictive_modeling = Flow("Predictive Modeling Tutorial", flow_description, flow_image_link)
Our Flow can be seen here on the Virtualitics AI Platform home page.
Build Steps and Pages
Steps are the primary building blocks of a Flow. They orchestrate all backend and frontend tasks performed on each Page. We will now go through each of this Flow’s Steps and their corresponding pages:
- Data Upload
- Investment Dashboard
- Scenario Planning Dashboard
1. Data Upload
In this Step, we use the DataSource element to provide the user with a means of uploading data. We add it to a Card and then place it in the empty Section we fetch using StoreInterface. Additionally, we specify the data must be uploaded as a CSV file and provide additional information to help guide the user.
# This step has the user upload the dataset we'll be using
class DataUploadStep(Step):
    def run(self, flow_metadata):
        store_interface = StoreInterface(**flow_metadata)
        page = store_interface.get_page()
        section = page.get_section_by_title("")
        data_link = "https://docs.virtualitics.com/en_US/create-a-new-flow/predictive-modeling"
        data_source = DataSource(
            title="Upload House Prices data here!",
            options=["csv"],
            description=f"You can find the relevant data for this tutorial here: {data_link}",
            required=True
        )
        data_card = Card(title="Data Upload Card", content=[data_source])
        page.add_card_to_section(data_card, "")
        store_interface.update_page(page)
Here is what ends up getting rendered for the user:
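Separately from the upload element itself, a later Step may want to confirm that the uploaded CSV contains the columns the rest of the Flow relies on. The following is a minimal sketch using only the Python standard library. The helper name missing_columns and the REQUIRED_COLUMNS set are assumptions made for illustration; "SalePrice" and "LandContour" are the only column names that appear in this walkthrough, and the sample rows are made up.

```python
import csv
import io

# Assumption: the columns later steps rely on. Only "SalePrice" and
# "LandContour" are named in this walkthrough; the full schema is not shown.
REQUIRED_COLUMNS = {"SalePrice", "LandContour"}


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header, sorted."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return sorted(set(required) - present)


# Made-up sample data, not rows from house_prices.csv
good_csv = "Id,LandContour,SalePrice\n1,Lvl,200000\n"
bad_csv = "Id,LandContour\n1,Lvl\n"

print(missing_columns(good_csv))  # []
print(missing_columns(bad_csv))   # ['SalePrice']
```

Checking the header before any modeling code runs lets a Step report a clear message about the missing column instead of failing later with a key error.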
2. Investment Dashboard
In this Step we build our model for identifying flippable houses and then use its output, along with contextual information from the dataset, to build a dashboard of insight-rich elements.
We build, train, and apply our model the same way we would using any of the popular modeling libraries in Python. However, we take the extra step of rolling up our model of choice into a Virtualitics AI Platform Model. This allows us to use some powerful tools that are specific to the Virtualitics AI Platform, as we’ll see in the next Step. Notice that we save our trained model as an Asset, allowing it to be accessed in both the Assets tab and future Steps.
from xgboost import XGBRegressor
# Build model
store_interface.update_progress(40, "Creating Model")
xgb = Model(
    model=XGBRegressor(learning_rate=0.1, n_jobs=-1, n_estimators=100, max_depth=5, eval_metric='mae', random_state=42),
    label="xgb",
    name="house sale price predictor"
)
# Train model and save as asset
xgb.fit(train_data[ohe_features], train_data[target_col])
store_interface.save_asset(xgb)
# Apply trained model to test set
test_target = test_data[target_col]
test_pred = xgb.predict(test_data[ohe_features])
test_results = pd.DataFrame({'ActualPrice':test_target, 'PredictedPrice':test_pred})
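The test_results table above pairs each actual price with the model's prediction. As a rough sketch of how those two columns could be turned into an error figure and a "flippable" flag, the example below computes the mean absolute error (the same metric passed as eval_metric above) and marks houses whose predicted price exceeds the actual price by a chosen margin. The function names, the 30% margin, and the prices are assumptions for illustration; the full script may define flippable properties differently. Plain lists are used to keep the sketch dependency-free, and the same arithmetic applies to the DataFrame columns.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted prices."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def flag_flippable(actual, predicted, min_uplift=0.30):
    """Mark houses whose predicted price beats the actual price by min_uplift or more.

    Assumption: "flippable" means predicted >= actual * (1 + min_uplift).
    """
    return [(p - a) / a >= min_uplift for a, p in zip(actual, predicted)]


# Made-up prices for illustration only
actual_price = [100_000, 200_000, 150_000]
predicted_price = [135_000, 190_000, 150_000]

print(mean_absolute_error(actual_price, predicted_price))  # 15000.0
print(flag_flippable(actual_price, predicted_price))       # [True, False, False]
```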
Throughout the rest of this Step we create an Infographic and a series of plots that are then organized into a Dashboard. We also place a recommendation Infographic at the bottom of the Page that gives the user the opportunity to kick off a custom event. Dedicated articles on creating these elements can be found here:
Create and Display an Infographic
Create and Display a Dashboard
This step ends up creating the following content:
3. Scenario Planning Dashboard
One of the greatest challenges in applying machine learning is explaining a model’s decisions to end users. To address this, we’ve built our XAI module, which allows users to interact with their data and find explanations for a model’s behavior.
Before we actually build any XAI elements, we first need to create an Explainer from our trained model.
# Create explainer for model
explainer = Explainer(model=xgb, training_data=explain_dataset, output_names=["SalePrice"], mode='regression',
label="explainer", name="ensemble model", use_shap=True, use_lime=False)
store_interface.save_asset(explainer)
After some additional preprocessing steps (which can be found in the full flow script) we can use our Model, Explainer, and some additional inputs to build an XAI Dashboard.
# Create the XAI Dashboard
dash = XAIDashboard(xgb, explainer, prediction_dataset, "LandContour", "PredictedPrice", "PredictedPrice", title="",
bounds=bounds, description=XAIDashboard.xai_dashboard_description(), train_data=explain_dataset, expected_title="Avg. Listed Price",
predicted_title="Predicted Price", encoding=DataEncoding.ONE_HOT)
current_page.add_content_to_section(dash, "")
A user can select any sample on the Dashboard, change its feature values, and then hit the “Predict” button. This will then produce an explanation Infographic that will display the new model prediction along with the likelihood of such a sample existing.
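The "change a feature value, then predict again" interaction can be pictured with a small what-if sketch. Everything here is a stand-in: predict_price is a hand-written linear rule rather than the trained model, the feature names GrLivArea and OverallQual are hypothetical examples, and the sketch does not attempt the likelihood estimate that the Dashboard reports.

```python
def predict_price(house):
    """Hypothetical stand-in for a trained model: a hand-written linear rule."""
    return 50_000 + 100 * house["GrLivArea"] + 5_000 * house["OverallQual"]


def what_if(house, **changes):
    """Return (baseline prediction, new prediction) after overriding features."""
    edited = {**house, **changes}  # copy, so the original sample is untouched
    return predict_price(house), predict_price(edited)


# Made-up sample with hypothetical feature names
sample = {"GrLivArea": 1_500, "OverallQual": 5}
before, after = what_if(sample, OverallQual=7)

print(before, after, after - before)  # 225000 235000 10000
```

Comparing the two predictions side by side is the core of scenario planning: the difference is the model's estimate of what the edited feature is worth for that particular sample.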
We can also build an XAI Plot, which takes a sample and visualizes how each of its feature values is influencing the model’s prediction.
titles = ["This House's Predicted Price is 30% Higher than Its Listed Price"]
plots = explainer.explain(explain_instances[ohe_features], method='manual', titles=titles,
                          expected_title="Avg. Listed Price", predicted_title="Predicted Price",
                          return_as="plots")
xai_plots_dash = Dashboard(plots, title="Explanation of a Flippable House")
current_page.add_content_to_section(xai_plots_dash, "")
Users can interact with the plot to gain more insight into each feature.
A dedicated article on the XAI module can be found here.
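To give a feel for what an explanation plot encodes, here is a minimal sketch of additive feature attribution for a linear model. For a linear model with independent features, the SHAP value of a feature reduces to its weight times the feature's distance from its average, and the attributions plus the expected (average) prediction add up to the sample's prediction. The weights, feature names, and values below are made up, and the Flow's XGBoost model is a tree ensemble, so the platform's Explainer computes its values differently; this only illustrates the "expected value plus contributions equals prediction" idea.

```python
def predict(weights, intercept, features):
    """Linear model: intercept plus the weighted sum of the features."""
    return intercept + sum(weights[name] * features[name] for name in weights)


def linear_attributions(weights, feature_means, sample):
    """Per-feature contribution w_i * (x_i - mean_i) for a linear model."""
    return {
        name: weights[name] * (sample[name] - feature_means[name])
        for name in weights
    }


# Made-up model and data with hypothetical feature names
weights = {"GrLivArea": 100.0, "OverallQual": 5_000.0}
intercept = 50_000.0
feature_means = {"GrLivArea": 1_400.0, "OverallQual": 6.0}
sample = {"GrLivArea": 1_500.0, "OverallQual": 5.0}

expected = predict(weights, intercept, feature_means)  # prediction at the average house
contributions = linear_attributions(weights, feature_means, sample)
predicted = predict(weights, intercept, sample)

print(expected)       # 220000.0
print(contributions)  # {'GrLivArea': 10000.0, 'OverallQual': -5000.0}
print(predicted)      # 225000.0
```

Read this way, a positive contribution pushes the prediction above the expected value and a negative one pulls it below, which is the comparison the expected and predicted titles on the plot refer to.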
Completing a Flow
The final step in constructing a Flow involves chaining together each Step in the Flow object we created at the start. To do this, we pass an ordered list of our Steps to the Flow’s chain() method:
# Chain together the steps of the flow
predictive_modeling.chain([
    data_upload_step,
    executive_dashboard_step,
    scenario_planning_step
])
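As a rough mental model only, chaining can be thought of as storing the Steps in the order given and then visiting them one after another. The SketchStep and SketchFlow classes below are hypothetical stand-ins written for this illustration; they are not the platform's Step and Flow classes and say nothing about how the platform runs Steps internally.

```python
class SketchStep:
    """Hypothetical stand-in for a Step: a title plus a run() method."""

    def __init__(self, title):
        self.title = title

    def run(self, log):
        log.append(self.title)  # record that this step was visited


class SketchFlow:
    """Hypothetical stand-in for a Flow that only tracks step order."""

    def __init__(self, name):
        self.name = name
        self.steps = []

    def chain(self, steps):
        self.steps = list(steps)  # keep the caller's ordering

    def run_all(self):
        log = []
        for step in self.steps:
            step.run(log)
        return log


sketch_flow = SketchFlow("Predictive Modeling Tutorial")
sketch_flow.chain([
    SketchStep("Data Upload"),
    SketchStep("Investment Dashboard"),
    SketchStep("Scenario Planning Dashboard"),
])

print(sketch_flow.run_all())
# ['Data Upload', 'Investment Dashboard', 'Scenario Planning Dashboard']
```

The takeaway is that the list order matters: the position of each Step in the list determines where its Page falls in the sequence.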
After completing this final step, your Flow file can be uploaded to the Virtualitics AI Platform using the “Create Flow” button in the top right of the home page of your workspace:
You’ll be alerted if any errors are detected in your Flow. If no errors are detected, your Flow will appear on the home page of your workspace!