Storing and loading Dataset Assets lets you preserve data as reusable Assets, which you can then use in other Apps or in later Steps within the same flow.
A saved Dataset Asset persists even after the current flow completes or is deleted. This is the key difference between saving Assets and saving Outputs: Assets are reusable across different Apps, whereas Outputs are available only to downstream Steps within the same flow of a single App.
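To make the two lifetimes concrete, here is a toy model. It is a hypothetical sketch, not the Virtualitics SDK: ToyPlatform, save_output, save_asset, and delete_flow are invented names, and none of the platform's real storage is modeled. It only illustrates the rule stated above, that deleting a flow discards its Outputs while Assets survive.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class ToyPlatform:
    """Hypothetical stand-in for the platform store; NOT the Virtualitics SDK."""

    # Outputs are scoped to one flow: flow_id -> {output name -> object}
    outputs: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Assets live outside any flow: (label, name) -> object
    assets: Dict[Tuple[str, str], Any] = field(default_factory=dict)

    def save_output(self, flow_id: str, name: str, obj: Any) -> None:
        self.outputs.setdefault(flow_id, {})[name] = obj

    def save_asset(self, label: str, name: str, obj: Any) -> None:
        self.assets[(label, name)] = obj

    def delete_flow(self, flow_id: str) -> None:
        # Deleting a flow discards its Outputs but leaves Assets untouched
        self.outputs.pop(flow_id, None)


platform = ToyPlatform()
platform.save_output("flow-1", "scores", [1, 2, 3])
platform.save_asset("xyz", "My Dataset", [1, 2, 3])
platform.delete_flow("flow-1")

print("flow-1" in platform.outputs)              # False: the Output is gone
print(("xyz", "My Dataset") in platform.assets)  # True: the Asset persists
```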
Saving a Dataset Asset
In this example, StepOne builds a pandas DataFrame named df. The Dataset constructor wraps df in a Dataset Asset, and store_interface.save_asset saves that Asset to the store.
import pandas as pd
from virtualitics_sdk import Dataset, Step, StoreInterface


class StepOne(Step):
    def run(self, flow_metadata):
        store_interface = StoreInterface(**flow_metadata)
        df = pd.DataFrame(columns=['Column 1'], data=[1, 2, 3])
        # Wrap the pandas DataFrame in a Dataset Asset
        dataset = Dataset(dataset=df, label="xyz", name="My Dataset")
        # Save the Dataset Asset to the platform persistence system (the store)
        store_interface.save_asset(dataset)
Loading a Dataset Asset
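In a later Step, you may want to load the Dataset Asset saved above. That Step can belong to the same App or to a different App. The store_interface.get_asset call retrieves the Asset using the same label, type, and name it was saved with, and the Asset's object attribute holds the stored data.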
from virtualitics_sdk import AssetType, Step, StoreInterface


class StepTwo(Step):
    def run(self, flow_metadata):
        store_interface = StoreInterface(**flow_metadata)
        # Look up the Asset by the same label, type, and name used when saving it
        dataset_asset = store_interface.get_asset(label="xyz", type=AssetType.DATASET, name="My Dataset")
        # Get the object stored by the Asset (in this case, a pandas DataFrame)
        df = dataset_asset.object
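The save-then-load pattern works because the later Step asks for the Asset with the same identifying values it was saved with. The sketch below imitates that round trip with a hypothetical in-memory store. ToyStore, ToyDataset, and ToyAssetType are illustrative stand-ins for StoreInterface, Dataset, and AssetType, and keying Assets by exactly (label, type, name) is an assumption made for the sketch, not a statement about how the platform indexes Assets. A plain dict stands in for the DataFrame so the example runs without pandas.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Tuple


class ToyAssetType(Enum):
    # Hypothetical stand-in for the SDK's AssetType; only DATASET is modeled
    DATASET = "dataset"


@dataclass(frozen=True)
class ToyDataset:
    # Hypothetical stand-in for Dataset: wraps an object with a label and name
    object: Any
    label: str
    name: str
    asset_type: ToyAssetType = ToyAssetType.DATASET


class ToyStore:
    """Hypothetical in-memory store; assumes Assets are keyed by (label, type, name)."""

    def __init__(self) -> None:
        self._assets: Dict[Tuple[str, ToyAssetType, str], ToyDataset] = {}

    def save_asset(self, asset: ToyDataset) -> None:
        self._assets[(asset.label, asset.asset_type, asset.name)] = asset

    # The keyword is named `type` only to mirror the documented get_asset call
    def get_asset(self, label: str, type: ToyAssetType, name: str) -> ToyDataset:
        key = (label, type, name)
        if key not in self._assets:
            raise KeyError(f"no Asset with label={label!r}, name={name!r}")
        return self._assets[key]


store = ToyStore()

# "StepOne": wrap the data and save it (a dict stands in for the DataFrame)
store.save_asset(ToyDataset(object={"Column 1": [1, 2, 3]}, label="xyz", name="My Dataset"))

# "StepTwo": look it up with the same label, type, and name, then unwrap it
loaded = store.get_asset(label="xyz", type=ToyAssetType.DATASET, name="My Dataset").object
print(loaded)  # {'Column 1': [1, 2, 3]}
```

In this toy, asking for a mismatched label or name raises KeyError; how the real store reports a missing Asset is not covered here.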
What to Expect
Accessing the object attribute of the loaded Asset returns the stored pandas DataFrame. Because the Asset persists independently of any flow, you can load it from any of your Apps.
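If a later Step depends on receiving a DataFrame, it can check the unwrapped object before using it. The helper below is a hypothetical sketch: unwrap_asset is not part of the SDK, and it assumes only that a loaded Asset exposes the object attribute used above. In a real Step you would pass the result of store_interface.get_asset(...) together with pd.DataFrame; here a SimpleNamespace and a list stand in so the example runs on its own.

```python
from types import SimpleNamespace
from typing import Any, Type, TypeVar

T = TypeVar("T")


def unwrap_asset(asset: Any, expected_type: Type[T]) -> T:
    """Return asset.object, failing loudly if it is not the expected type.

    Hypothetical helper, not part of the SDK.
    """
    obj = asset.object
    if not isinstance(obj, expected_type):
        raise TypeError(f"expected {expected_type.__name__}, got {type(obj).__name__}")
    return obj


# Stand-in for a loaded Asset; SimpleNamespace only mimics the .object attribute
fake_asset = SimpleNamespace(object=[1, 2, 3])
print(unwrap_asset(fake_asset, list))  # [1, 2, 3]
```

Failing with a TypeError at load time surfaces a wrong or stale Asset immediately, rather than later in the Step when DataFrame methods are first called.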