When is this applicable?
Saving and loading Dataset Assets lets you store data for use in other Flows or downstream in the same Flow. A Dataset Asset persists even after your Flow completes or is deleted. This is an important distinction between saving Assets and saving Output, which is only available downstream in the same Flow.
How-To
Saving
Assume we have a Pandas DataFrame df. The last two lines of run below create and save a Dataset Asset from df.
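import pandas as pd
from predict_backend.validation.dataset import Dataset
# Step and StoreInterface are assumed to be imported from the platform SDK already.

class CustomStep(Step):
    def run(self, flow_metadata):
        store_interface = StoreInterface(**flow_metadata)
        df = pd.DataFrame(columns=['Column 1'], data=[1, 2, 3])
        # Wrap the DataFrame in a Dataset Asset, then persist it.
        dataset = Dataset(dataset=df, label="My Dataset", name="my small dataset")
        store_interface.save_asset(dataset)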
Loading
In a later step, you may want to load the Dataset Asset saved above. This step can be in the same Flow or in a separate one. Retrieve the Dataset Asset with the get_asset call shown below.
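from predict_backend.utils.asset import AssetType
# Step and StoreInterface are assumed to be imported from the platform SDK already.

class CustomStep2(Step):
    def run(self, flow_metadata):
        store_interface = StoreInterface(**flow_metadata)
        # Look up the asset by label, type, and name; .object should be the saved Pandas DataFrame.
        dataset = store_interface.get_asset(
            label="My Dataset", type=AssetType.DATASET, name="my small dataset"
        ).object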
What to Expect (Validation)
The object of the loaded Asset should be a Pandas DataFrame. You should be able to load this Dataset Asset from any of your Flows.
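One way to confirm the round trip is to compare the loaded object with the DataFrame you saved. The sketch below is a minimal check that assumes you still have (or can rebuild) the original DataFrame. check_loaded_dataset is a hypothetical helper name, not part of predict_backend, and it relies only on pandas.

```python
import pandas as pd


def check_loaded_dataset(loaded, expected: pd.DataFrame) -> None:
    """Raise if `loaded` is not a DataFrame equal to `expected`.

    Hypothetical helper for illustration; not part of predict_backend.
    """
    if not isinstance(loaded, pd.DataFrame):
        raise TypeError(f"expected a pandas DataFrame, got {type(loaded).__name__}")
    # Compares values, column labels, index, and dtypes; raises AssertionError on mismatch.
    pd.testing.assert_frame_equal(loaded, expected)


# Stand-in for the DataFrame saved in the Saving example.
original = pd.DataFrame(columns=['Column 1'], data=[1, 2, 3])
# In a real step, `loaded` would be the .object of the asset returned by get_asset.
loaded = original.copy()
check_loaded_dataset(loaded, original)
```

If dtypes turn out to differ after the round trip (something this article does not state either way), passing check_dtype=False to assert_frame_equal relaxes the comparison to values and labels.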

Additional Details
You may only save Pandas DataFrames as Dataset Assets.
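If your data starts out in another form, such as a list of records, a dict of columns, or a Pandas Series, you can convert it to a DataFrame before wrapping it in a Dataset. The sketch below shows one possible conversion using only pandas. to_dataframe is a hypothetical helper for illustration and is not part of predict_backend.

```python
import pandas as pd


def to_dataframe(data) -> pd.DataFrame:
    """Return `data` as a pandas DataFrame (hypothetical helper, illustration only)."""
    if isinstance(data, pd.DataFrame):
        return data
    if isinstance(data, pd.Series):
        # A Series becomes a single-column DataFrame.
        return data.to_frame()
    # Dicts of columns and lists of row dicts are both accepted by the constructor.
    return pd.DataFrame(data)


records = [{"Column 1": 1}, {"Column 1": 2}, {"Column 1": 3}]
df = to_dataframe(records)
```

The resulting df can then be passed as the dataset argument when constructing the Dataset, as in the Saving example.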