Saving data as a Dataset Asset lets you reuse it in other Flows or downstream in the same Flow.
A Dataset Asset persists even after your Flow completes or is deleted. This is an important distinction between saving Assets and saving Output, which is only available downstream in the same Flow.
Saving a Dataset Asset
In this example, assuming we have a Pandas DataFrame df, the last two lines of run create and save a Dataset Asset from df.
import pandas as pd
from predict_backend.validation.dataset import Dataset

# Step and StoreInterface are assumed to be imported elsewhere in your Flow code.
class CustomStep(Step):
    def run(self, flow_metadata):
        store_interface = StoreInterface(**flow_metadata)
        df = pd.DataFrame(columns=['Column 1'], data=[1, 2, 3])
        # Wrap the DataFrame in a Dataset Asset and persist it.
        dataset = Dataset(dataset=df, label="My Dataset", name="my small dataset")
        store_interface.save_asset(dataset)
Loading a Dataset Asset
In a later Step, you may want to load the Dataset Asset saved above. This Step can be in the same Flow or in a separate one. The get_asset call at the end of run retrieves the Dataset Asset.
from predict_backend.utils.asset import AssetType

class CustomStep2(Step):
    def run(self, flow_metadata):
        store_interface = StoreInterface(**flow_metadata)
        # Look up the Asset by label, type, and name; .object holds the DataFrame.
        dataset = store_interface.get_asset(
            label="My Dataset", type=AssetType.DATASET, name="my small dataset"
        ).object
What to Expect
The object of the loaded Asset should be returned as a Pandas DataFrame. You should be able to load this Dataset Asset from any of your Flows.
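If you want to confirm this expectation in your own Step before using the data, a small check along the following lines may help. This is only a sketch: validate_loaded_dataset is a hypothetical helper, not part of predict_backend, and the DataFrame built here is a stand-in for the value returned by get_asset(...).object in the loading example.

```python
import pandas as pd


def validate_loaded_dataset(obj, expected_columns):
    """Hypothetical helper (not part of predict_backend).

    Checks that a loaded Asset object is a pandas DataFrame and that it
    contains the columns we expect, then returns it unchanged.
    """
    if not isinstance(obj, pd.DataFrame):
        raise TypeError(f"Expected a pandas DataFrame, got {type(obj).__name__}")
    missing = [col for col in expected_columns if col not in obj.columns]
    if missing:
        raise ValueError(f"Loaded DataFrame is missing columns: {missing}")
    return obj


# Stand-in for the object returned by get_asset(...).object (assumption:
# it matches the DataFrame that was saved in the first example).
loaded = pd.DataFrame(columns=["Column 1"], data=[1, 2, 3])

df = validate_loaded_dataset(loaded, expected_columns=["Column 1"])
print(df.shape)
```

Failing early with a clear error can make it easier to spot a mismatched label or name than a later failure deep inside the Step's logic.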