Save and Load a Dataset Asset


When is this applicable?

Save a Dataset Asset when you need to store data for use in other Flows or further downstream in the same Flow. A Dataset Asset persists even after your Flow completes or is deleted. This is the key difference between saving an Asset and saving Output, which is only available downstream in the same Flow.

 

How-To

Saving

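Assume we have a Pandas DataFrame df. The Dataset(...) call and the save_asset call at the end of run create and save a Dataset Asset from df.

import pandas as pd
from predict_backend.validation.dataset import Dataset

# Step and StoreInterface are assumed to be imported elsewhere in your Flow code.
class CustomStep(Step):
  def run(self, flow_metadata):
    store_interface = StoreInterface(**flow_metadata)
    df = pd.DataFrame(columns=['Column 1'], data=[1,2,3])
    # Wrap the DataFrame in a Dataset Asset, identified by its label and name.
    dataset = Dataset(dataset=df, label="My Dataset", name="my small dataset")
    # Persist the Asset so that later Steps and other Flows can load it.
    store_interface.save_asset(dataset)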

Loading

In a later step you may want to load the Dataset Asset saved above. That step can be in the same Flow or in a different one. Retrieve the Dataset Asset with the get_asset call shown below, passing the same label and name it was saved with.

from predict_backend.utils.asset import AssetType

# Step and StoreInterface are assumed to be imported elsewhere in your Flow code.
class CustomStep2(Step):
  def run(self, flow_metadata):
    store_interface = StoreInterface(**flow_metadata)
    # .object unwraps the stored Asset and returns the underlying DataFrame.
    dataset = store_interface.get_asset(label="My Dataset",
        type=AssetType.DATASET, name="my small dataset").object

 

What to Expect (Validation)

The .object attribute of the loaded Asset should be a Pandas DataFrame. You should be able to load this Dataset Asset from any of your Flows.
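
One way to confirm this inside a step is to check the loaded object before using it. The following is a minimal sketch, not part of the platform API: check_loaded_dataset is a hypothetical helper, and the DataFrame built at the bottom is a stand-in for the value returned by get_asset(...).object.

```python
import pandas as pd


def check_loaded_dataset(obj, expected_columns):
    """Return obj unchanged if it is a DataFrame with the expected columns.

    Hypothetical helper for illustration; raises otherwise.
    """
    if not isinstance(obj, pd.DataFrame):
        raise TypeError(f"expected a pandas DataFrame, got {type(obj).__name__}")
    missing = [c for c in expected_columns if c not in obj.columns]
    if missing:
        raise ValueError(f"loaded dataset is missing columns: {missing}")
    return obj


# Stand-in for the object loaded from the Dataset Asset.
loaded = pd.DataFrame(columns=["Column 1"], data=[1, 2, 3])
df = check_loaded_dataset(loaded, expected_columns=["Column 1"])
```

Failing early with a clear error makes it easier to spot a wrong label or name than a later failure deep in the step's logic.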

 

Additional Details

You may only save Pandas DataFrames as Dataset Assets.
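
If your data is in another form, such as a list of records, convert it to a DataFrame before wrapping it in a Dataset. A minimal sketch, assuming the data is something the pandas DataFrame constructor accepts; to_dataframe is a hypothetical helper, not a platform function.

```python
import pandas as pd


def to_dataframe(data, columns=None):
    """Return data as a DataFrame, converting only when it is not one already."""
    if isinstance(data, pd.DataFrame):
        return data
    # Lists of dicts, dicts of lists, and 2-D arrays are all accepted here.
    return pd.DataFrame(data, columns=columns)


records = [{"Column 1": 1}, {"Column 1": 2}, {"Column 1": 3}]
df = to_dataframe(records)
```

The resulting df can then be passed as the dataset argument in the Saving example above.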
