Get a Dataframe from a DataSource Element


When is this applicable?

Within your Flow, you may want to perform data science, data analysis, or data manipulation on a user-uploaded file. A DataSource element provides users with an interface for uploading files. You can then reference the uploaded data in subsequent Steps through the StoreInterface, which lets you use Python packages like pandas to manipulate the data as needed.

 

How-To:

  1. Create a DataUpload Class:
class DataUpload(Step):
    def run(self, flow_metadata):
        pass
  2. Create a Card, Section, Page, and Step for the Data Upload Step. To understand what a Card, Section, Page, and Step are, see Get Started With Flows (Developer) for reference. 
data_upload_card = Card("Upload", [DataSource("Dataset", ["csv"], description="Upload the csv")])
data_upload_content = Section("Upload your data", [data_upload_card])
data_upload_page = Page("Data Upload", PageType.INPUT, {}, [data_upload_content])
data_upload_step = DataUpload("Data Upload", "Upload Dataset.", "Data Ingestion", StepType.INPUT, data_upload_page)
  3. Add the data upload Step to the list of Steps:
predict_steps = [
    data_upload_step,
    model_step
]
  4. Within the class where you want to access the file, add the following code:
store_interface = StoreInterface(**flow_metadata)

# Get current page
current_page = store_interface.get_page()

# Fetch input data
data = store_interface.get_element_value(data_upload_step.name, "Dataset")    

Now, you can access the uploaded CSV as a DataFrame using the variable data.
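Assuming get_element_value returns the uploaded CSV as a pandas DataFrame, a quick sanity check on data might look like the sketch below. The DataFrame here is an illustrative stand-in for the real upload, and the column names (timestamp, sensor_reading) are hypothetical:

```python
import pandas as pd

# Illustrative stand-in for the DataFrame returned by
# store_interface.get_element_value(...); column names are hypothetical.
data = pd.DataFrame({
    "timestamp": [0, 1, 2],
    "sensor_reading": [0.5, 0.7, 0.9],
})

# Basic inspection before passing the data along to later Steps
print(data.shape)   # (rows, columns)
print(data.dtypes)  # column types inferred from the CSV
print(data.head())  # first few rows
```

Inspecting shape and dtypes up front is a cheap way to catch a malformed upload before it reaches your modeling code.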

 

All the steps together:

from predict_backend.flow.step import Step, StepType
from predict_backend.flow.flow import Flow
from predict_backend.page import Card, Page, PageType, Section 
from predict_backend.page.elements import DataSource
from predict_backend.store.store_interface import StoreInterface
import logging

logger = logging.getLogger(__name__)

class DataUpload(Step):
    def run(self, flow_metadata):
        pass
        
class Model(Step):
    def run(self, flow_metadata):
        store_interface = StoreInterface(**flow_metadata)
        
        # Get current page
        current_page = store_interface.get_page()
        
        # Fetch input data
        data = store_interface.get_element_value(data_upload_step.name, "Dataset")
        logger.info("Dataset contains {} timesteps!".format(len(data)))
        
        # Use the df here however you need!
        
system_failure_flow = Flow("System Failure Prediction", "Predict the time to failure.")

data_upload_card = Card("Upload", [DataSource("Dataset", ["csv"], description="Upload the csv")])
data_upload_content = Section("Upload your data", [data_upload_card])
data_upload_page = Page("Data Upload", PageType.INPUT, {}, [data_upload_content])
data_upload_step = DataUpload("Data Upload", "Upload Dataset.", "Data Ingestion", StepType.INPUT, data_upload_page)

model_content = Section("Model", [])
model_page = Page("Model Training", PageType.RESULTS, {}, [model_content])
model_step = Model("Model Training and Results", "Train Model", "Model Analysis", StepType.RESULTS, model_page)

# Put all steps together to build the flow
predict_steps = [
    data_upload_step,
    model_step
]

system_failure_flow.chain(predict_steps)
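Before handing the uploaded DataFrame to a model inside Model.run, it can be worth validating its schema early. A minimal, self-contained sketch (the required column names here are hypothetical, and the DataFrame stands in for the real upload):

```python
import pandas as pd

# Hypothetical schema the uploaded CSV is expected to follow
REQUIRED_COLUMNS = {"timestamp", "sensor_reading"}

def validate_upload(data: pd.DataFrame) -> pd.DataFrame:
    """Raise early if the uploaded CSV is empty or missing columns."""
    missing = REQUIRED_COLUMNS - set(data.columns)
    if missing:
        raise ValueError(f"Uploaded CSV is missing columns: {sorted(missing)}")
    if data.empty:
        raise ValueError("Uploaded CSV contains no rows")
    return data

# Illustrative DataFrame standing in for the upload
df = pd.DataFrame({"timestamp": [0, 1], "sensor_reading": [0.2, 0.4]})
validated = validate_upload(df)
```

Failing fast here gives the user an actionable error at upload time instead of a confusing failure later in model training.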

 

What to Expect (Validation):

Validation of Data Upload:

  1. Run your Flow and access it at localhost:3000.
  2. You should see a data upload Page. Upload your dataset.
  3. If successful, you should see “Upload Complete” pop up in the bottom left corner. Click Next to move to the next Page. 

 

Validation of Accessing DataFrame:

  1. In the terminal output, you should see “Dataset contains {number} timesteps!” printed. The number should equal the number of rows in your dataset.
  2. Try running simple pandas manipulations within the class where you access your DataFrame.
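For instance, filtering and aggregation are quick ways to confirm the DataFrame behaves as expected. This sketch uses an illustrative DataFrame in place of the real upload, with hypothetical column names:

```python
import pandas as pd

# Hypothetical stand-in for the uploaded dataset
data = pd.DataFrame({
    "machine_id": ["A", "A", "B", "B"],
    "time_to_failure": [10, 8, 15, 12],
})

# Row count should match the timestep count logged by the Model step
row_count = len(data)

# Aggregate: mean time to failure per machine
mean_ttf = data.groupby("machine_id")["time_to_failure"].mean()

# Filter: rows close to failure
near_failure = data[data["time_to_failure"] < 11]
```

If these manipulations run without errors and return sensible values, the DataSource element and StoreInterface are wired up correctly.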

 
