Related Categories

Datetime Handling

Datetime Reformatting

Datetime values can often be difficult to work with, given the multitude of different formats that are used. VIP can recognize a range of datetime formats but we find that our clients occasionally have difficulty getting their dates in a format that works with VIP. Let’s take a look at an example involving Branch Sales Data:

We can create a Pandas DateTime Object by using the pd.to_datetime() function. This will convert our data stored as Strings to a datetime Object containing attributes like year, day of the week, and timestamp.

Storing our dates as objects will encapsulate all of the information we have about the date in a format that can be universally understood when loaded into VIP. Additionally, the Pandas to_datetime() function can infer the datetime format provided. For getting started with processing datetime data, try using the code below:

df.date = pd.to_datetime(df.date, infer_datetime_format=True)

Where “date” is the name of the column in your dataframe containing your date data. This code will convert the date column in your dataframe to a datetime object and uses infer_datetime_format to guess the datetime format you are using. You can then load your data into VIP as usual with:

from virtualitics import api
VIP = api.VIP()
VIP.load_data(df)

Or if your data is already loaded into VIP, simply:

from virtualitics import api
VIP = api.VIP()
VIP.add_column(df.date)

If the infer_datetime_format parameter is not capturing the format you are using, you can call the format parameter and supply the syntax of the date. Consider a date input with the following format:

The code we would use to convert these date strings to datetime objects is as such:

df.date = pd.to_datetime(df.date, format="%m/%d/%Y %I:%M:%S %p")

With the format syntax indicated with the below table:

 

More information on string to datetime format syntax can be found here.

TimeDeltas

Generating TimeDeltas can be very useful for feature engineering within VIP. Using datetime objects in Python can simplify this process incredibly quickly. Consider the following data relating to purchase orders:

We are interested in understanding the time difference between when the Purchase Order is approved and issued. We will first convert each column to a datetime object:

df['Purchase Order Approval Date'] = pd.to_datetime(df['Purchase Order Approval Date'], infer_datetime_format = True)
df['Purchase Order Issue Date'] = pd.to_datetime(df['Purchase Order Issue Date'], infer_datetime_format = True)

Now, we can find the TimeDelta between these two date fields by simply subtracting the two datetime objects:

df['Purchase Order Approval Date'] - df['Purchase Order Issue Date']

You will notice that performing a subtraction has converted a Series of datetime objects to a Series of TimeDelta objects. We are interested in just obtaining the number of days between each date. Let’s apply a brief function to extract the “days” attribute from the timedelta objects and add this to our data:

df[‘timedelta’] = (df['Purchase Order Approval Date'] - df['Purchase Order Issue Date']).apply(lambda x: x.days)

Series.apply enables us to apply a function (in this case, we have defined one using the lambda keyword, where x is the input to the function) to extract the days from the Series of timedelta objects. We added this new feature to the original dataframe (df) as “timedelta”: