Exploring Stock Price Data with VIP

This is a demo notebook showing how one can explore stock data with the help of VIP visualization. The main areas of focus in this notebook are stock price visualization and model fitting for stock price prediction.

In [25]:
import time
from virtualitics import api
import pandas as pd
import numpy as np
import pandas_datareader.data as web
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')
In [26]:
vip = api.VIP()
Setting up WebSocket connection to: ws://localhost:12345/api
Connection Successful! Initializing session.

Set the tickers and timeline of interest

In [27]:
tickers = ['XOM', 'WTI', 'AAPL', 'GOOG', 'F', 'INTC', 'MSFT', 'TSLA', 
           'FB', 'WMT', 'AMZN', 'BP', 'HP', 'NYT', 'TYO', 'SPY']

# Get information for dates between start and end
start = datetime(2015, 1, 1)
end = datetime(2018, 12, 1)
In [28]:
df = pd.DataFrame()
for ticker in tickers:   
    f = web.DataReader(ticker, 'iex', start, end)
    f['Series Name'] = ticker
    df = pd.concat([df, f])
In [29]:
df.head()
Out[29]:
open high low close volume Series Name
date
2015-01-02 78.0962 78.7735 77.7237 78.5872 10220410 XOM
2015-01-05 77.9692 78.2232 75.7682 76.4369 18502381 XOM
2015-01-06 76.3946 77.3851 75.3618 76.0306 16670713 XOM
2015-01-07 76.7417 77.4444 76.1914 76.8010 13590721 XOM
2015-01-08 77.2497 78.1132 77.0380 78.0793 15487496 XOM

The following cells engineer some features from the basic features returned from the pandas reader. Some of the features we engineer are the average price, returns, price changes, and moving averages. Different time intervals are used for better comparison.

In [30]:
# Calculate average price based on open, close, high, low
cols = df.loc[: , "open":"close"]
df['avg_price'] = cols.mean(axis=1)
df['date'] = df.index
# Compute amount sold per day
df['amount_sold'] = df['avg_price'] * df['volume']
In [31]:
# We can also look at the change per day
df['price_change_per_day'] = df['close'] - df['open']

# Make sure we don't use open and close values from different companies at the boundaries
mask = df['Series Name'] != df['Series Name'].shift(1)
df['price_change_per_day'].loc[mask == True] = np.nan
In [32]:
# Compute price change per 30 days. 
mask30 = df['Series Name'] != df['Series Name'].shift(periods=30)
df['price_change_per_30_days'] = df['close'] - df['open'].shift(periods=30)
df['price_change_per_30_days'].loc[mask30 == True] = np.nan

# Compute percentage change per 30 days
df['perc_price_change_per_30_days'] = (df['close'] - df['open'].shift(periods=30)) / df['avg_price'].shift(1)
df['perc_price_change_per_30_days'].loc[mask30 == True] = np.nan

# Let's also look at a simple metric for the returns using the ratio of avg_price/initial price.
# This gives a better idea of the profitability of the stock.
transf = lambda x : x / x[0]
df['returns'] = df.groupby('Series Name')['avg_price'].transform(transf)

Load the data into VIP

In [36]:
vip.load_data(df, 'finance_data')

Let's first visualize the average stock prices for the various companies. Each color corresponds to a different company. Go into VIP to get more information on the plot.

In [12]:
vip.plot(plot_type='line', x='date', y='avg_price', color='Series Name', y_normalization='log10', x_scale=1.5)