Using VIP's Network Extractor on ETF Data

In this notebook, we demonstrate how to use VIP's Network Extractor to create a network dataset from a tabular spreadsheet of Exchange Traded Fund (ETF) data. The dataset mixes numerical and categorical features, which works well when extracting a network. We can then use the network to see which ETFs are similar to each other, which may help surface patterns that distinguish stronger investments from weaker ones.

In [1]:
from virtualitics import api
import pandas as pd
In [2]:
data = pd.read_csv("C:/Users/evan/Documents/Virtualitics/SampleData/Exchange Traded Funds.csv")
data.head(3)
Out[2]:
   Ticker  URL                                Volatility_1yr  Expense Ratio  AUM       Trading Volume  DivYield  Region         Issuer     Focus      Weighting Scheme  Return        Return_1yr  Return_3yr
0  FCG     http://finance.yahoo.com/q?s=FCG   57.95           1.778151       2.215347  6.233085        0.755112  North America  Other      Energy     Equal             Below median  -65.54      -37.70
1  EWZS    http://finance.yahoo.com/q?s=EWZS  33.39           1.792392       1.347135  4.694789        0.627366  Latin America  BlackRock  Small Cap  Market Cap        Below median  -42.87      -36.89
2  BRF     http://finance.yahoo.com/q?s=BRF   32.02           1.778151       1.828080  4.759547        0.575188  Latin America  Other      Small Cap  Market Cap        Below median  -38.79      -34.92
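
A quick way to see the mix of numerical and categorical features is to split the columns by dtype. The sketch below does this on a hand-typed subset of the preview rows; `sample` is a stand-in for `data`, and only a few of the columns are reproduced. Treat it as an illustration of the idea rather than a required step.

```python
import pandas as pd

# Hand-typed subset of the data.head(3) preview (stand-in for the full dataset).
sample = pd.DataFrame({
    "Ticker": ["FCG", "EWZS", "BRF"],
    "Volatility_1yr": [57.95, 33.39, 32.02],
    "Region": ["North America", "Latin America", "Latin America"],
    "Issuer": ["Other", "BlackRock", "Other"],
    "Focus": ["Energy", "Small Cap", "Small Cap"],
    "Return": ["Below median", "Below median", "Below median"],
    "Return_1yr": [-65.54, -42.87, -38.79],
})

# Numeric columns versus everything else (text / categorical columns).
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Distinct values per non-numeric column: an identifier such as Ticker is
# unique per row, while true categories repeat across rows.
distinct_counts = sample[categorical_cols].nunique().to_dict()

print(numeric_cols)
print(distinct_counts)
```

Columns whose values repeat across rows (Region, Issuer, Focus, Return) are the natural candidates for grouping ETFs, while a column that is unique per row (Ticker) identifies the ETF itself.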

Establish an API connection to VIP and load the dataset.

In [3]:
vip = api.VIP()
vip.load_data(data, 'ETF')
Setting up WebSocket connection to: ws://localhost:12345/api
Connection Successful! Initializing session.

In this case, we will use the Network Extractor to generate a network. However, you can also use load_network (https://api.virtualitics.com/api-documentation.html?highlight=load_network#virtualitics.api.VIP.load_network) to load edge lists, JSON files, or NetworkX objects. For an overview of the Network Extractor and the supported network formats, see the documentation: https://docs.virtualitics.com/network-graphs/summary-networks/
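
For readers unfamiliar with edge lists, the sketch below shows what such a list encodes by building one in plain Python and expanding it into an adjacency map. The tuple layout (source, target, weight) and the weights are made up for illustration; the input format that load_network actually expects is described in the API documentation linked above and is not reproduced here.

```python
from collections import defaultdict

# Illustrative edge list: (source, target, weight). Values are placeholders.
edge_list = [
    ("BRF", "EWZS", 3),
    ("BRF", "FCG", 2),
    ("EWZS", "FCG", 1),
]

# Expand the undirected edge list into an adjacency map:
# node -> {neighbor: weight}.
adjacency = defaultdict(dict)
for source, target, weight in edge_list:
    adjacency[source][target] = weight
    adjacency[target][source] = weight

nodes = sorted(adjacency)
print(nodes)
print(dict(adjacency["BRF"]))
```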

In [4]:
# The Network Extractor requires categorical columns to be specified as associative columns
# in order to create the edges in the network.
associative_cols = ["Focus", "Region", "Issuer", "Return"]

# We specify "Ticker" as the node column so that each node in the network represents a single ETF
vip.network_extractor("Ticker", associative_columns=associative_cols)
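
This notebook does not describe the exact rule the Network Extractor uses to turn associative columns into edges. As a simplified mental model only, the sketch below links two ETFs whenever they share a value in at least one associative column, and weights the edge by the number of columns they share. The helper `shared_value_edges` is hypothetical and not part of the VIP API; the rows are the three shown in the data.head(3) output.

```python
from itertools import combinations

# Associative-column values for the three ETFs in the data.head(3) preview.
rows = [
    {"Ticker": "FCG", "Focus": "Energy", "Region": "North America",
     "Issuer": "Other", "Return": "Below median"},
    {"Ticker": "EWZS", "Focus": "Small Cap", "Region": "Latin America",
     "Issuer": "BlackRock", "Return": "Below median"},
    {"Ticker": "BRF", "Focus": "Small Cap", "Region": "Latin America",
     "Issuer": "Other", "Return": "Below median"},
]
associative_cols = ["Focus", "Region", "Issuer", "Return"]


def shared_value_edges(rows, node_col, assoc_cols):
    """Hypothetical helper (not VIP's algorithm): weight each pair of nodes
    by the number of associative columns in which their values match."""
    edges = {}
    for a, b in combinations(rows, 2):
        shared = sum(1 for col in assoc_cols if a[col] == b[col])
        if shared:  # only connect pairs that share at least one value
            pair = tuple(sorted((a[node_col], b[node_col])))
            edges[pair] = shared
    return edges


edges = shared_value_edges(rows, "Ticker", associative_cols)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

Under this toy rule, EWZS and BRF end up most strongly linked because they agree on Focus, Region, and Return, which matches the intuition that similar ETFs should sit close together in the network.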

Pull the node metadata from VIP and then merge it with the original dataset, joining the network's "Node ID" column to the "Ticker" column.

In [5]:
node_data = vip.get_dataset()
node_data = node_data.merge(data, left_on="Node ID", right_on="Ticker")
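
Before pushing columns back, it can be worth checking that every node found exactly one matching row. The sketch below is a hedged variant of the merge above, run on stand-in frames: `node_data` here is a hypothetical substitute for the result of vip.get_dataset() that contains only a "Node ID" column, and `how="left"`, `validate`, and `indicator` are standard pandas merge options that the original cell does not use.

```python
import pandas as pd

# Stand-in for the original dataset (tickers from the data.head(3) preview).
data = pd.DataFrame({
    "Ticker": ["FCG", "EWZS", "BRF"],
    "Focus": ["Energy", "Small Cap", "Small Cap"],
})

# Hypothetical stand-in for vip.get_dataset(); rows are deliberately in a
# different order than in `data`.
node_data = pd.DataFrame({"Node ID": ["BRF", "FCG", "EWZS"]})

merged = node_data.merge(
    data,
    left_on="Node ID",
    right_on="Ticker",
    how="left",             # keep every node, in node order
    validate="one_to_one",  # raise if a ticker or node ID is duplicated
    indicator=True,         # adds a "_merge" column describing each match
)

# Nodes without a matching row in the original data.
unmatched = merged.loc[merged["_merge"] != "both", "Node ID"].tolist()
print(unmatched)
print(merged[["Node ID", "Focus"]])
```

A left merge keeps the rows in the order of the node table, so the merged metadata stays aligned with the nodes even when the original spreadsheet is sorted differently.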

Now we will add the columns of node metadata to the network data loaded into VIP.

In [6]:
for column_name in associative_cols:
    vip.add_column(node_data[column_name])