In this notebook, we demonstrate how to use VIP's Network Extractor to create a network dataset from a tabular spreadsheet of Exchange Traded Fund (ETF) data. The dataset contains a mixture of numerical and categorical features, which works well for network extraction. We can then use the network to see which ETFs are similar to each other, which may help reveal patterns that distinguish stronger investments from weaker ones.
from virtualitics import api
import pandas as pd
data = pd.read_csv("../../../Virtualitics/SampleData/Exchange Traded Funds.csv")
data.head(3)
 | Ticker | URL | Volatility_1yr | Expense Ratio | AUM | Trading Volume | DivYield | Region | Issuer | Focus | Weighting Scheme | Return | Return_1yr | Return_3yr |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | FCG | http://finance.yahoo.com/q?s=FCG | 57.95 | 1.778151 | 2.215347 | 6.233085 | 0.755112 | North America | Other | Energy | Equal | Below median | -65.54 | -37.70 |
1 | EWZS | http://finance.yahoo.com/q?s=EWZS | 33.39 | 1.792392 | 1.347135 | 4.694789 | 0.627366 | Latin America | BlackRock | Small Cap | Market Cap | Below median | -42.87 | -36.89 |
2 | BRF | http://finance.yahoo.com/q?s=BRF | 32.02 | 1.778151 | 1.828080 | 4.759547 | 0.575188 | Latin America | Other | Small Cap | Market Cap | Below median | -38.79 | -34.92 |
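Because the Network Extractor builds edges from categorical columns, it can help to check which columns are categorical before choosing them. The following is a minimal sketch in plain Python (no VIP connection needed), using a few fields copied from the three preview rows above. The helper names `is_number` and `split_columns` are hypothetical and not part of the VIP API; with pandas, inspecting `data.dtypes` would serve the same purpose.

```python
# A few columns copied from the data.head(3) preview above, kept as strings.
preview = [
    {"Ticker": "FCG", "Volatility_1yr": "57.95", "Region": "North America",
     "Focus": "Energy", "Return": "Below median", "Return_1yr": "-65.54"},
    {"Ticker": "EWZS", "Volatility_1yr": "33.39", "Region": "Latin America",
     "Focus": "Small Cap", "Return": "Below median", "Return_1yr": "-42.87"},
    {"Ticker": "BRF", "Volatility_1yr": "32.02", "Region": "Latin America",
     "Focus": "Small Cap", "Return": "Below median", "Return_1yr": "-38.79"},
]


def is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def split_columns(rows):
    """Split column names into (numeric, categorical) based on their values."""
    numeric, categorical = [], []
    for col in rows[0]:
        if all(is_number(row[col]) for row in rows):
            numeric.append(col)
        else:
            categorical.append(col)
    return numeric, categorical


numeric_cols, categorical_cols = split_columns(preview)
print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

Note that `Ticker` is classified as categorical here, but it is an identifier, so it is better suited as the node column than as an associative column.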
Establishing API connection to VIP.
vip = api.VIP()
vip.load_data(data, 'ETF')
Setting up WebSocket connection to: ws://localhost:12345/api
Connection Successful! Initializing session.
Data set loaded with name: 'ETF (3)'
'ETF (3)'
In this case, we will use VIP's Network Extractor to generate a network. However, you can also use `VIP.load_network` (https://api.virtualitics.com/api-documentation.html?highlight=load_network#virtualitics.api.VIP.load_network) to load edge lists, JSON-formatted networks, or NetworkX objects. For an overview of the Network Extractor and the supported network formats, see the documentation: https://docs.virtualitics.com/network-graphs/network-graphs-overview
# The Network Extractor requires categorical columns to be specified as associative columns
# in order to create the edges in the network.
associative_cols = ["Focus", "Region", "Issuer", "Return"]
# We specify "Ticker" as the node column so that each node in the network represents a single ETF
vip.network_extractor("Ticker", associative_columns=associative_cols)
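To build intuition for how associative columns can produce edges, here is a minimal sketch in plain Python. It assumes a simple rule: two ETFs are linked when they share a value in at least one associative column, and the edge weight counts the columns on which they agree. This rule is an illustrative assumption, not VIP's actual extraction algorithm, and `shared_value_edges` is a hypothetical helper. The three rows are copied from the `data.head(3)` preview.

```python
from itertools import combinations

# Associative-column values for the three ETFs shown in the preview above.
etfs = {
    "FCG": {"Focus": "Energy", "Region": "North America",
            "Issuer": "Other", "Return": "Below median"},
    "EWZS": {"Focus": "Small Cap", "Region": "Latin America",
             "Issuer": "BlackRock", "Return": "Below median"},
    "BRF": {"Focus": "Small Cap", "Region": "Latin America",
            "Issuer": "Other", "Return": "Below median"},
}
associative_cols = ["Focus", "Region", "Issuer", "Return"]


def shared_value_edges(nodes, columns):
    """Return {(a, b): weight}; weight counts the columns on which a and b agree."""
    edges = {}
    for a, b in combinations(sorted(nodes), 2):
        weight = sum(nodes[a][col] == nodes[b][col] for col in columns)
        if weight > 0:  # keep only pairs that share at least one value
            edges[(a, b)] = weight
    return edges


edges = shared_value_edges(etfs, associative_cols)
for (a, b), weight in edges.items():
    print(a, b, weight)
```

Under this rule, EWZS and BRF end up most strongly connected because they agree on Focus, Region, and Return, which matches the intuition that both are Latin American small-cap funds.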
# Now we can investigate this network by using Explainable AI to understand what characterizes each of the communities
vip.explainable_ai("RelativeEdgeDensity", "Louvain Community", associative_cols)
Category | Description |
---|---|
Community 2 | The strongest connections in Community 2 include: 1. <i>Focus</i> = <b>Small Cap</b> for 49% of nodes. 2. <i>Region</i> = <b>Asia-Pacific</b> for 42.2% of nodes. |
Community 1 | The strongest connections in Community 1 include: 1. <i>Focus</i> = <b>Other</b> for 71.9% of nodes. 2. <i>Region</i> = <b>Emerging Markets</b> for 29.2% of nodes. 3. <i>Region</i> = <b>Other</b> for 42.7% of nodes. 4. <i>Return</i> = <b>Below median</b> for 76% of nodes. |
Community 6 | The strongest connections in Community 6 include: 1. <i>Focus</i> = <b>Large Cap</b> for 100% of nodes. 2. <i>Return</i> = <b>Above median</b> for 81% of nodes. |
Community 9 | The strongest connections in Community 9 include: 1. <i>Region</i> = <b>Europe</b> for 100% of nodes. 2. <i>Focus</i> = <b>Total Market</b> for 74.4% of nodes. 3. <i>Issuer</i> = <b>BlackRock</b> for 53.5% of nodes. |
Community 12 | The strongest connections in Community 12 include: 1. <i>Focus</i> = <b>Mid Cap</b> for 84.6% of nodes. |
Community 10 | The strongest connections in Community 10 include: 1. <i>Focus</i> = <b>Consumer Cyclicals</b> for 44.4% of nodes. 2. <i>Focus</i> = <b>Consumer Non-cyclicals</b> for 27.8% of nodes. 3. <i>Focus</i> = <b>Utilities</b> for 27.8% of nodes. |
Community 0 | The strongest connections in Community 0 include: 1. <i>Focus</i> = <b>Energy</b> for 96.7% of nodes. |
Community 4 | The strongest connections in Community 4 include: 1. <i>Focus</i> = <b>Technology</b> for 100% of nodes. |
Community 7 | The strongest connections in Community 7 include: 1. <i>Focus</i> = <b>Financials</b> for 100% of nodes. |
Community 8 | The strongest connections in Community 8 include: 1. <i>Focus</i> = <b>Health Care</b> for 100% of nodes. |
Community 3 | The features found in Community 3 were too infrequent to describe Community 3 using Relative Edge Density. |
Community 11 | The strongest connections in Community 11 include: 1. <i>Focus</i> = <b>Real Estate</b> for 100% of nodes. |
Community 5 | The strongest connections in Community 5 include: 1. <i>Focus</i> = <b>Industrials</b> for 100% of nodes. |
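The descriptions above report how often a feature value occurs among a community's nodes (for example, "for 100% of nodes"). A rough way to reproduce that kind of percentage is a per-community frequency count, sketched below in plain Python. The tickers and community labels are hypothetical toy values, `value_share_by_community` is a made-up helper, and this simple share is only an assumption about how to read the percentages; it is not VIP's Relative Edge Density computation.

```python
from collections import Counter, defaultdict

# Hypothetical (ticker, community, focus) records; not taken from the real dataset.
records = [
    ("ETF_A", 0, "Energy"),
    ("ETF_B", 0, "Energy"),
    ("ETF_C", 0, "Utilities"),
    ("ETF_D", 1, "Technology"),
    ("ETF_E", 1, "Technology"),
]


def value_share_by_community(records):
    """Return {community: {value: percent of that community's nodes}}."""
    counts = defaultdict(Counter)
    for _ticker, community, value in records:
        counts[community][value] += 1
    shares = {}
    for community, counter in counts.items():
        total = sum(counter.values())
        shares[community] = {
            value: round(100 * n / total, 1) for value, n in counter.items()
        }
    return shares


shares = value_share_by_community(records)
for community, by_value in sorted(shares.items()):
    print(community, by_value)
```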
# We can also open Network Insights to learn more about the significance of some communities
# We will first compute additional graph metrics to generate a full list of insights
vip.clustering_coefficient(apply=False)
vip.graph_distance(apply=False)
vip.insights()
Topic | Insight |
---|---|
Community Leaders | There are 13 community leaders. These nodes are leaders because they have maximum connection coverage within their communities. Leaders include: PSCE, HAO, PXR, SPHQ, IEUS, IVOV, REMX, PSJ, PKB, PBS, PBE, PSCF, and TAO |
Largest Community | The largest community in the visible network is Community 2, with 102 member nodes. |
Most Influential | The most influential community in the visible network is Community 1. |
Most Tightly-Knit | The most tightly-knit community in the visible network is Community 0. |
Most Central | The most central community in the visible network is Community 1. |
Most Isolated | The most isolated community in the visible network is Community 3. |
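The two metrics computed before `vip.insights()` have standard textbook definitions that can be sketched without VIP. The example below computes the local clustering coefficient (the fraction of a node's neighbor pairs that are themselves connected) and unweighted hop distances via breadth-first search on a hypothetical four-node graph. VIP's `clustering_coefficient` and `graph_distance` may use different variants or weighting, so treat this as an illustration of the concepts only.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected toy graph: a triangle A-B-C with D attached to C.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}


def local_clustering(graph, node):
    """Fraction of pairs of the node's neighbors that are linked to each other."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbors; use 0 by convention
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


def hop_distances(graph, source):
    """Breadth-first search: number of edges on a shortest path from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


for node in sorted(graph):
    print(node, round(local_clustering(graph, node), 3))
print(hop_distances(graph, "A"))
```

A tightly-knit community corresponds to nodes with high clustering coefficients, while an isolated community tends to sit at large hop distances from the rest of the network.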