

Suppose you had a sales log for your pet store business, stored in a familiar tabular format, that recorded the username, university, pet type, and spending amount (in USD) for each transaction. This data could be represented as a 3-partite network graph in which usernames are blue nodes, universities are orange nodes, and pet types are green nodes.

Figure 1: This figure shows how an N-partite graph can be found inside tabular data that does not look like network data. In this case, the ‘University’ column of data is represented by the orange nodes, the ‘Username’ column is represented by the blue nodes, and the ‘Pet Type’ column is represented by the green nodes. The ‘Spend’ column is represented in this projection.
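To make the idea concrete, here is a minimal sketch of how such a 3-partite graph could be read out of tabular rows. The sample rows, the `build_npartite_graph` helper, and the choice to link every pair of values that co-occur in a row are assumptions made for illustration; this is not how VIP builds its graph internally.

```python
from collections import defaultdict

# Hypothetical sales log rows; the values are invented for illustration.
sales_log = [
    {"Username": "alice", "University": "UCLA", "Pet Type": "Dog", "Spend": 25.0},
    {"Username": "bob", "University": "UCLA", "Pet Type": "Cat", "Spend": 40.0},
    {"Username": "carol", "University": "MIT", "Pet Type": "Dog", "Spend": 15.5},
    {"Username": "alice", "University": "UCLA", "Pet Type": "Cat", "Spend": 12.0},
]

# Each column listed here becomes one partition (one node color in Figure 1).
PARTITIONS = ("Username", "University", "Pet Type")


def build_npartite_graph(rows, partitions):
    """Return an adjacency map linking values that co-occur in a row.

    Each node is a (column, value) pair, so an edge can only connect
    nodes that belong to different partitions.
    """
    adjacency = defaultdict(set)
    for row in rows:
        nodes = [(column, row[column]) for column in partitions]
        for i, left in enumerate(nodes):
            for right in nodes[i + 1:]:
                adjacency[left].add(right)
                adjacency[right].add(left)
    return dict(adjacency)


graph = build_npartite_graph(sales_log, PARTITIONS)
```

Keying nodes by `(column, value)` keeps identical strings in different columns apart, and it guarantees that no edge joins two nodes of the same partition.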

There is business utility (for marketing, promotions, distribution, and so on) in understanding customer segments based on transactional trends. One way to find these segments is to generate a network graph of the customer nodes. As we have seen in the previous sections, N-partite projection can be used to generate the desired network. Figure 2 shows how a network graph can be extracted from this very standard-looking sales dataset.

Figure 2: This figure demonstrates how the N-partite projection can start with a standard tabular dataset and ultimately can be represented as a network graph.

VIP implements the concepts behind N-partite projection; the following section maps the abstract idea of N-partite projection to the Network Extractor tool in VIP.

Network Extractor in VIP

After loading a standard tabular dataset from a file, from a database connection, or through the Virtualitics API, users can now leverage the Network Extractor to compute an N-partite projection of the data and, ultimately, visualize a network graph representation of the dataset in VIP.

Figure 3: VIP user-interface after loading a standard tabular dataset.

The Network Extractor in VIP requires users to specify which column of data to treat as the source of nodes; we refer to this column as the “node column.” Users also select a set of categorical columns from the dataset to use as “associative columns”; at least one associative column must be provided. Each associative column supplies another set of values with which the node set may form a bipartite graph: when a node and an associative value occur in the same row of data, an edge should exist between the two.
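The roles of the node column and the associative columns can be sketched in a few lines of plain Python. This is an illustrative approximation only: the `project_network` function, the sample rows, and the weighting rule (one point per shared associative value) are assumptions, not VIP's actual implementation or API.

```python
from collections import defaultdict
from itertools import combinations


def project_network(rows, node_column, associative_columns):
    """Project tabular rows onto a weighted network of node-column values.

    Two nodes are linked when they share a value in any associative
    column; the edge weight counts how many such values they share.
    """
    if not associative_columns:
        raise ValueError("At least one associative column must be provided.")

    # Bipartite step: group node values by each associative value
    # they appear with in the same row.
    members = defaultdict(set)
    for row in rows:
        for column in associative_columns:
            members[(column, row[column])].add(row[node_column])

    # Projection step: connect every pair of nodes in the same group.
    weights = defaultdict(int)
    for nodes in members.values():
        for a, b in combinations(sorted(nodes), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical rows modeled on the pet store example above.
rows = [
    {"Username": "alice", "University": "UCLA", "Pet Type": "Dog"},
    {"Username": "bob", "University": "UCLA", "Pet Type": "Cat"},
    {"Username": "carol", "University": "MIT", "Pet Type": "Dog"},
    {"Username": "alice", "University": "UCLA", "Pet Type": "Cat"},
]

edges = project_network(rows, "Username", ["University", "Pet Type"])
print(edges)  # {('alice', 'bob'): 2, ('alice', 'carol'): 1}
```

Here alice and bob share both a university and a pet type, so their edge is weighted more heavily than the edge between alice and carol, who share only a pet type. Other weighting schemes are possible; this one was chosen only because it is easy to follow.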

The Network Extractor can also be accessed using the Python API, as detailed in the notebook at the bottom of our Example Notebooks page.

Figure 4: VIP user interface after running the Network Extractor with “Ticker” as the ‘node column’ and “Return,” “Region,” “Focus,” “Issuer,” and “Weighting Scheme” as ‘associative columns.’ The user interface also shows the automatically generated interactive insights report.