Virtualitics has implemented several fundamental network analysis tools into the analytics suite so that beginner and expert network analysts alike can augment their analysis. These tools include Compute Structure, Clustering Coefficient, PageRank, Graph Distance and Shortest Path.
When performing analysis of Network Graphs, it is often critical to compute different metrics to evaluate your network. These allow you to streamline the process of data exploration and picking out Network Insights.
For example, you may need to calculate how many mutations away one gene is from another, in which case you could calculate the Shortest Path to see how that mutation could occur. You may also be interested in identifying hubs that are critical to your supply chain, which could be identified by calculating Graph Distance metrics.
Network Analysis Tools
Right-click anywhere on the plot and hover over Network Analysis Tools to see the following options:
Compute Structure | Recalculates the 3D spatialization after modifying which nodes are included in the network. |
Clustering Coefficient | Calculates the connectedness, or density, of the network and automatically maps it to Color. |
PageRank | Measures the influence of the nodes. When computed, this metric is automatically mapped to Color. |
Graph Distance | Computes several measures of centrality as well as eccentricity to show properties of the network. These computations drive some of the Network Insights. |
Shortest Path (access by right-clicking on a point) | Measures the path from one node to another by considering the path length as well as the edge weights. |
Click on each of the Network Analysis tools below to learn more.
Compute Structure
Compute Structure recalculates the 3D spatialization after modifying which nodes are included in the network.
Running Compute Structure on a Network Graph
- Right-click anywhere in the plot (not on a node) and hover over Network Analysis Tools, then select Compute Structure.
Clustering Coefficient
Clustering Coefficient is a network analysis algorithm which describes how closely nodes in a network tend to group or “cluster” together.
Clustering Coefficients provide details relating to the interconnectedness of subcommunities in a network. This metric has proven to be effective for understanding function-structure associations in the brain.
Consider an example involving a Network Graph constructed from an eCommerce sample dataset. Using the Clustering Coefficient, we can see which groups of users have the most unique characteristics in the network.
Running Clustering Coefficient on a Network Graph
- Right-click anywhere in the plot (not on a node) and hover over Network Analysis Tools, then select Clustering Coefficient.
The Clustering Coefficient will be calculated and automatically applied to the Color dimension.
Additional Details
In network analysis, it can be useful to know the edge density of portions of the network, especially when a clique is formed (a portion of a network where all the nodes are connected to each other).
Clustering Coefficient is a metric that measures how close a node is to forming a clique with its neighbors. The values for the Clustering Coefficient range from 0 to 1. A node has a Clustering Coefficient of 1 when it forms a clique with its neighbors, while a node has a Clustering Coefficient of 0 when there are no edges among the node’s neighbors.
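The definition above can be illustrated with a short Python sketch. This is not the implementation used by Virtualitics Explore Desktop; the `local_clustering` function and the toy network `adj` are hypothetical, and treating a node with fewer than two neighbors as having a coefficient of 0 is an assumed convention.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of `node`'s neighbors that are directly connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        # Assumed convention: no neighbor pairs exist, so report 0.
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

# Hypothetical toy network: A, B and C form a triangle; D hangs off C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

coefficients = {node: local_clustering(adj, node) for node in adj}
for node, value in sorted(coefficients.items()):
    print(node, round(value, 3))
```

In this toy network, A and B each form a clique with their neighbors, while only one of the three pairs of C's neighbors is connected.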
PageRank
PageRank is a network analysis algorithm used to provide a ranking of nodes in terms of importance or influence across a network graph.
This algorithm empowers users to understand which nodes in their network are the most popular. Consider this example involving an Airport Traffic dataset. PageRank ranks all nodes in the network based on their popularity with one another, enabling the user to immediately find the biggest hubs across North America.
The output of the PageRank algorithm is a numeric value between 0 and 1 for each node. The algorithm will add a column to the existing dataset and that column can be mapped to any dimension you desire.
Using PageRank on a Network Graph
- Right-click anywhere in the plot (not on a node) and hover over Network Analysis Tools, then select PageRank.
The PageRank will be calculated and automatically applied to the Color dimension.
Additional Details
PageRank is an algorithm that was originally developed by the founders of Google as a way of ranking web pages in terms of importance and influence across the internet.
The PageRank algorithm works iteratively. Initially, all nodes in the network are assigned an equal amount of PageRank. In each iteration, each node will equally distribute its current PageRank with its neighbors (if edges are weighted, more PageRank will be shared with neighbors that share a larger edge weight). After several iterations, this provides a metric where nodes with high PageRank are generally connected to other nodes with high PageRank.
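The iteration described above can be sketched in plain Python. This is a minimal illustration, not the implementation used by Virtualitics Explore Desktop: the `pagerank` function, the damping factor of 0.85 (a common textbook default that is not mentioned above), the fixed iteration count, and the toy `weights` network are all assumptions made for the example.

```python
def pagerank(weights, damping=0.85, iterations=100):
    """Iterative PageRank on a weighted graph given as {node: {neighbor: weight}}."""
    nodes = list(weights)
    n = len(nodes)
    # Every node starts with an equal share of PageRank.
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            total = sum(weights[node].values())
            if total == 0:
                # A node without edges spreads its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
                continue
            # Neighbors on heavier edges receive a larger share.
            for neighbor, weight in weights[node].items():
                new_rank[neighbor] += damping * rank[node] * weight / total
        rank = new_rank
    return rank

# Hypothetical toy network: a hub linked to three leaves, one by a heavier edge.
weights = {
    "hub": {"a": 3, "b": 1, "c": 1},
    "a": {"hub": 3},
    "b": {"hub": 1},
    "c": {"hub": 1},
}

ranks = pagerank(weights)
top_node = max(ranks, key=ranks.get)
print(top_node, {node: round(value, 3) for node, value in ranks.items()})
```

The scores sum to 1, so each node's value falls between 0 and 1, and the leaf attached by the heavier edge ends up with more PageRank than the other two leaves.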
Network analytics metrics tend to follow a power-law distribution, which is why Virtualitics Explore Desktop shows color with a gradient and automatically normalizes the Color dimension to get a good spread of colors.
Graph Distance
Graph Distance in Virtualitics Explore Desktop represents 3 different algorithms which calculate the centrality of networks. These centrality metrics define the importance of nodes based on their position in the network.
Centrality metrics solve some of the most important problems in network analysis. Some examples include:
- Identifying the most influential person(s) in a social network
- Key infrastructure nodes in the Internet
- Superspreaders of a virus
Graph Distance will generate three network centrality metrics that are based on Shortest Paths of all the nodes in the network:
- Betweenness Centrality: a metric that defines how often a node is found on a path between two nodes
- Closeness Centrality: a metric that defines how close a node is to other nodes
- Eccentricity: a measure that defines how far or “eccentric” a node is from other nodes
Using Graph Distance
- Right-click anywhere in the plot (not on a node) and hover over Network Analysis Tools, then select Graph Distance.
The Betweenness Centrality, Closeness Centrality and Eccentricity will all be calculated.
The Betweenness Centrality will be automatically applied to the Color dimension.
The Closeness Centrality and Eccentricity will be added to the Features Panel under Graphs.
Additional Details
Closeness Centrality is a metric computed for each node that is based on the average path length from the given node to all other nodes: the shorter that average, the higher the score. The value will range between 0 and 1. A value of 1 implies that this node was the closest (on average) to all other nodes along paths in the network. Nodes with high Closeness Centrality will typically be found near the center of the network visualization.
Betweenness Centrality is a metric computed for each node that represents the number of computed shortest paths that go through a given node. The value will range between 0 and 1. If a node has a betweenness centrality of 1, it means that the node lies along more shortest paths through the network than any other node. Nodes with high betweenness centrality frequently fall between communities and form bridges between major components of the network.
Eccentricity is a metric computed for each node that represents the largest of the shortest-path distances from that node to any other node. The value is always positive: counting each edge as one step, it is at least 1 for any node that is connected to another node. The nodes with large eccentricity values will be positioned along the periphery of the network visualization.
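The three metrics can be sketched in plain Python for an unweighted, undirected network. This is an illustration only: the function names and the toy path network are hypothetical, betweenness is computed with Brandes' algorithm (the method used by the product is not stated here), and the values are left unscaled, so they will not necessarily match the 0 to 1 ranges described above.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distance (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def closeness(adj, node):
    """Inverse of the average distance from `node` to the nodes it can reach."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def eccentricity(adj, node):
    """Largest shortest-path distance from `node` to any reachable node."""
    return max(bfs_distances(adj, node).values())

def betweenness(adj):
    """Unscaled betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {node: 0.0 for node in adj}
    for source in adj:
        stack = []
        preds = {node: [] for node in adj}
        sigma = {node: 0 for node in adj}  # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}

# Hypothetical toy network: a simple path A - B - C - D - E.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}

between = betweenness(adj)
close = {node: closeness(adj, node) for node in adj}
ecc = {node: eccentricity(adj, node) for node in adj}
print(between, close, ecc)
```

On this path, the middle node C has the highest betweenness and closeness, while the end nodes A and E have the largest eccentricity, which matches the intuition that central nodes sit in the middle and eccentric nodes sit on the periphery.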
Shortest Path
Shortest Path is an algorithm that allows users to calculate the distance from one point to another in a network graph.
Calculating the shortest path between two nodes in a network has many applications. Some examples include analyzing the best origins and destinations for shipping route optimization, or determining the most efficient path for routing communications through a computer network. The shortest path calculation is also used in computing the Graph Distance metrics.
Calculating a Shortest Path on a Network Graph
- Right-click on the first node you would like to use and select Shortest Path. This node will be your origin.
- Click on the second node you would like to use, which will be your destination.
The Shortest Path Panel will update with the following information:
- Path Length: the number of edges in the path
- Weight: the sum of edge weights along the identified path
- Node Similarity: this uses the Jaccard similarity metric as a representation of node similarity for the path endpoints
Click on another node to compute a different path.
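The three values shown in the Shortest Path Panel can be illustrated with a small Python sketch. It makes several assumptions that are not confirmed above: the path is found with Dijkstra's algorithm by minimizing total edge weight, and Node Similarity is taken to be the Jaccard similarity of the two endpoints' neighbor sets. The function names and the toy network are hypothetical.

```python
import heapq

def shortest_path(weights, origin, destination):
    """Dijkstra's algorithm on {node: {neighbor: weight}}; returns (path, total weight)."""
    best = {origin: 0.0}
    previous = {}
    heap = [(0.0, origin)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == destination:
            break
        if cost > best[node]:
            continue  # stale heap entry
        for neighbor, weight in weights[node].items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if destination not in best:
        return None  # no path exists
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[destination]

def jaccard(weights, a, b):
    """Shared neighbors divided by all neighbors of either node (assumed definition)."""
    neighbors_a, neighbors_b = set(weights[a]), set(weights[b])
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 0.0

# Hypothetical toy network with one heavy shortcut from A to C.
weights = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"C": 1},
}

path, weight = shortest_path(weights, "A", "D")
path_length = len(path) - 1          # Path Length: number of edges
similarity = jaccard(weights, "A", "D")
print(path, path_length, weight, similarity)
```

Here the route through B uses more edges than the direct A to C shortcut but has a lower total weight, which shows why Path Length and Weight are reported separately.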
Network Insights
Network Insights is a tool that enables users to better interpret the significance of communities generated from their network.
Network Insights enables users to distinguish trends in holistic network behavior. Consider this example involving the Airport Traffic sample dataset. Examining the Community Leaders, we can quickly identify some of the main flight hubs across different regions of the US.
Run Insights on a Network Graph
- Click the Insights icon in the top toolbar, or select Insights from the Data Analytics menu.
The Insight cards should automatically populate with information identifying specific communities of interest. If not, click Refresh at the bottom of the panel.
Some cards may not be expanded the first time you open the Insights panel in a new network. This is because these Insights rely on additional Network Graph metrics that are not automatically computed. You can generate these Insights by simply clicking on the respective cards.
Selecting a card will highlight the community corresponding to that Insight.
Additional Details
Community Leaders
Community leaders represent nodes with the highest level of connectivity (most coverage) within their community. Since the community leaders are so well connected within their community, analyzing those nodes may help you interpret what the overall community represents relative to the rest of the network.
Largest Community
We highlight the largest community and show the number of nodes in this community. Frequently in network analysis, there is 1 major community with several satellite communities that sit just outside the main community. This insight is useful for quickly contextualizing and orienting yourself around the network.
Most Influential Community
In order to determine the most influential community, a community graph is created where each node represents a community from the original network graph. The edges in the community graph are representative of the collective edge weights between each pair of communities in the original network graph. The most influential community is determined to be the community with the highest PageRank value in the community graph.
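The community-graph construction described above can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: edges inside a single community are dropped, PageRank uses a damping factor of 0.85 and a fixed iteration count, and all names (`community_graph`, `pagerank`, the toy `edges` and `community_of` mapping) are hypothetical.

```python
def community_graph(edges, community_of):
    """Collapse nodes into communities, summing edge weights between each pair."""
    graph = {community: {} for community in set(community_of.values())}
    for u, v, weight in edges:
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            continue  # assumption: edges within a community are ignored
        graph[cu][cv] = graph[cu].get(cv, 0.0) + weight
        graph[cv][cu] = graph[cv].get(cu, 0.0) + weight
    return graph

def pagerank(weights, damping=0.85, iterations=100):
    """Iterative weighted PageRank on {node: {neighbor: weight}}."""
    nodes = list(weights)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            total = sum(weights[node].values())
            if total == 0:
                # An unconnected community spreads its rank evenly.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
                continue
            for neighbor, weight in weights[node].items():
                new_rank[neighbor] += damping * rank[node] * weight / total
        rank = new_rank
    return rank

# Hypothetical toy network: (node, node, edge weight) plus a community label per node.
edges = [("a1", "a2", 5), ("a1", "b1", 1), ("a2", "b1", 1), ("a2", "c1", 3)]
community_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}

cg = community_graph(edges, community_of)
community_rank = pagerank(cg)
most_influential = max(community_rank, key=community_rank.get)
print(cg, most_influential)
```

The two A to B edges are combined into a single community-level edge of weight 2, and community A, which links to both other communities, receives the highest PageRank.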
Most Tightly-Knit Community
Virtualitics Explore Desktop determines the most tightly-knit community by computing the Clustering Coefficient for the network and identifying the community with the highest median clustering coefficient.
The nodes in the most tightly-knit community are highly connected to each other and will have a high clustering coefficient. However, this metric does not take into account any connections to other communities.
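A minimal Python sketch of this selection is shown below. The `local_clustering` helper, the toy network, and the community labels are hypothetical; the coefficient is computed over the whole network and then summarized per community with the median, as described above.

```python
from itertools import combinations
from statistics import median

def local_clustering(adj, node):
    """Fraction of pairs of `node`'s neighbors that are directly connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

# Hypothetical toy network: community X is a triangle, community Y is a star,
# and a single bridge connects x3 to y1.
adj = {
    "x1": {"x2", "x3"},
    "x2": {"x1", "x3"},
    "x3": {"x1", "x2", "y1"},
    "y1": {"x3", "y2", "y3"},
    "y2": {"y1"},
    "y3": {"y1"},
}
community_of = {"x1": "X", "x2": "X", "x3": "X", "y1": "Y", "y2": "Y", "y3": "Y"}

# Group each node's coefficient under its community, then take the median.
grouped = {}
for node in adj:
    grouped.setdefault(community_of[node], []).append(local_clustering(adj, node))
median_clustering = {community: median(values) for community, values in grouped.items()}
most_tightly_knit = max(median_clustering, key=median_clustering.get)
print(median_clustering, most_tightly_knit)
```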
Most Central Community
The most central community is determined by computing the Graph Distance metrics. For each node we determine the average of the betweenness and closeness centrality metrics; we can refer to this new metric as the overall centrality measure. Then we determine which community has the highest median value for our overall centrality measure.
The most central community in the network plays an important role in that its nodes are close (in terms of graph distance) to other nodes and in that the most efficient paths through the network tend to pass through these nodes.
Most Isolated Community
The most isolated community is determined by computing the Graph Distance metrics. Specifically, the community with the lowest median Closeness Centrality is the most isolated community.
The most isolated community is weakly connected to the other communities in the network. This implies that the nodes in the community are niche, or in some way unlike the other nodes and communities of the network.
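The selection rules for the Most Central and Most Isolated communities can be sketched in Python once the Graph Distance metrics are available. The per-node values below are made-up numbers used only for illustration, and the names `median_by_community`, `betweenness`, `closeness`, and `community_of` are hypothetical.

```python
from statistics import median

def median_by_community(values, community_of):
    """Median of a per-node metric within each community."""
    grouped = {}
    for node, value in values.items():
        grouped.setdefault(community_of[node], []).append(value)
    return {community: median(vals) for community, vals in grouped.items()}

# Made-up metric values, assumed to be already scaled to the 0 to 1 range.
betweenness = {"a": 0.9, "b": 0.6, "c": 0.1, "d": 0.0, "e": 0.2, "f": 0.1}
closeness = {"a": 1.0, "b": 0.8, "c": 0.5, "d": 0.4, "e": 0.6, "f": 0.5}
community_of = {"a": "core", "b": "core", "c": "edge", "d": "edge", "e": "mid", "f": "mid"}

# Overall centrality: the average of betweenness and closeness for each node.
overall = {node: (betweenness[node] + closeness[node]) / 2 for node in betweenness}

central_scores = median_by_community(overall, community_of)
most_central = max(central_scores, key=central_scores.get)

closeness_scores = median_by_community(closeness, community_of)
most_isolated = min(closeness_scores, key=closeness_scores.get)

print(most_central, most_isolated)
```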