Virtualitics supports three formats for loading network data: edge list, adjacency matrix, and JSON. Edge list and adjacency matrix are the more basic formats and can include only edges and their weights, while the JSON format also lets users supply metadata for each node. Each format is described below. The format must be specified in the Dataset Preview window when loading the dataset so that Virtualitics recognizes it properly.
Edge List (.csv)
Although a network is composed of nodes and edges, it can be encoded by listing only the edges. This works because each edge is defined by the names of the two nodes it connects, so the nodes in the network are simply the unique set of names referenced in the edge list. A column of edge weights can also be included. When loading a dataset formatted as an edge list, column order matters: the first two columns must specify the pair of nodes that share an edge. The third column can be used as the weight of that edge, but an edge weight column is optional.
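As an illustration of the column order described above, here is a minimal Python sketch that writes a weighted edge list and recovers the node set from it. The header names ("Source", "Target", "Weight"), the presence of a header row, and the sample edges are assumptions made for the example, not names required by Virtualitics.

```python
import csv
import io

# Hypothetical edges: (node, node, weight). Names and weights are made up.
edges = [
    ("Alice", "Bob", 0.9),
    ("Alice", "Carol", 0.4),
    ("Bob", "Dave", 0.7),
]


def edge_list_csv(edges, header=("Source", "Target", "Weight")):
    """Return CSV text with the two node columns first and the weight third."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)  # header names are an assumption, not a requirement
    writer.writerows(edges)
    return buffer.getvalue()


def nodes_from_edges(edges):
    """The node set is just the unique names referenced by the edges."""
    return sorted({name for source, target, *_ in edges for name in (source, target)})


csv_text = edge_list_csv(edges)
print(csv_text)
print(nodes_from_edges(edges))
```

To produce an unweighted edge list, pass two-element tuples and a two-name header; the first two columns keep the same meaning.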
Adjacency Matrices (.csv)
Another way of encoding a network is as a square matrix. The first row of the file lists the node IDs, so each column name is the name of a node. The rows have no label column of their own; instead, each row is implicitly labeled with the same node IDs in the same order as the columns, so the i-th row describes the i-th node in the header. Each value in the matrix is the strength of the edge between the node named by the column and the node implied by the row. A value of 0 means there is no edge between those two nodes. Adjacency matrices can be weighted or unweighted.
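The layout described above can be sketched in Python as follows. This is only an illustration: the function name, the node names, and the choice to key weights by an unordered pair are assumptions made for the example.

```python
import csv
import io


def adjacency_matrix_csv(nodes, weights):
    """Build CSV text: one header row of node names, then one row per node.

    `weights` maps frozenset({a, b}) to an edge weight; missing pairs become 0.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(nodes)  # the only labels in the file
    for row_node in nodes:  # row i implicitly belongs to the i-th header name
        writer.writerow(
            [weights.get(frozenset((row_node, col_node)), 0) for col_node in nodes]
        )
    return buffer.getvalue()


# Hypothetical three-node network with two weighted edges.
nodes = ["A", "B", "C"]
weights = {frozenset(("A", "B")): 2, frozenset(("B", "C")): 5}

matrix_text = adjacency_matrix_csv(nodes, weights)
print(matrix_text)
```

Because each weight is looked up by an unordered pair, the resulting matrix is symmetric, which matches an undirected network.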
Encoding network data this way is very memory inefficient, because the matrix stores a value for every pair of nodes, including pairs with no edge. Virtualitics recommends using one of the other formats (edge list or JSON).
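If existing data is already stored as an adjacency matrix, one way to move to an edge list is sketched below in Python. It assumes the matrix is symmetric (undirected) and reads only the upper triangle so that each edge is emitted once; the function name and sample data are hypothetical.

```python
import csv
import io


def matrix_to_edge_list(matrix_text):
    """Convert adjacency-matrix CSV text into (node, node, weight) tuples."""
    rows = list(csv.reader(io.StringIO(matrix_text)))
    names, body = rows[0], rows[1:]
    if len(body) != len(names) or any(len(row) != len(names) for row in body):
        raise ValueError("adjacency matrix must be square")

    edges = []
    for i, row in enumerate(body):
        # Upper triangle (diagonal included) so each undirected edge appears once.
        for j in range(i, len(names)):
            weight = float(row[j])
            if weight != 0:  # a value of 0 means "no edge"
                edges.append((names[i], names[j], weight))
    return edges


matrix_text = "A,B,C\n0,2,0\n2,0,5\n0,5,0\n"
print(matrix_to_edge_list(matrix_text))
```

For a sparse network the resulting list has one row per edge, rather than one value per pair of nodes.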
JSON (.json)
First, the node data is encoded under the “Nodes” key. Its value is an array of nested JSON objects, where each element describes all the data for one node. The node's name must be stored under the “Name” key as a string. The remaining key-value pairs for a node must have string keys and values of type int, float, double, Boolean, or string. These additional keys (other than “Name”) become columns of data in VIP, which lets the user bring in important metadata for each node and provide more context for the network. If a node has no value for a given key, simply omit that key from that node's object.
Then, the edge data is encoded under the “Edges” key. Its value is an array of nested JSON objects, where each element describes one edge.
This format is strongly recommended by Virtualitics for most use cases.
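The following Python sketch builds a small file in the shape described above and checks the node rules before serializing. The “Nodes”, “Edges”, and “Name” keys come from the description; the edge keys ("Source", "Target", "Weight"), the metadata keys, and the helper name are hypothetical, since the exact edge fields are not listed here.

```python
import json

network = {
    "Nodes": [
        {"Name": "Alice", "Age": 34, "Department": "Sales", "Manager": True},
        # Bob has no "Manager" value, so that key is simply left out.
        {"Name": "Bob", "Age": 29, "Department": "Engineering"},
    ],
    "Edges": [
        # Edge key names below are assumptions made for illustration only.
        {"Source": "Alice", "Target": "Bob", "Weight": 0.8},
    ],
}

ALLOWED_VALUE_TYPES = (bool, int, float, str)


def check_nodes(nodes):
    """Raise ValueError if a node breaks the rules described for "Nodes"."""
    for node in nodes:
        if not isinstance(node.get("Name"), str):
            raise ValueError(f"node needs a string 'Name': {node!r}")
        for key, value in node.items():
            if not isinstance(key, str) or not isinstance(value, ALLOWED_VALUE_TYPES):
                raise ValueError(f"unsupported key or value in node: {key!r}={value!r}")


check_nodes(network["Nodes"])
json_text = json.dumps(network, indent=2)
print(json_text)
```

Python's float covers both the float and double types mentioned above, and json.dumps writes True as the JSON literal true.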
Edge Weight Format
Similarity vs. Distance: Similarity is the default edge weight format when previewing a dataset before loading it. If your edge weights represent distance instead, you can specify this in the Edge Weight Format dropdown before loading the dataset.
Unlike an adjacency matrix, where a zero represents no edge between the relevant nodes, weighted edge lists and .json files must specify edge weights that are greater than zero.
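A quick pre-load check for the positive-weight rule on a weighted edge list might look like the Python sketch below. It assumes a header row and a weight in the third column; the function name and sample rows are hypothetical.

```python
import csv
import io
import math


def invalid_weight_rows(edge_list_text):
    """Return (line_number, row) pairs whose weight is not a finite number > 0."""
    reader = csv.reader(io.StringIO(edge_list_text))
    next(reader)  # skip the header row (assumed to be present)
    bad = []
    for line_number, row in enumerate(reader, start=2):
        try:
            weight = float(row[2])
        except (IndexError, ValueError):
            bad.append((line_number, row))  # missing or non-numeric weight
            continue
        if not (math.isfinite(weight) and weight > 0):
            bad.append((line_number, row))
    return bad


sample = "Source,Target,Weight\nA,B,1.5\nB,C,0\nC,D,-2\n"
print(invalid_weight_rows(sample))
```

Rows reported by this check can be corrected or dropped before the file is loaded.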
Network graphs currently support only undirected graphs. Support for directed graphs is coming in a future release.
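Since edges are undirected, a source that lists both A to B and B to A describes the same edge twice. How Virtualitics treats such duplicates is not stated here, so the Python sketch below shows one possible way to collapse them beforehand. Keeping the first weight seen is an arbitrary choice made for the example, and the function name is hypothetical.

```python
def undirected_edges(edges):
    """Collapse (a, b, w) and (b, a, w) into one row, keeping the first weight seen."""
    seen = {}
    for source, target, weight in edges:
        key = tuple(sorted((source, target)))  # order-independent pair
        seen.setdefault(key, weight)  # ignore later duplicates of the same pair
    return [(a, b, w) for (a, b), w in seen.items()]


# Hypothetical directed-style input with one reversed duplicate.
edges = [("A", "B", 1.0), ("B", "A", 3.0), ("B", "C", 2.0)]
print(undirected_edges(edges))
```

If the two directions carry different weights, another rule such as taking the mean or the maximum may suit the data better.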