Network Topological Analysis Tools

Network Topological Analysis Tools analyze the topological characteristics of network graphs, including:
(1) Computing node centralities to understand the role and importance of any node in a graph
(2) Detecting communities through connected components, k-core decomposition, triangle counting, and clustering coefficients
(3) Running topology-based queries around target nodes, such as extracting an egonet or a k-hop neighborhood




Centralities

Four common centrality measures are supported: Degree, Closeness, Betweenness, and PageRank (a variant of eigenvector centrality).

The Degree of a graph node is the number of edges incident to the node. The distribution of the degrees of all the nodes in a graph is a commonly studied network characteristic.

degree distribution example: power-law network

degree distribution example: small-world network
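As a minimal sketch of the definition above, the following Python computes node degrees and the degree distribution from an undirected edge list. The function names and the sample edge list are illustrative assumptions and are not part of the IBM System G API.

```python
from collections import Counter

def degrees(edges):
    """Return {node: degree} for an undirected edge list (hypothetical helper)."""
    deg = Counter()
    for u, v in edges:
        # Each edge is incident to both of its endpoints.
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def degree_distribution(edges):
    """Return {degree: number of nodes that have that degree}."""
    return dict(Counter(degrees(edges).values()))

# Sample graph (made up for illustration): a triangle a-b-c plus a pendant node d.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(degrees(edges))
print(degree_distribution(edges))
```

Plotting the distribution on log-log axes is a common way to check whether it resembles a power law.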

The Closeness of a graph node is the inverse of the sum of its distances to all other nodes. Alternatively, it can be computed as the sum of the inverse of the node's distance to every other node, as proposed by Opsahl for networks with disconnected components; in that form an unreachable node contributes zero instead of making the sum of distances infinite.

closeness centrality
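The two closeness definitions above can be sketched for an unweighted graph, using breadth-first search to obtain hop distances. This is an illustration under stated assumptions, not the GBase implementation: the function names are hypothetical, and returning 0.0 from the first variant when some node is unreachable is a convention chosen here.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj, node):
    """Inverse of the sum of distances to all other nodes."""
    dist = bfs_distances(adj, node)
    if len(dist) < len(adj):
        return 0.0  # assumed convention: some node is unreachable
    total = sum(dist.values())
    return 1.0 / total if total > 0 else 0.0

def harmonic_closeness(adj, node):
    """Sum of inverse distances; unreachable nodes simply contribute 0."""
    dist = bfs_distances(adj, node)
    return sum(1.0 / d for n, d in dist.items() if n != node)

# Path a-b-c plus an isolated node d (made-up example).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
print(closeness(adj, "b"), harmonic_closeness(adj, "b"))
```

On this disconnected example the first variant cannot distinguish any nodes, while the second still ranks b above a and c.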

The Betweenness of a graph node quantifies the number of times the node acts as a bridge along the shortest path between two other nodes.

betweenness centrality
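One well-known way to compute betweenness is Brandes' algorithm, sketched below for an unweighted, undirected graph. It is shown only to make the definition concrete and is not claimed to be the method used by the tools described here. When two nodes are joined by several shortest paths, this sketch credits each intermediate node with the fraction of those paths that pass through it, which is the usual convention.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in bc.items()}

# Star graph (made-up example): hub h connected to three leaves.
star = {"h": ["x", "y", "z"], "x": ["h"], "y": ["h"], "z": ["h"]}
print(betweenness(star))
```

In the star example the hub lies on the shortest path between every pair of leaves, so it receives all of the betweenness.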

The PageRank of a node measures its influence in a network graph: a node ranks highly when it is linked to by many nodes that are themselves highly ranked.

pagerank
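PageRank is commonly computed by power iteration. The sketch below is a single-machine illustration, not the GBase or Hadoop implementation; the damping factor of 0.85, the convergence tolerance, and the uniform redistribution of rank from nodes without outgoing links are assumptions made for this example.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    out_links maps every node to the list of nodes it links to;
    every link target must also appear as a key.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no out-links is spread uniformly (assumption).
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for w in out_links[v]:
                    new[w] += share
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:
            break
    return rank

# Made-up example: a and b both link to c, and c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Each iteration touches every edge once, which is why distributed implementations focus on limiting how much rank data moves between machines per iteration.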

IBM System G Graph Databases enable scalable and efficient implementations of centrality computation. For instance, the GBase implementations use HBase Coprocessors to minimize redundant data shuffling during computation. Performance evaluation on real-world rich graph datasets showed a significant improvement over a traditional Hadoop implementation.

comparison between GBase and Hadoop performance for PageRank

Community Detection

Finding communities and studying their properties are essential to social network analysis. Network Topological Analysis Tools provide support for Connected Components, K-Core, Triangles and Clustering Coefficient, four network characteristics commonly used for community detection.

connected components

k-core

clustering coefficient
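The four characteristics named above can each be sketched in a few lines for an undirected graph stored as a dictionary of neighbor sets. The function names are hypothetical, the graph is assumed to have no self-loops, and none of this reflects how the System G tools are implemented internally.

```python
def connected_components(adj):
    """List of node sets, one per connected component."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            component.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return components

def k_core(adj, k):
    """Nodes of the maximal subgraph in which every node has degree >= k."""
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    to_remove = [v for v in adj if degree[v] < k]
    while to_remove:
        v = to_remove.pop()
        if v not in alive:
            continue  # already peeled off
        alive.discard(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                if degree[w] < k:
                    to_remove.append(w)
    return alive

def triangles(adj, v):
    """Number of triangles that include node v."""
    nbrs = adj[v]
    # Every triangle through v is found once from each of its two other corners.
    return sum(len(nbrs & adj[u]) for u in nbrs) // 2

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    d = len(adj[v])
    if d < 2:
        return 0.0
    return 2.0 * triangles(adj, v) / (d * (d - 1))

# Made-up example: triangle a-b-c, pendant d on c, and a separate edge e-f.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c"}, "e": {"f"}, "f": {"e"},
}
print(connected_components(graph))
print(k_core(graph, 2), triangles(graph, "c"), clustering_coefficient(graph, "c"))
```

In this example the 2-core isolates the triangle, which is also the only region with a nonzero clustering coefficient.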

As illustrated below, scalability is one of the main advantages of IBM System G Graph Analytics.

identifying communities in evolving graphs

Egonet and K-Neighborhood

Social network applications commonly analyze the nodes in the egonet or k-hop neighborhood (also called the k-neighborhood) of important or target nodes.

egonet
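Both queries can be sketched on a single machine with a dictionary of neighbor sets. The sketch assumes the usual definitions: the egonet is the target node, its direct neighbors, and all edges among them, and the k-hop neighborhood is every node reachable within k edges. The function names are hypothetical and the code does not represent the coprocessor-based GBase implementation.

```python
from collections import deque

def k_hop_neighborhood(adj, source, k):
    """Return {node: hop distance} for nodes within k hops of source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]  # the source itself is not part of its neighborhood
    return dist

def egonet(adj, ego):
    """Subgraph induced by ego and its direct neighbors."""
    nodes = {ego} | set(adj[ego])
    return {u: {v for v in adj[u] if v in nodes} for u in nodes}

# Made-up example: triangle a-b-c with a tail c-d-e.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"}, "e": {"d"},
}
print(k_hop_neighborhood(graph, "a", 2))
print(egonet(graph, "a"))
```

The breadth-first search stops expanding at depth k, so the cost depends on the size of the neighborhood rather than on the size of the whole graph.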

The k-neighborhood implementation in GBase Analytics was developed specifically to take advantage of HBase Coprocessors for high-performance parallel processing.