Graph Analytics Overview

Further Detail of IBM System G Graph Analytics:
Network Topological Analysis
Graph Matching and Search
Graph Path and Flow

Further Detail of IBM System G Graph Analytics Software:
Documentation




Graph Analytics Toolkit

IBM System G Graph Analytics Toolkit is a comprehensive set of graph analytics libraries developed on top of IBM System G Graph Databases and Middleware to enable graph analyses for various IBM System G Solutions. It is capable of handling different types of graphs, including:
(1) Few, very large graphs, e.g. social graphs, Internet of Things
(2) Many, many small graphs, e.g. protein structures, graphs in the healthcare domain
(3) Large, sementic graph, e.g. Semantic Web, RDF graphs
(4) Graphical models, e.g. Bayesian Network, Markov Network

The toolkit consists of three main sets of tools: Network Topological Analysis tools, Graph Matching and Search tools, and Graph Path and Flow tools. Some of these tools are C++ tools built on top of IBM System G Native Store, while the others are Java tools using GBase on top of Hadoop HBase and HDFS. The table below provides a break-down of the tools:

Analytic ToolC++ (Native Store)Java (GBase)
Network Topological Analysis
Degree Centrality
Closeness Centrality
Betweenness Centrality
PageRank
Connected Component
K-Core Decomposition
Triangle Count
Clustering Coefficient
Egonet
K-Neighborhood
Graph Matching and Search
K-Nearest Neighbors
Induced Subgraph
Graph Matching
Graph Search & Recommendation
Graph Sampling
Collaborative Filtering
Graph Path and Flow
Shortest Paths
Top K-Shortest Paths
Network Information Flow

Available now Work in progress

Click here for software documentation of the above graph analytics.

Computational characteristics of selected analytics are listed in the table below:

Analytic Algorithm Type Graph Size Run Time Characteristics Parallelization Memory Requirements Graph Format Data Location
K-core Exact Tested against > 200M edge graph Disk IO bound due to HBase (chart) Distributed parallelized O(|V|+|E| mod partitions) Edge list, provide format conversion Balanced partition
K-Neighborhood Exact Scale free Disk IO bound due to HBase (chart) Distributed Scale free Edge list, provide format conversion Balanced partition
K-Shortest Paths Exact Scale free Disk IO bound due to HBase (chart) Distributed parallelized O((|E|/|V|)^k) Edge list, provide format conversion Balanced partition
Connected Component Exact Tested against > 70M edge graph Disk IO bound due to HBase Distributed parallelized O(|V|+|E| mod partitions) Edge list, provide format conversion Balanced partition
PageRank Exact Tested against > 70M edge graph Disk IO bound due to HBase (chart) Distributed parallelized O(|V|+|E| mod partitions) Edge list, provide format conversion Balanced partition
Graph Search/Recommendation Exact Scale free Mixed IO and CPU workload for index building (chart) Multithreading O((|E|/|V|)^3) Adjacency list, provide format conversion Data in indicies
Probabilistic Inference in Bayesian Networks Approximation Constrained by shared physical memory O(Nkr^w/P) 1 (chart) Multithreading O(Nkr^w) Edge list, provide format conversion Data in shared memory
XRIME Scalable Community Detection Approximation Tested against > 5M edge graph Initially CPU intensive (chart) Distributed parallelized Scale free Adjacency list, provide format conversion Random partition
Snazzy Community Detection Exact Tested against > 1M edge graph Mixed CPU and IO workload Mostly parallelized Constrained by memory Adjacency list, provide format conversion Data in memory
Dynamic Subgraph Matching Exact Tested against > 150K edge graph. Change up to 30% within 10 sec Fast indices updating. Indices update time for 30% graph change < 10 second (chart) Not parallelized 200MB for 150K edge graph Edge list, provide format conversion Data in memory
Incremental Streaming K-Core Exact Tested against > 16M edge graph Update time for each insertion/deletion is O(|E|) Not parallelized O(|V|+|E|) Edge streams, provide format conversion Centralized and in memory
Streaming Graph Clustering Approximation Tested against 400K updates/sec Update time for each insertion/deletion is O(M^2) 2 Not parallelized O(|V|+|E|) Edge streams, provide format conversion Centralized and in memory
Streaming Graph Clustering Ceofficient Exact Tested against > 1.8B edge graph, up to 24M updates/sec Update time for each insertion/deletion is O(Degree) chart Not parallelized O(|V|+|E|) Edge streams, provide format conversion Centralized and in memory

N is the number of nodes, k is the node degree, r is the number of states, w is the number of node in-degree, P is the number of threads, M is the maximum cluster size.