## Graph Analytics Overview

There are three sets of tools for IBM System G Graph Analytics:

(1) Network Topological Analysis

(2) Graph Matching and Search

(3) Graph Path and Flow

## Graph Analytics Toolkit

IBM System G Graph Analytics Toolkit is a comprehensive set of graph analytics libraries developed on top of IBM System G Graph
Databases and Middleware to enable
graph analyses for various IBM System G Solutions. It is capable of handling different types of graphs, including:

(1) Few, very large graphs, e.g. social graphs, Internet of Things

(2) Many, many small graphs, e.g. protein structures, graphs in the healthcare domain

(3) Large, semantic graph, e.g. Semantic Web, RDF graphs

(4) Graphical models, e.g. Bayesian Network, Markov Network

The toolkit consists of three main sets of tools: Network Topological Analysis tools, Graph Matching and Search tools, and Graph Path and Flow tools. Some of these tools are C++ tools built on top of IBM System G Native Store, while the others are Java tools using GBase on top of Hadoop HBase and HDFS. The table below provides a break-down of the tools:

Analytic Tool | C++ (Native Store) | Java (GBase) |

Network Topological Analysis | ||

Degree Centrality | ||

Closeness Centrality | ||

Betweenness Centrality | ||

PageRank | ||

Connected Component | ||

K-Core Decomposition | ||

Triangle Count | ||

Clustering Coefficient | ||

Egonet | ||

K-Neighborhood | ||

Graph Matching and Search | ||

K-Nearest Neighbors | ||

Induced Subgraph | ||

Graph Matching | ||

Graph Search & Recommendation | ||

Graph Sampling | ||

Collaborative Filtering | ||

Graph Path and Flow | ||

Shortest Paths | ||

Top K-Shortest Paths | ||

Network Information Flow |

Available now | Work in progress |

Computational characteristics of selected analytics are listed in the table below:

Analytic |
Algorithm Type |
Graph Size |
Run Time Characteristics |
Parallelization |
Memory Requirements |
Graph Format |
Data Location |

K-core | Exact | Tested against > 200M edge graph | Disk IO bound due to HBase (chart) | Distributed parallelized | O(|V|+|E| mod partitions) | Edge list, provide format conversion | Balanced partition |

K-Neighborhood | Exact | Scale free | Disk IO bound due to HBase (chart) | Distributed | Scale free | Edge list, provide format conversion | Balanced partition |

K-Shortest Paths | Exact | Scale free | Disk IO bound due to HBase (chart) | Distributed parallelized | O((|E|/|V|)^k) | Edge list, provide format conversion | Balanced partition |

Connected Component | Exact | Tested against > 70M edge graph | Disk IO bound due to HBase | Distributed parallelized | O(|V|+|E| mod partitions) | Edge list, provide format conversion | Balanced partition |

PageRank | Exact | Tested against > 70M edge graph | Disk IO bound due to HBase (chart) | Distributed parallelized | O(|V|+|E| mod partitions) | Edge list, provide format conversion | Balanced partition |

Graph Search/Recommendation | Exact | Scale free | Mixed IO and CPU workload for index building (chart) | Multithreading | O((|E|/|V|)^3) | Adjacency list, provide format conversion | Data in indices |

Probabilistic Inference in Bayesian Networks | Approximation | Constrained by shared physical memory | O(Nkr^w/P) ^{1} (chart) |
Multithreading | O(Nkr^w) | Edge list, provide format conversion | Data in shared memory |

XRIME Scalable Community Detection | Approximation | Tested against > 5M edge graph | Initially CPU intensive (chart) | Distributed parallelized | Scale free | Adjacency list, provide format conversion | Random partition |

Snazzy Community Detection | Exact | Tested against > 1M edge graph | Mixed CPU and IO workload | Mostly parallelized | Constrained by memory | Adjacency list, provide format conversion | Data in memory |

Dynamic Subgraph Matching | Exact | Tested against > 150K edge graph. Change up to 30% within 10 sec | Fast indices updating. Indices update time for 30% graph change < 10 second (chart) | Not parallelized | 200MB for 150K edge graph | Edge list, provide format conversion | Data in memory |

Incremental Streaming K-Core | Exact | Tested against > 16M edge graph | Update time for each insertion/deletion is O(|E|) | Not parallelized | O(|V|+|E|) | Edge streams, provide format conversion | Centralized and in memory |

Streaming Graph Clustering | Approximation | Tested against 400K updates/sec | Update time for each insertion/deletion is O(M^2) ^{2} |
Not parallelized | O(|V|+|E|) | Edge streams, provide format conversion | Centralized and in memory |

Streaming Graph Clustering Ceofficient | Exact | Tested against > 1.8B edge graph, up to 24M updates/sec | Update time for each insertion/deletion is O(Degree) chart | Not parallelized | O(|V|+|E|) | Edge streams, provide format conversion | Centralized and in memory |

N is the number of nodes, k is the node degree, r is the number of states, w is the number of node in-degree, P is the number of threads, M is the maximum cluster size.