IBM System G Native Graph Store Overview

Further documentation of Native Store:
(1) Software Documentation



Native Store

IBM System G Native Store provides an efficient graph data store solution that can handle various simple graphs, property graphs, and RDF-like graphs, in terms of storage, analytics, and visualization.

Native Store Solution
Fig 1. System G Native Store Solution

Native store not only offers persistent graph storage, but also sequential/concurrent/distributed graph runtimes, a set of C++ graph programming APIs, a CLI command set (gShell), a socket client, a socket client GUI, and some visualization toolkit.

db_arch
Fig 2. IBM System G Native Store Overview


Native Store Introduction

IBM System G Native Store provides several in-memory and on-disk storages for property graphs. This web primarily introduces System G Native Store, a C++ template based high performance implementation of graph stores. The Native Store is designed according to the data access patterns in various graph applications and the architecture of commodity processors, aiming at offering highly efficient graph data access for Big Data analytics. Native Store supports sequenetial, concurrrent, and distributed graph runtime libraries in IBM System G middleware, and provides

  • C++ APIs for developers
  • a command based gShell for high leqvel users
  • a JNI layer for translating the native APIs into TinkerPop APIs
  • a SPARQL layers based on JENA through the TinkerPop layers for RDF queries
For users of different levels, the gShell further provides highly intuitive database operations, supporting
  • local interactive mode in CLI
  • socket-based remote operation mode
  • a Web-based interactive mode
  • local Qt GUI frontend.


roadmap
Fig 3. IBM System G Native Store Roadmap

There are couple of reasons explaining the advantages of IBM System G Native Store:

  • First, native store is designed from scratch for highly efficient graph computing. Its data structures are optimized at multiple layers (disk storage, graph-specific file caching, cpu-cache amenable in-memory graph data structure, optimized scheduler for concurrent graph access, etc.) towards graph computing characteristics, such as the irregular data access patterns. For example, our cpu-cache amenable in-memory graph data structure improves cache hit rate. The L1D cache hit rate is pretty high, which is not common for general graph representations.
  • Second, the C/C++ based kernel of the graph store is easy to incorporate with various hardware innovations. Compared to many open source competitors, our solution offers more opportunties for performance optimization. In constrast, the Java-based packages have challengs to be optimized when beyond the JVM layer.
  • Third, unlike some open source graph databases where only a graph front-end is provided, but a non-graph back-end is used, we organize data always using graph across layers. Therefore, at each layer, we can optimize operations towards graph computing. It is straightforward to see that, if the back-end is not a graph, some graph behaviors may not be well handled. For example, some open source graph database's back-end heavily relies on indices. Although indexing helps, it can also introduce overheads when the graph is very dynamic, where many indices must be updates at runtime. Native graph store, such as our native store and Neo4j, have back-end as a graph, where graph data access and traversal is straightforward.
  • At last, for implementing such cross-layer optimization for graphs, we build our team by having researchers with backgrounds in graph analytics, systems, databases, POWER architecture, high performance computing, compilers. The multi-disciplineary team ensures the success of the solution.



Examples of Native Store REST APIs

Each gShell command can be converted into a REST-like APIs. Here are show a few exampels. More details can be found at this webpage.
  • List all stores and their types. This command will list all existing stores (opened or not opened) and their respective graph type (directed, undirected, pred_directed). Note that this command will not load any store if they are not opened already.
    http://yourserver/nsREST?cmd=list_all
  • Create a store. Assuming we want to create a graph store called "", and we want use it to store an undirected graph. Note that a graph is not necessarily to be connected in gShell; that is, a graph store can maintain a set of small graphs, but all of them must be of the same type (directed, undirected, or pred_directed). The command for creating such a store is:
    http://yourserver/nsREST?cmd=create&storename=<store_name>&graphtype=<type> 
  • To get the number of vetices in a graph, we use the following command.
     http://yourserver/nsREST?storename=<store_name>&cmd=get_num_vertices
  • To get the number of neighbors of a vertex.
    http://yourserver/nsREST?storename=<store_name>&cmd=get_num_neighbors&vertex_id=<vertex_id>
  • To query all neighbors of a vertex, we use the following command. We just need to provide a vertex ID.
    http://yourserver/nsREST?storename=<store_name>&cmd=query_neighbors&vertex_id=<vertex_id>
  • Find the vertex with the maximum node degree. If the condition i and val are given, it finds the vertex only from those satisfying the condition, i.e., the i-th property of the vertex is equal to val. It also finds the vertices with the top #n number of degrees.
    http://yourserver/nsREST?storename=kv11&cmd=find_vertex_max_degre
  • Find n vertices randomly from a graph. This command is for users to get some vertices, so that they can use such vertices as start points for certain analytics. is a number, say 10.
     http://yourserver/nsREST?storename=<store_name>&cmd=find_random_vertices&n=<#n>
  • A user controled Breadth First Search (BFS) can be invoked by the following commands. is an arbitrary vertex in the graph stored in , and #hops shows the maximum allowed BFS levels. gives the maximum number of vertices to traverse at each BFS level. This is for visualization purpose, where we do not want to visualize all the edges for dense vertices. The output is in JSON format.
    http://yourserver/nsREST?storename=<store_name>&cmd=bfs_visual&root=<vertex_id>&hops=<#hops>&max_breadth=<#max_breadth>



Native Store Demo Online (under construction for IBM Cloud)

Click any one of the following images to access an online demonstration of native store, where a user can upload graph data in various formats, such as the CSV format, and then perform queries. The query result can be visulized, if the command has a JSON output.

The online demo webs are actually built on top of gShell. An easy way to try out the demo is to invoke the collaborative fitlering in gShell. We can use the dataset kv11test3, a pre-loaded graph of IBM KnowledgeView dataset in CSV format. Then, in the query text box, the following command can be typed in:


centroid_visual Q727718S84162V06 10 10 json

This is an instance of the gShell command:


centroid_visual \ \<#rank1\> \<#rank2\> [json]

The first argument for this command is also the queried vertex ID, but it performs a 2 hops colFilter and got the top \<#rank1\> nodes, and then for each of them perform 2 hops again and get the top 10 nodes. The results are of the top #rank from the 1st colFilter and the top \<#rank2\> of the remaining colFilter are aggregated to return. The result can be formatted into JSON if the optional argument is specified.


demo  
demo_lite
Fig 5. IBM System G Native Store Online Demo (Click to invoke)

Then, a user can click/choose Text to see the JSON output, or click/choose Visualize to visualize the results.


Note: the gShell version in the online demo is older than the latest one that you can downloaded. The documentation describes the latest version of the Native Store, so some command may not work in the demo.




TinkerPop over Native Store

TinkerPop Blueprints

IBM System G has a JNI layer to translate the Native Store graph APIs into the TinkerPop APIs. Therefore, JAVA graph applications built on top of the TinkerPop Blueprint can be ported onto the IBM System G Native Store. Therefore, various Open Source tools can be integrated into the IBM System G.

IBM System G supports Java clients through an in-process JNI layer that maps the concepts and methods of the multi-property C++ layer to Java static methods. This has allowed the System G team to leverage existing open source Java-based code bases to implement additional graph features. As a result, IBM System G users have the ability to access their graphs through Groovy, Gremlin and SPARQL. In most cases Java clients do not program to the JNI interface but instead program to a TinkerPop Blueprints layer built upon the JNI methods.

System G provides TinkerPop Blueprints interfaces to both it's high-performance C++ implementations and its HBase-based GBase graphs. These gives customers the ability to use the TinkerPop suite against their NativeStore and GBase graphs. Clients have found the most valuable of these to be Gremlin. We have high hopes for TinkerPop Rexster, but so far we've found the Rexster (2.3, 2.4) public implementations to be unstable and have not found time to diagnose and patch the problems. The System-G team itself has found the TinkerPop test suite to be the most valuable portion of the TinkerPop suite and has found it handy for rapidly validating portions of IBM System G.

Trying it out

If you'd like to try out TinkerPop over NativeStore, you can simply use maven to include it in your project on Linux
  <properties>
    <nexus.url>http://vcqa-1.pok.ibm.com:8081/nexus</nexus.url>
    <nexus.snapshots>${nexus.url}/content/repositories/snapshots/</nexus.snapshots>
    <nexus.internal>${nexus.url}/content/repositories/releases/</nexus.internal>
    <nexus.forwala>${nexus.url}/content/repositories/thirdparty/</nexus.forwala>
  </properties>
  <repositories>
        <repository>
          <id>nexus_internal2</id>
          <url>${nexus.internal}</url>
          <releases><enabled>true</enabled></releases>
          <snapshots><enabled>false</enabled></snapshots>
        </repository>
        <repository>
          <id>nexus_forwala</id>
          <url>${nexus.forwala}</url>
          <releases><enabled>true</enabled></releases>
          <snapshots><enabled>false</enabled></snapshots>
        </repository>
        <repository>
          <id>nexus_snapshots2</id>
          <url>${nexus.snapshots}</url>
          <releases><enabled>false</enabled></releases>
          <snapshots><enabled>true</enabled></snapshots>
        </repository>
  </repositories>
  <dependencies>
  ...
    <dependency>
        <groupId>com.ibm.research.systemg</groupId>
        <artifactId>NativeBlueprints</artifactId>
        <version>2.4.0-SNAPSHOT</version>
    </dependency>
     ...
  </dependencies>
      
or you might find it easier to simply pull down some of our sample code from git and modify it to suit your needs :

      git clone git://git01.pok.ibm.com/SGAPISamples       
      
See git01.pok.ibm.com for more information about our git repository.


Gremlin over Native Store

marko$ ./bin/gremlin.sh \,,,/ (o o) -----oOOo-(_)-oOOo----- gremlin>

Since Tinkerpop is supported, Gremling is running. Gremlin is a domain specific language for traversing graphs. By using Gremlin, it is possible make use of a REPL (command line/console) to interactively traverse a graph. Gremlin was designed to work with a type of graph called a property graph. By using Gremlin, it is possible make use of a REPL (command line/console) to interactively traverse a graph. Since Native Store provides Tinkerpop/Blueprints interface via JNI, Gremlin is running on Native Store. In the follow webpage, we provides instructions on installing Native Store Gremlin and using the Gremlin Console. Getting Started with Gremlin (/db/doc/gremlin.html)


SPARQL over Native Store

JENA

A JENA based SPARQL query engine is installed on top of the System G Native Store. The queries are translated into graph APIs in TinkerPop, and then translated into the Native Store APIs. Thus, Native Store opens an opportunity to leverage several Open Source graph solutions for some rapid application developement. System G provides SPARQL support. The initial implementation of this was based on TinkerPop's SailGraph and GraphSail classes. Current work is based on Apache Jena and is focusing on improving the performance of that implementation.





IBM System G Native Store paper published on IEEE BigData conference 2014.