Graph Database Overview

The IBM System G graph database is available in several versions:
(1) Native Store
(2) GBase on top of Hadoop HBase and HDFS



Introduction to Graph Databases

A graph database is a database that uses graph structures with vertices, edges, and properties to represent and store data. More precisely, a graph database is any storage system that provides index-free adjacency: every element contains a direct pointer to its adjacent elements, so no index lookups are necessary to move from one element to its neighbors. General graph databases, which can store any graph, are distinct from specialized graph databases such as triplestores and network databases.
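Index-free adjacency can be illustrated with a minimal Python sketch. It contrasts following direct references between vertex objects with resolving every hop through a separate edge table. The names used here (Vertex, connect, edge_table, neighbours) are assumptions made up for illustration and do not come from System G.

```python
class Vertex:
    """A vertex that holds direct references to its adjacent vertices."""

    def __init__(self, name):
        self.name = name
        self.out = []  # direct pointers to adjacent Vertex objects

    def connect(self, other):
        self.out.append(other)


alice, bob, carol = Vertex("alice"), Vertex("bob"), Vertex("carol")
alice.connect(bob)
bob.connect(carol)

# Index-free traversal: each hop follows an object reference.
two_hops = [w.name for v in alice.out for w in v.out]

# Table-based equivalent: each hop is a lookup into an edge table by key.
edge_table = [("alice", "bob"), ("bob", "carol")]


def neighbours(name):
    # A real system would use an index here; this sketch scans the table.
    return [dst for src, dst in edge_table if src == name]


two_hops_indexed = [w for v in neighbours("alice") for w in neighbours(v)]
print(two_hops, two_hops_indexed)
```

Both traversals find the same vertices. The difference is that the first never consults a lookup structure, which is the property the definition above refers to.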

Compared with relational databases, graph databases are often faster for associative data sets, and they map more directly to the structure of object-oriented applications. They can scale more naturally to large data sets because they do not typically require expensive join operations. Because they depend less on a rigid schema, they are better suited to managing ad hoc and changing data with evolving schemas. Conversely, relational databases are typically faster at performing the same operation on large numbers of data elements. A graph database is a powerful tool for graph-like queries, for example, computing the shortest path between two vertices in the graph. Other graph-like queries, such as computing the diameter of a graph or detecting communities, can also be performed over a graph database in a natural way.
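As an illustration of the shortest-path query mentioned above, the following is a minimal sketch using breadth-first search over an adjacency list. It assumes an unweighted, directed graph and measures path length in number of edges. The function name shortest_path and the sample graph are made up for this example.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Return a path with the fewest edges from source to target, or None."""
    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            # Walk the parent links back to the source.
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in adj.get(v, ()):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None


# Hypothetical sample graph: vertex -> list of adjacent vertices.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
path = shortest_path(adj, "a", "e")
print(path)
```

Each step of the search only reads the neighbors of the current vertex, which is why this kind of query fits a store that keeps adjacency directly with each vertex.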

[Figure: Trends of search interest in "Graph Database" and "Relational Database", in real time from Google Trends. Google Trends normalizes the Y-axis so that the highest value in a chart is 100%.]
[Figure: Comparison of the relative number of searches for "Relational Database" and "Graph Database".]



Graph Database Terminology

Terminology Explanation
Graph
A graph is a representation of a set of objects where some pairs of objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices, and the links that connect some pairs of vertices are called edges.
Database
Database management systems (DBMSs) are specially designed software applications that interact with the user, other applications, and the database itself to capture and analyze data. A general-purpose DBMS is a software system designed to allow the definition, creation, querying, update, and administration of databases.
Graph database
A graph database is a database that uses graph structures with nodes, edges, and properties to represent and store data, and that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent elements and no index lookups are necessary.
Generic Graph database
General graph databases, which can store any graph, are distinct from specialized graph databases such as triplestores and network databases. Compared with relational databases, they are often faster for associative data sets and map more directly to the structure of object-oriented applications. They can scale more naturally to large data sets because they do not typically require expensive join operations.
Property Graph
Graphs take many forms, but the most ubiquitous is the Property Graph, which adds the following characteristics to a simple graph:
1) Nodes have a unique identifier.
2) Nodes have key-value pair properties.
3) Relationships have a unique identifier.
4) Relationships have a type. For instance, Tom LIKES Cindy or Jack IS_THE_BOSS_OF Bob.
5) Relationships are directed, i.e. they have an orientation. For example, if Tom follows Jack on Twitter, the relationship would be defined as pointing from Tom to Jack.
6) Relationships have key-value pair properties.
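The six characteristics above can be sketched as two small Python data classes. This is only an illustration of the data model, not the storage layout of any particular product. The class names, the field names, and the "since" property value are assumptions invented for the example.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # shared counter so every node and relationship gets a unique id


@dataclass
class Node:
    properties: dict = field(default_factory=dict)        # key-value pairs
    id: int = field(default_factory=lambda: next(_ids))   # unique identifier


@dataclass
class Relationship:
    start: Node                                            # direction: start -> end
    end: Node
    type: str                                              # e.g. "FOLLOWS"
    properties: dict = field(default_factory=dict)        # key-value pairs
    id: int = field(default_factory=lambda: next(_ids))   # unique identifier


tom = Node({"name": "Tom"})
jack = Node({"name": "Jack"})
# Tom follows Jack, so the relationship points from Tom to Jack.
follows = Relationship(tom, jack, "FOLLOWS", {"since": 2014})  # "since" is made up
print(follows.start.properties["name"], follows.type, follows.end.properties["name"])
```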
Triplestore
A triplestore is a purpose-built database for the storage and retrieval of triples, a triple being a data entity composed of subject-predicate-object, like "Bob is 35" or "Bob knows Fred". Much like a relational database, one stores information in a triplestore and retrieves it via a query language. Unlike a relational database, a triplestore is optimized for the storage and retrieval of triples.
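A minimal in-memory sketch of the idea follows: triples are stored as (subject, predicate, object) tuples and retrieved by pattern, with None acting as a wildcard. The TripleStore class and its method names are hypothetical. A production triplestore would typically maintain indexes rather than scan every triple as this sketch does.

```python
class TripleStore:
    """Toy triple store: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the pattern; None matches anything."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
store.add("Bob", "age", "35")       # "Bob is 35"
store.add("Bob", "knows", "Fred")   # "Bob knows Fred"

print(store.match(s="Bob", p="knows"))
```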
RDF
The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications [1] originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax notations and data serialization formats. It is also used in knowledge management applications. RDF data is often persisted in relational databases or in native representations, which are called triplestores, or quad stores if the context (i.e. the named graph) is also persisted for each RDF triple.
RDFS
RDF Schema (Resource Description Framework Schema) is a set of classes with certain properties using the RDF extensible knowledge representation language. It provides basic elements for the description of ontologies, otherwise called RDF vocabularies, intended to structure RDF resources. These resources can be saved in a triplestore and retrieved with the SPARQL query language.
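One concrete thing an RDFS vocabulary enables is class-hierarchy inference: rdfs:subClassOf is transitive, and an instance of a class is also an instance of its superclasses. The sketch below applies just these two rules to a set of triples until nothing new is derived. The function name infer_types and the ex: terms are hypothetical, and the sketch covers only this small part of RDFS.

```python
def infer_types(triples):
    """Apply subclass transitivity and type inheritance until a fixed point."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # Only combine with subclass statements whose subject is our object.
                if p2 != "rdfs:subClassOf" or o != s2:
                    continue
                if p == "rdfs:subClassOf":
                    new.add((s, "rdfs:subClassOf", o2))  # transitivity
                elif p == "rdf:type":
                    new.add((s, "rdf:type", o2))         # type inheritance
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


# Hypothetical vocabulary and data.
data = {
    ("ex:Manager", "rdfs:subClassOf", "ex:Employee"),
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
    ("ex:jack", "rdf:type", "ex:Manager"),
}
closure = infer_types(data)
print(("ex:jack", "rdf:type", "ex:Person") in closure)
```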
SPARQL
SPARQL is an RDF query language, that is, a query language for databases that can retrieve and manipulate data stored in Resource Description Framework format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium and is recognized as one of the key technologies of the semantic web.
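The core of a SPARQL SELECT query is a set of triple patterns containing variables, which are matched against the stored triples. The following Python sketch shows how such patterns could be evaluated by extending variable bindings one pattern at a time. It is a simplified illustration, not how any SPARQL engine is implemented, and the function names and sample data are assumptions. The query it mimics is shown in the comment.

```python
# Roughly corresponds to the SPARQL query (prefixes omitted):
#   SELECT ?friend WHERE { :Bob :knows ?x . ?x :knows ?friend }

def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on conflict."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its existing binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(triples, patterns):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


triples = [
    ("Bob", "knows", "Fred"),
    ("Fred", "knows", "Alice"),
    ("Bob", "age", "35"),
]
results = query(triples, [("Bob", "knows", "?x"), ("?x", "knows", "?friend")])
print([b["?friend"] for b in results])
```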
TinkerPop/Blueprints
Blueprints is a collection of interfaces, implementations, and test suites for the property graph data model. Blueprints is analogous to JDBC, but for graph databases.
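The JDBC analogy means that application code is written against a common interface and the concrete graph store can be swapped underneath it. The Python sketch below illustrates that design idea only. Blueprints itself is a Java API, and every name here (Graph, InMemoryGraph, add_vertex, add_edge, out_neighbours, friends_of) is a hypothetical stand-in, not the Blueprints interface.

```python
from abc import ABC, abstractmethod


class Graph(ABC):
    """Hypothetical vendor-neutral interface that application code depends on."""

    @abstractmethod
    def add_vertex(self, vertex_id, **properties): ...

    @abstractmethod
    def add_edge(self, out_id, in_id, label): ...

    @abstractmethod
    def out_neighbours(self, vertex_id, label): ...


class InMemoryGraph(Graph):
    """One possible backend; another could wrap a real graph database."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, out_id, in_id, label):
        self.edges.append((out_id, label, in_id))

    def out_neighbours(self, vertex_id, label):
        return [dst for src, lbl, dst in self.edges
                if src == vertex_id and lbl == label]


def friends_of(graph: Graph, vertex_id):
    # Application logic sees only the interface, never the backend class.
    return graph.out_neighbours(vertex_id, "knows")


g = InMemoryGraph()
g.add_vertex("tom", name="Tom")
g.add_vertex("jack", name="Jack")
g.add_edge("tom", "jack", "knows")
result = friends_of(g, "tom")
print(result)
```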