What is a Graph Database?
Graph databases address one of the biggest use cases: leveraging complex, dynamic relationships between data points to generate insightful correlations.
A graph database consists of several nodes (vertices) connected by various types of relationships (edges). Think of a graph database as a simple whiteboard drawing of all your Facebook friends, where each friend is a node and the relationships (edges) capture things like location or how you are actually related.
Image from NOSQL Distilled – A Brief Guide to..
A simpler image
But think about this at the scale of Twitter or LinkedIn. You have a graph a million times larger and more complex.
Some things to keep in mind
- A graph contains nodes and relationships.
- A node contains key-value pairs.
- Relationships are first-class citizens: they are named and always have a start node and an end node.
In the above diagram, people are the nodes, and the edges are the relationships, or the properties that describe how each node is linked to the other nodes.
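To make the node and relationship model concrete, here is a minimal in-memory sketch. The class names (Node, Relationship, PropertyGraph), the people, and the relationship names are all hypothetical and chosen only for illustration; a real graph database stores this structure natively on disk and exposes it through its own query language.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node holds its data as key-value pairs."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A relationship is named and always has a start and an end node."""
    name: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory graph, for illustration only."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, **properties):
        node = Node(node_id, properties)
        self.nodes[node_id] = node
        return node

    def relate(self, start_id, name, end_id, **properties):
        # Both endpoints must already exist, so no relationship can dangle.
        rel = Relationship(name, self.nodes[start_id], self.nodes[end_id], properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node_id, name=None):
        """Relationships that start at node_id, optionally filtered by name."""
        return [
            r for r in self.relationships
            if r.start.node_id == node_id and (name is None or r.name == name)
        ]


graph = PropertyGraph()
graph.add_node("anna", name="Anna", city="Boston")
graph.add_node("ben", name="Ben", city="Austin")
graph.add_node("cara", name="Cara", city="Boston")
graph.relate("anna", "FRIEND_OF", "ben", since=2010)
graph.relate("anna", "SIBLING_OF", "cara")

friends = [r.end.properties["name"] for r in graph.outgoing("anna", "FRIEND_OF")]
print(friends)
```

Note that the relationship carries its own name and properties, which is what "first-class citizen" means here: it is a stored object in its own right, not something derived at query time.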
In real-world applications, graph databases are useful wherever you have a lot of data and want to build collaborative filtering, predictive analysis models, pattern analysis, or sentiment analysis. These are all places where you have several entities and the relationships between those entities are of high value.
Let me give you some examples. Say you are a telecom company and a customer tells you that they are switching to a new provider. With path and pattern analysis, you can immediately estimate how many other customers this person is likely to take along when they move to the new network.
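A rough sketch of the path analysis idea is a breadth-first search over a "who calls whom" graph, collecting everyone within a few hops of the departing customer. The calls data, the customers_within function, and the choice of two hops are all assumptions made up for this example; a real churn model would also weigh how strong each relationship is.

```python
from collections import deque

# Hypothetical "calls frequently" graph: customer -> close contacts.
calls = {
    "dana": ["eli", "fay"],
    "eli": ["dana", "gus"],
    "fay": ["dana"],
    "gus": ["eli", "hal"],
    "hal": ["gus"],
}


def customers_within(graph, start, max_hops):
    """Breadth-first search returning customers reachable within max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        customer, hops = queue.popleft()
        if hops == max_hops:
            # Do not expand beyond the hop limit.
            continue
        for contact in graph.get(customer, []):
            if contact not in seen:
                seen.add(contact)
                reached.append(contact)
                queue.append((contact, hops + 1))
    return reached


at_risk = customers_within(calls, "dana", max_hops=2)
print(at_risk)
```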
Take an e-commerce site. You literally have millions of products and millions of customers these days, so predicting what products a customer would like to buy when they add an item to their cart is crucial.
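One simple way to picture this is a two-hop walk: from the item in the cart to the customers who bought it, and from those customers to the other products they bought. The sketch below assumes a tiny made-up purchases mapping and a hypothetical recommend function; it counts co-purchases and is only a toy stand-in for real collaborative filtering.

```python
from collections import Counter

# Hypothetical purchase edges: customer -> set of products bought.
purchases = {
    "c1": {"laptop", "mouse", "bag"},
    "c2": {"laptop", "mouse"},
    "c3": {"laptop", "dock"},
    "c4": {"phone", "case"},
}


def recommend(purchases, cart_item, top_n=3):
    """Rank products by how often they were bought alongside cart_item."""
    counts = Counter()
    for basket in purchases.values():
        if cart_item in basket:
            # Hop from the cart item to this customer's other products.
            counts.update(basket - {cart_item})
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:top_n]]


suggestions = recommend(purchases, "laptop")
print(suggestions)
```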
All this data crunching was previously done using OLAP and data mining. But that approach is not entirely real time, whereas graph databases are primarily real time. They are systems that can do CRUD operations and expose the graph data model.
It is important to note that the underlying data is stored in the native graph model and is optimized and designed to be utilized by a graph engine.
Overview of the various players in the graph database space
The above picture shows the variety of technologies, from powerful graph processing engines to graph storage solutions and combinations of both.
Where do RDBMS fall short in comparison to Graph DBs?
Relational databases were primarily designed to codify paper forms and tabular structures, and over time they have been very widely used and optimized. Interestingly, though, relational databases struggle when attempting to model ad hoc and exceptional relationships in the real world.
Relationships do exist in a relational database, but how the data points are related and connected is often implicit and ambiguous. As data grows, it becomes inherently complex to do joins and extrapolate connections that may exist between various data points.
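To illustrate the join problem, here is a small sketch that finds people three hops away from "anna" in two ways: with self-joins on a relational table (using Python's built-in sqlite3 module) and by following adjacency lists. The friend table, its columns, the names, and the hops function are all hypothetical. The point is only the shape of the query: every extra hop adds another join on the relational side, while the traversal just loops one more time. It says nothing about the performance of any particular product.

```python
import sqlite3

# Hypothetical friendship data, stored as rows in a relational table.
pairs = [("anna", "ben"), ("ben", "cara"), ("cara", "dev"), ("anna", "eve")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friend (person TEXT, friend TEXT)")
conn.executemany("INSERT INTO friend VALUES (?, ?)", pairs)

# Relational approach: each extra hop adds another self-join.
sql = """
    SELECT DISTINCT f3.friend
    FROM friend f1
    JOIN friend f2 ON f2.person = f1.friend
    JOIN friend f3 ON f3.person = f2.friend
    WHERE f1.person = ?
"""
via_joins = sorted(row[0] for row in conn.execute(sql, ("anna",)))

# Graph approach: follow adjacency lists hop by hop.
adjacency = {}
for person, friend in pairs:
    adjacency.setdefault(person, []).append(friend)


def hops(start, depth):
    """Return the people reached by following exactly `depth` relationships."""
    frontier = {start}
    for _ in range(depth):
        frontier = {f for p in frontier for f in adjacency.get(p, [])}
    return sorted(frontier)


via_traversal = hops("anna", 3)
print(via_joins, via_traversal)
```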
Why so low in acceptance?
The description above explains why graph databases are so important and the various business cases we can use them in. However, over time we have seen that graph databases have not entirely taken off, and there are several reasons for that: specialized, siloed systems; non-scalable big data; the expansive skill set required; fragmented data analytics; and the higher memory and processing power required. It is also important to note that it takes some time to get real value out of such a system, as it takes time to refine and tune.
Several new products that can be used with SQL
It is also important to note that several solution providers now offer a wrapper that uses SQL and is easier to implement and maintain. The skill set problem could be solved this way.