Let's imagine you want to see how Fred is connected to Steve, and their network looks like this:
[Fred] <-knows-> [Bob]
[Bob] <-isMarriedTo-> [Sally]
[Bob] <-knows-> [Alice]
[Alice] <-workedWith-> [John]
[John] <-wentToSchoolWith-> [Sandra]
[Sandra] <-knows-> [Steve]
Diagram: http://yuml.me/6ff3074e
A "traditional" database like MySQL or Mongo makes this kind of query prohibitively expensive and complicated, as it must perform a new join for every connected person in the user's graph.
Graph databases come into their own because they are designed specifically for efficient traversal of these connecting edges. They typically do this by storing "pointers" on each vertex to its connected edges, so while a normal RDBMS requires something like a hash table lookup to resolve a join, a graph database can simply "jump" to the relevant record via a pointer. This means that things like Dijkstra's algorithm [0] can be implemented efficiently.
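To make the "jump via a pointer" idea concrete, here is a minimal Python sketch, not how any particular graph database is implemented. It assumes an in-memory adjacency list standing in for the per-vertex pointers, and runs Dijkstra's algorithm with every edge given a weight of 1 (the example network has no weights, so that is an assumption). The names `EDGES`, `build_adjacency` and `shortest_path` are made up for illustration.

```python
import heapq

# The Fred/Steve network from the example above, as (vertex, label, vertex).
EDGES = [
    ("Fred", "knows", "Bob"),
    ("Bob", "isMarriedTo", "Sally"),
    ("Bob", "knows", "Alice"),
    ("Alice", "workedWith", "John"),
    ("John", "wentToSchoolWith", "Sandra"),
    ("Sandra", "knows", "Steve"),
]


def build_adjacency(edges):
    """Give each vertex a direct list of its neighbours (the 'pointers')."""
    adjacency = {}
    for a, _label, b in edges:
        # Edges are bidirectional in the example, so link both ways.
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Dijkstra with unit edge weights; returns a list of vertices or None."""
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        # Following neighbours is a direct lookup, no join required.
        for neighbour in adjacency.get(node, []):
            candidate = d + 1
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(shortest_path(build_adjacency(EDGES), "Fred", "Steve"))
```

Each hop only touches the neighbour list of the current vertex, which is the property the pointer-based storage is meant to provide.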
However, "traditional" graph databases like Neo4j require everything to be structured in terms of vertices and edges. This is often quite inconvenient, so multi-model databases like ArangoDB integrate the graph approach with a document store. The idea is that if you can keep everything in the same database, your app gets a lot simpler, you regain things like ACID guarantees that you'd normally lose by using two separate databases, and performance should be a lot better too.
Check out http://neo4j.com/ and the live examples using it at http://gist.neo4j.org/
A "native graph database" is one that is actually designed for the task.
There are different approaches, which are used in other products and which can also work well. For example, you can restrict the database engine to a pure key/value store and add different personalities to it. Or you have a client which implements a common query language for different products.
ArangoDB is OrientDB done right, but it's a lot younger.
If you're considering using either, you owe it to yourself to investigate whether Postgres's Common Table Expressions [0] can do what you want instead. If you can stick with something more mature like Postgres, then you'll be saving yourself a lot of pain.
[0] http://www.postgresql.org/docs/9.1/static/queries-with.html
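As a rough illustration of the recursive CTE approach, here is a sketch that runs a `WITH RECURSIVE` query from Python. To keep it self-contained it uses the standard-library `sqlite3` module rather than Postgres, so treat the SQL as SQLite-flavoured: the Postgres version would follow the same shape but differ in details such as string functions. The `knows` table, the `find_path` helper and the `/`-delimited path string used to avoid cycles are all assumptions made for the example, and names must not contain `/`.

```python
import sqlite3


def find_path(edges, start, goal):
    """Return the shortest list of names from start to goal, or None."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE knows (a TEXT, b TEXT)")
    # Store each relationship in both directions so it can be walked either way.
    rows = [(a, b) for a, b in edges] + [(b, a) for a, b in edges]
    conn.executemany("INSERT INTO knows VALUES (?, ?)", rows)

    row = conn.execute(
        """
        WITH RECURSIVE walk(person, path, depth) AS (
            SELECT ?, '/' || ? || '/', 0
            UNION ALL
            SELECT k.b, walk.path || k.b || '/', walk.depth + 1
            FROM walk JOIN knows k ON k.a = walk.person
            -- Skip people already on this path, otherwise cycles never end.
            WHERE instr(walk.path, '/' || k.b || '/') = 0
        )
        SELECT path, depth FROM walk
        WHERE person = ?
        ORDER BY depth
        LIMIT 1
        """,
        (start, start, goal),
    ).fetchone()
    conn.close()

    if row is None:
        return None
    return row[0].strip("/").split("/")


if __name__ == "__main__":
    network = [
        ("Fred", "Bob"), ("Bob", "Sally"), ("Bob", "Alice"),
        ("Alice", "John"), ("John", "Sandra"), ("Sandra", "Steve"),
    ]
    print(find_path(network, "Fred", "Steve"))
```

Note that the query enumerates every cycle-free path before picking the shortest, which is fine for a toy network but is exactly the kind of cost worth measuring on real data.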
How are you backing this up? I am sure Luca from OrientDB will have some comments.
>>The uncompressed JSON data for the vertices need around 600 MB and the uncompressed JSON data for the edges requires around 1.832 GB.
So why use a 60GB RAM machine for so little data?
Can we get some raw numbers instead of %?
I agree: never trust a benchmark. It really all depends on your use case. If you have ideas for improvements, we would love to hear about them. Also, if you have any idea how to improve the MongoDB or Neo4j queries, please check out GitHub and let us know.
For the technical difference at the storage level: the graph and document models are, in my opinion, a perfect match, because a vertex (and an edge, for that matter) can be stored as an ordinary document. This allows you to run any document query (give me all users who live in Denver) and then start graph queries from the vertices found in this manner (give me their first- and second-level friends).
Nothing. Most multi-model dbs store vertices (and edges) as documents.
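A small Python sketch of that combination, using plain dicts in place of a real document store. The sample users, the `_key`/`_from`/`_to` field names and the helpers `find_users` and `friends_within` are illustrative assumptions, not any product's actual storage format or query API.

```python
from collections import deque

# Vertices are ordinary documents with arbitrary attributes (hypothetical data).
USERS = [
    {"_key": "fred", "name": "Fred", "city": "Denver"},
    {"_key": "bob", "name": "Bob", "city": "Boston"},
    {"_key": "alice", "name": "Alice", "city": "Denver"},
    {"_key": "john", "name": "John", "city": "Austin"},
]

# Edges are documents too; they just carry references to two vertices.
KNOWS = [
    {"_from": "fred", "_to": "bob", "type": "knows"},
    {"_from": "bob", "_to": "alice", "type": "knows"},
    {"_from": "alice", "_to": "john", "type": "workedWith"},
]


def find_users(users, **criteria):
    """Document-style query: match on attribute values."""
    return [u for u in users if all(u.get(k) == v for k, v in criteria.items())]


def friends_within(edges, start_key, max_depth):
    """Graph-style query: keys reachable in 1..max_depth hops from start_key."""
    neighbours = {}
    for edge in edges:
        neighbours.setdefault(edge["_from"], set()).add(edge["_to"])
        neighbours.setdefault(edge["_to"], set()).add(edge["_from"])

    seen = {start_key}
    found = set()
    queue = deque([(start_key, 0)])
    while queue:
        key, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in neighbours.get(key, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                found.add(neighbour)
                queue.append((neighbour, depth + 1))
    return found


if __name__ == "__main__":
    # Document query first, then a graph traversal from each match.
    for user in find_users(USERS, city="Denver"):
        print(user["name"], sorted(friends_within(KNOWS, user["_key"], 2)))
```

The point is only that both queries read the same stored documents; the traversal needs nothing beyond the two reference fields on each edge.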
I'm not sure that graph data (generally) is particularly amenable to being spread across multiple nodes. My understanding is that ArangoDB has implemented some clustering based on Google's Pregel framework, so I suspect it might fare a bit better in my friend's test... but in spite of my urging, I don't know that he has had time to recreate the test with Arango. I'm keeping my fingers crossed.
I don't know if any database is fun to deal with at that size. My experience with Arango has been an unremarkable amount of remarkably complex data, so I would also be interested to see the results with something huge.
Having stumbled upon some really complex data a few times now, I am increasingly appreciating how amazing it is to model your data any way you need, without having to deal with the complexity of running multiple data stores.
Cool to see that I apparently didn't give up any performance to get the flexibility. :)
I'd love to see them push the geospatial capabilities a little further, but they are already pretty decent.