But since no independent institution has compared our product, and since we want to know where ArangoDB stands, Claudius ran his own tests. And as the work is already done, why not share it?
We tried our best to make it as open as possible. PostgreSQL performed very well, and we have a problem with memory consumption (have a look at the charts); we will try to improve there.
- Every database configuration is public
- All test scripts are available on GitHub
- We publish updates if we get pull requests or comments with suggestions for improvements
We did that before, and after the last test some database vendors sent us improved snapshots of their databases, which found their way into the latest products (OrientDB and Neo4j).
If you have suggestions for improvements, please let us know.
Despite the fact that you crippled it by not using jsonb columns.
Not that I can verify it, because the code in the linked public "No magic, no tricks – check the code and make your own tests!" repository doesn't match the published results and doesn't even work at all with postgres…
EDIT: Okay, they pushed a new version containing the Postgres data now. They ARE using the cripplingly slow json columns, not jsonb columns recommended by the documentation.
If anything, it just proves that even after almost a decade of these "NoSQL" solutions being around, they still can't compete with Postgres, a fairly conservative SQL solution, even on basic queries.
I still think PostgreSQL and MariaDB are better tools for most jobs considered big data.
jsonb is superior when:
1. You want to use any of the built-in JSON functions, e.g. for extracting fields from the document.
2. You want to index the JSON (either the entire thing via GIN, or individual fields via ordinary B-tree indexes).
3. You want to save space; jsonb strips whitespace.
jsonb incurs an overhead on both reads and writes since it must serialize to/from textual JSON.
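A rough Python analogy of that trade-off, in case it helps. This is only a sketch of the idea (keep the input text and re-parse on each access, versus parse once on write and serialize on read), not how Postgres implements either type. All function and variable names are made up for illustration.

```python
import json

# Raw document as a client might send it: extra whitespace and a duplicate key.
raw = '{ "name": "alice",   "age": 30,  "age": 31 }'


def get_field_from_text(stored_text, key):
    # "json"-like storage (assumption: exact input text is kept),
    # so every field access has to parse the whole document again.
    return json.loads(stored_text)[key]


def store_parsed(text):
    # "jsonb"-like storage (assumption: parsed once on write).
    # Whitespace and the earlier duplicate key are lost at this point.
    return json.loads(text)


def get_field_from_parsed(stored_doc, key):
    # Field access is a plain lookup, no re-parsing.
    return stored_doc[key]


def to_text(stored_doc):
    # Returning the whole document means serializing it back to text.
    return json.dumps(stored_doc, separators=(",", ":"))


text_column = raw
parsed_column = store_parsed(raw)

print(get_field_from_text(text_column, "age"))
print(get_field_from_parsed(parsed_column, "age"))
print(len(text_column), len(to_text(parsed_column)))
```

The parsed form pays once on write and again whenever the full document is read back as text; the text form pays on every field extraction instead.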
I wanted to move on to RethinkDB next, but I see your point that a comparison between the different JSON formats of Postgres can also be very enlightening. This should replace guessing with hard facts. As always, I will update the blog post and add these tests as well, as we did in the past; see https://www.arangodb.com/nosql-performance-blog-series/.
If you have any improvements concerning the configuration of Postgres or the SQL queries, I will be more than happy to include them in the update as well. I will also push the configuration we used to GitHub.
For instance, we didn't use the index that makes the database go fast to make our own database look good.
https://snap.stanford.edu/data/soc-pokec.html
But of course, you need to test and decide on the basis of your individual requirements and use cases.
I've done a bunch of related benchmarking, and the smallest real-world dataset I've used is the largest one on SNAP: orkut.
We are currently looking into it. Thanks for the mirrored page.
These are also graph database benchmarks that are synthetic, designed to look like real data and are quite hard to do well on.
As someone responsible for a public, free-to-use deployment of a graph database with more than 2 billion nodes and 15 billion edges (sparql.uniprot.org), I must say this looks like a SPARQL benchmark from 10 years ago.
http://www.slideshare.net/sympapadopoulos/adbis2014-presenta...
I tried ArangoDB about a year ago; I think I still have the branch that I tried it on. After spending a weekend porting some stuff from MongoDB to Arango, I regretted it by Sunday evening. It'd be nice to fire things up, update the branch's code and see how it performs.
=> suspicion