NoSQL Benchmark Compares PostgreSQL, MongoDB, Neo4j, OrientDB and ArangoDB

(arangodb.com)

70 points · sachalep · 10y ago · 53 comments

cwyers · 10y ago

I really, really distrust these kinds of evaluations when they come from someone whose product is included in the comparison. Even if everything is above-board, they're not going to publish if it shows their product just completely sucks at it. That kind of publication bias makes these kinds of results a lot less trustworthy than independent benchmarks even if you assume the best of intentions from the people putting them out there.

ifcologne · 10y ago

Ingo from ArangoDB here. I agree that vendor tests are always biased; of course you want to show that your product is competitive.

But as there is no independent institution that compared our product and as we want to know where we stand with ArangoDB, Claudius did his own tests. And as the work is already done, why not share it.

We tried our best to do it as openly as possible. PostgreSQL performed very well and we have a problem with memory consumption - have a look at the charts, we will try to improve there.

- Every database configuration is public

- All test scripts are available on GitHub

- We publish updates if we get pull requests or comments with suggestions for improvements

We did that before: after the last test, some database vendors sent us improved snapshots of their databases, which found their way into the latest products (OrientDB and Neo4j).

If you have suggestions for improvements, please let us know.

creshal · 10y ago

> PostgreSQL performed very well

Despite the fact that you crippled it by not using jsonb columns.

creshal · 10y ago

The particular benchmark seems bullshit, too. For postgres they seem to intentionally use the less performant json column type instead of jsonb.

Not that I can verify it, because the code in the linked public "No magic, no tricks – check the code and make your own tests!" repository doesn't match the published results and doesn't even work at all with postgres…

EDIT: Okay, they pushed a new version containing the Postgres data now. They ARE using the cripplingly slow json columns, not jsonb columns recommended by the documentation.

jpgvm · 10y ago

And despite that, Postgres destroys all the other solutions at everything other than the non-sync (lol) single write case and graph traversal.

If anything, it just proves that even after almost a decade of these "NoSQL" solutions being around, they still can't compete even on basic queries with Postgres, which is a fairly conservative SQL solution.

andy_ppp · 10y ago

Just to be a pedant, the JSONB format, as I understand it, is marginally slower at inserts and orders of magnitude faster at everything else...

amirouche · 10y ago

This is more constructive than flooding the communication channels with hidden facts like Neo4j does.

qaqy · 10y ago

WTF with VCs investing into all this crap? There are very few scenarios where you would be better off with a NoSQL solution, and there are established players serving those niches already.

baldfat · 10y ago

I was kind of shocked how well PostgreSQL did.

I still think PostgreSQL and MariaDB are better tools for most jobs considered big data.

JamesMcMinn · 10y ago

Postgres was actually somewhat crippled in these tests since they used json rather than jsonb for storage. jsonb stores the JSON in a binary format which doesn't need to be serialised on reads.

lobster_johnson · 10y ago

That's not quite correct. The jsonb type requires that reads deserialize jsonb into textual JSON, whereas the json type can be sent directly to the client with no processing.

jsonb is superior when:

1. You want to use any of the built-in JSON functions, e.g. for extracting fields from the document.

2. You want to index the JSON (either the entire thing via GIN, or individual fields via ordinary B-tree indexes).

3. You want to save space; jsonb strips whitespace.

jsonb incurs an overhead on both reads and writes since it must serialize to/from textual JSON.
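
As a rough illustration of the trade-off described above, here is a plain-Python analogy. It is not PostgreSQL's actual storage format or output formatting; `TextStore` and `ParsedStore` are made-up names for this sketch, standing in for "keep the text as given" (like json) and "keep a parsed form" (like jsonb).

```python
import json


class TextStore:
    """Analogy for `json`: keeps the document as the exact text it was given."""

    def __init__(self, text):
        json.loads(text)      # validate on write, but keep the original text
        self.text = text

    def read(self):
        return self.text      # full reads need no processing

    def field(self, key):
        # every field extraction has to parse the text again
        return json.loads(self.text)[key]


class ParsedStore:
    """Analogy for `jsonb`: parses once on write and keeps the parsed form."""

    def __init__(self, text):
        self.doc = json.loads(text)

    def read(self):
        # full reads must serialize back to text; input whitespace is gone
        return json.dumps(self.doc, separators=(",", ":"))

    def field(self, key):
        return self.doc[key]  # field extraction needs no parsing


raw = '{ "name": "alice",   "age": 30 }'
as_text = TextStore(raw)
as_parsed = ParsedStore(raw)
```

In this sketch `as_text.read()` hands back the input unchanged, while `as_parsed.field("age")` avoids a parse. Which side wins depends on whether the workload mostly returns whole documents or mostly extracts and indexes fields.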

yahliwharton · 10y ago

This is not a cluster test. NoSQL databases in general are optimized for scaling horizontally on commodity hardware. That's trickier in an RDBMS.

collyw · 10y ago

Most people I have spoken to are using NoSQL on single nodes.

don71 · 10y ago

I'm Claudius, author of the tests. I've been asked to include a lot of different databases in the test runs. The most requested databases were Postgres/JSON and RethinkDB. I started with Postgres. The Postgres manual states that JSONB might be faster, but some StackOverflow answers indicate that it takes more space than JSON, while JSON might be slightly more compatible with legacy code. I've shown the queries and setup to some local Postgres users. They did not point out that JSONB would be much faster for the kinds of requests used in the test setup. For instance, we do not use special indexes apart from the primary one by choice.

I wanted to move on to RethinkDB next, but I see your point that a comparison between the different JSON formats of Postgres can also be very enlightening. This should replace guessing with hard facts. As always, I will update the blog post and add these tests as well, as we did in the past; see https://www.arangodb.com/nosql-performance-blog-series/.

If you have any improvements concerning the configuration of Postgres or the SQL queries, I will be more than happy to include them in the update as well. I will push the configuration used to GitHub too.

chucky_z · 10y ago

Please refer to the #postgresql channel on irc.freenode.net for any postgres inquiries, you will receive an answer from experts and core developers on the correct processes within minutes for almost any question. It is a very active channel full of knowledgeable folks.

oralhistory · 10y ago

> For instance, we do not use special indexes apart from the primary one by choice.

For instance, we didn't use the index that makes the database go fast to make our own database look good.

crudbug · 10y ago

Would like to see Titan with a Cassandra backend here.

lobster_johnson · 10y ago

Out of interest, which version of Titan are you on? I see that 1.0 was released recently, with little apparent fanfare.

pella · 10y ago

Or with a http://www.scylladb.com/ backend. "ScyllaDB: world's fastest NoSQL column store database; Fully compatible with Apache Cassandra at 10x the throughput and jaw dropping low latency"

ilaksh · 10y ago

Why not include redis or rethinkdb?

amirouche · 10y ago

Redis and RethinkDB are not ACID across documents. So it's not the same use case at all.

merlincorey · 10y ago

Are you implying that MongoDB and friends are ACID across documents?

covi · 10y ago

The graph dataset is too small in size. It makes little sense for real-world usage.

ifcologne · 10y ago

Ingo from ArangoDB: It is nevertheless the whole dataset of a real-world use case. :)

https://snap.stanford.edu/data/soc-pokec.html

But of course, you need to test and decide on the basis of your individual requirements and use cases.

covi · 10y ago

Ingo - SNAP has a bunch of other "real-world use case" graphs available for free, many of which are larger than this 1M-node, 30M-edge toy.

I've done a bunch of related benchmarks, and the smallest real-world dataset I've used is the largest one on SNAP: orkut.

kbenson · 10y ago

I wonder why there's no equivalent of the Frameworks Benchmark[1] for databases. It seems we could all really benefit from that. Ideally it would get to a place where they would be able to simulate real-world worst-case scenarios and test for problems. Each database would likely want multiple entries with different configs, but if you have some engineered failure scenarios and tests in the results, it becomes obvious what the trade-off is. Sure, a specific setting may trade consistency in the event of a failure for speed, but sometimes that's what you might want, and if the failure cases clearly show the problem, at least you aren't going in blind.

1: https://www.techempower.com/benchmarks/
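
A toy sketch of what one engineered failure scenario in such a suite might look like. Everything here is hypothetical: `ToyStore`, its `sync` flag and the batch size of 100 are invented for illustration and do not model any of the databases in the benchmark. The idea is only that running the same crash scenario against several configs makes the durability cost of a "fast" setting visible.

```python
class ToyStore:
    """Hypothetical in-memory store with a durability knob."""

    def __init__(self, sync, batch_size=100):
        self.sync = sync
        self.batch_size = batch_size
        self.durable = []   # writes that would survive a crash
        self.buffer = []    # acknowledged writes not yet made durable

    def write(self, item):
        if self.sync:
            # sync config: every write is durable before it is acknowledged
            self.durable.append(item)
            return
        # batched config: acknowledge immediately, persist in batches
        self.buffer.append(item)
        if len(self.buffer) >= self.batch_size:
            self.durable.extend(self.buffer)
            self.buffer.clear()

    def crash(self):
        # simulated failure: anything still buffered is lost
        self.buffer.clear()


def lost_writes_after_crash(store, n=250):
    """Engineered failure scenario: write n items, crash, count the losses."""
    for i in range(n):
        store.write(i)
    store.crash()
    return n - len(store.durable)


# One entry per configuration, as suggested above.
configs = {
    "sync": lambda: ToyStore(sync=True),
    "batched": lambda: ToyStore(sync=False),
}
report = {name: lost_writes_after_crash(make()) for name, make in configs.items()}
```

A real suite would pair a loss count like this with throughput numbers for the same configs, so the speed gained and the writes put at risk appear side by side.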

crudbug · 10y ago

Having benchmarks for the different storage models (Relational/Document/Graph/Object/XML) would be a better solution.

exo762 · 10y ago

Hugged to death.

https://archive.is/cMWCQ

ifcologne · 10y ago

No, running on XXXXX Cloud. :(

We are currently looking into it. Thanks for the mirrored page.

jerven · 10y ago

I am just going to say: have a try with the LDBC social benchmark at http://ldbcouncil.org/ and http://ldbcouncil.org/benchmarks, where you can even get audited results.

These are also graph database benchmarks that are synthetic, designed to look like real data, and quite hard to do well on.

As someone responsible for a public, free-to-use deployment of a graph database with more than 2 billion nodes and 15 billion edges (sparql.uniprot.org), I must say this looks like a SPARQL benchmark from 10 years ago.

n72 · 10y ago

Clicking the link got me "Error establishing a database connection." :/

gegtik · 10y ago

Looking around, it seems that different graph engines pull ahead depending on the use case.

http://www.slideshare.net/sympapadopoulos/adbis2014-presenta...

howdoipython · 10y ago

>Error establishing a database connection

nevi-me · 10y ago

Like others mention here, I'm skeptical of these types of comparisons. If I compare myself to my competitors, I won't publish results if they're better than me.

I tried ArangoDB about a year ago; I think I still have the branch that I tried it on. After spending a weekend porting some stuff from MongoDB to Arango, I ended up regretting doing that by Sunday evening. It'd be nice to fire things up, update the branch's code and see how it performs.

hardwaresofton · 10y ago

No RethinkDB?

acjohnson55 · 10y ago

Comparison of X1, X2, ... , Xn, Y, written by Y

=> suspicion

jbverschoor · 10y ago

and now a 10-node cluster

Mindstormy · 10y ago

Would love to see the results for CouchDB in comparison to these.

curiousjorge · 10y ago

I have looked at ArangoDB and really hope it takes off; it has some pretty nifty features. At this point, I think only the lack of integration with frameworks like Meteor.js is holding me back.
