> Recently I was conducting an evaluation of several different databases for a messaging workload.
> While PostgreSQL was a strong contender among SQL databases and had given good results in early experiments, I was looking for the ideal NoSQL candidate.
Maybe I just don't know much about these projects, but is Cassandra a common choice for a messaging workload? Postgres also strikes me as a weird choice -- you're going to need to do some partitioning (or use plugins like Timescale and/or Citus) to make things scale gracefully (not web scale per se, just reasonably active messaging over a long period of time).
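For context on what "some partitioning" can look like in plain Postgres: declarative range partitioning (Postgres 10+) lets you split a message table by month so old months can be detached or dropped cheaply. Below is a minimal sketch that only generates the DDL strings; the `messages` table, its `sent_at` column, and monthly granularity are assumptions for illustration, and tools like pg_partman automate this in practice.

```python
from datetime import date

# Hypothetical parent table: messages range-partitioned on their send time.
PARENT_DDL = (
    "CREATE TABLE messages (id bigint NOT NULL, sent_at timestamptz NOT NULL, body text) "
    "PARTITION BY RANGE (sent_at);"
)


def month_bounds(first: date, count: int) -> list[date]:
    """Return count + 1 consecutive first-of-month dates, starting at first's month."""
    bounds = []
    year, month = first.year, first.month
    for _ in range(count + 1):  # one extra date so the last partition has an upper bound
        bounds.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return bounds


def partition_ddl(parent: str, first: date, count: int) -> list[str]:
    """Build one CREATE TABLE ... PARTITION OF statement per month."""
    bounds = month_bounds(first, count)
    stmts = []
    # Range bounds are inclusive FROM, exclusive TO, so adjacent months never overlap.
    for lo, hi in zip(bounds, bounds[1:]):
        stmts.append(
            f"CREATE TABLE {parent}_{lo:%Y_%m} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lo.isoformat()}') TO ('{hi.isoformat()}');"
        )
    return stmts


if __name__ == "__main__":
    print(PARENT_DDL)
    for stmt in partition_ddl("messages", date(2021, 11, 15), 3):
        print(stmt)
```

Retiring a month is then `ALTER TABLE messages DETACH PARTITION messages_2021_11;` followed by dropping or archiving that table, which is much cheaper than a bulk `DELETE` over a single huge table.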
I got some inside information that LINE (a popular messaging app here in Japan) has a large Kafka cluster -- maybe it's 100+ nodes by now.
In the end I do love me some Postgres propaganda though; it's a fantastic tool -- IMO the best RDBMS out there, and maybe even better for some use cases you wouldn't expect.
On the Cassandra front, it's a bit disappointing to hear about the trouble getting the patch taken up. This kind of improvement seems well worth reviewing, but the priority is low. I do wonder where Scylla[0] would have landed on the performance spectrum.
[EDIT] - It looks like Cassandra trunk wins on the margins[1] despite this speedup, so it's not so cut and dried.
[0]: https://www.datasciencecentral.com/profiles/blogs/scylla-vs-...
[1]: https://issues.apache.org/jira/browse/CASSANDRA-16499?focuse...
For messaging, the Commit Log and Log-Structured Merge Tree storage in Cassandra should have been near ideal. Commit Log gives durability, LSMT gives automatic compaction -- important for the pending work store, where you delete entries once work is completed.
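To make the commit log + LSM argument concrete, here is a toy, in-memory sketch (not Cassandra's actual implementation): every write is appended to a log before it lands in a memtable, full memtables are frozen into sorted immutable segments, deletes are written as tombstones, and compaction merges segments and finally discards deleted keys. `ToyLSM` and its parameters are hypothetical. One thing the toy skips: real Cassandra keeps tombstones around for `gc_grace_seconds` so deletes can reach all replicas, so deleted entries are not reclaimed immediately.

```python
import json
import os
import tempfile

TOMBSTONE = None  # deletes are written as tombstones; stored values therefore may not be None


class ToyLSM:
    """Toy commit log + memtable + immutable sorted segments (illustrative only)."""

    def __init__(self, log_path: str, memtable_limit: int = 4):
        self.log = open(log_path, "a", encoding="utf-8")
        self.memtable: dict = {}
        self.segments: list = []  # oldest first; never mutated after a flush
        self.memtable_limit = memtable_limit

    def _append_log(self, key, value):
        # Durability first: the mutation hits the log (and disk) before the memtable.
        self.log.write(json.dumps({"k": key, "v": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())

    def put(self, key, value):
        self._append_log(key, value)
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # No in-place removal: record a tombstone that shadows older values.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Freeze the memtable into a sorted, immutable segment (an SSTable analogue).
        # A real system would also recycle the flushed part of the log; the toy never truncates it.
        if self.memtable:
            self.segments.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Newest wins: memtable first, then segments from newest to oldest.
        for table in [self.memtable, *reversed(self.segments)]:
            if key in table:
                return table[key]  # a tombstone reads back as None, i.e. "absent"
        return None

    def compact(self):
        # Merge every segment, oldest to newest, so later writes overwrite earlier ones.
        merged = {}
        for seg in self.segments:
            merged.update(seg)
        # Dropping tombstones is safe here only because this merge covers all segments.
        live = dict(sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE))
        self.segments = [live] if live else []

    def close(self):
        self.log.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = ToyLSM(os.path.join(d, "commit.log"), memtable_limit=2)
        db.put("job1", "pending")
        db.put("job2", "pending")
        db.delete("job1")  # work item finished
        db.put("job3", "pending")
        db.compact()
        print(db.segments)  # [{'job2': 'pending', 'job3': 'pending'}]
        db.close()
```

The pending-work pattern maps onto this directly: finished jobs become tombstones, and compaction is what eventually turns them into reclaimed space.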
The domain here is healthcare; due to legal requirements & existing product architecture, deployments and databases are usually per-customer rather than trying to scale the world on a single cluster. For these scenarios Postgres and Cassandra should both be (have been) highly suitable.
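A per-customer deployment model moves isolation out of query filters and into configuration. A minimal sketch of that idea, assuming a hypothetical `TenantRouter` that maps each customer to its own database connection string and fails closed: an unknown tenant is an error rather than a fallback to some shared database, and two tenants pointing at the same database are rejected up front.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantDB:
    dsn: str  # connection string for this customer's dedicated database (hypothetical)


class UnknownTenant(LookupError):
    """Raised instead of falling back to any shared or default database."""


class TenantRouter:
    """Hypothetical per-customer router: one tenant, one database, fail closed."""

    def __init__(self, registry: dict):
        dsns = [db.dsn for db in registry.values()]
        if len(dsns) != len(set(dsns)):
            raise ValueError("two tenants share a database; per-customer isolation broken")
        self._registry = dict(registry)

    def db_for(self, tenant_id: str) -> TenantDB:
        try:
            return self._registry[tenant_id]
        except KeyError:
            raise UnknownTenant(tenant_id) from None


if __name__ == "__main__":
    router = TenantRouter({
        "clinic-a": TenantDB("postgresql://db-clinic-a/app"),
        "clinic-b": TenantDB("postgresql://db-clinic-b/app"),
    })
    print(router.db_for("clinic-a").dsn)
```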
The Postgres ecosystem is very active these days and offers options for further scaling. Aligning with Postgres already looks like a 5x scalability improvement over the prior datastore, and AWS Aurora and Citus Data both offer avenues for further performance if we ever need it.
> (CouchDB, it seemed awful).
Yeah I just... am not sure about picking document stores these days! I was the biggest fan of RethinkDB (over Mongo), and I always get CouchDB and Couchbase mixed up. At the end of the day, I almost always just choose to store some JSONB in Postgres.
> For messaging, the Commit Log and Log-Structured Merge Tree storage in Cassandra should have been near ideal. Commit Log gives durability, LSMT gives automatic compaction -- important for the pending work store, where you delete entries once work is completed.
> The domain here is healthcare; due to legal requirements & existing product architecture, deployments and databases are usually per-customer rather than trying to scale the world on a single cluster. For these scenarios Postgres and Cassandra should both be (have been) highly suitable.
Ahhh this makes a lot more sense now. So hard multi-tenancy + needs to be pre-approved.
> The Postgres ecosystem is very active these days and offers options for further scaling. Aligning with Postgres already looks like a 5x scalability improvement over the prior datastore, and AWS Aurora and Citus Data both offer avenues for further performance if we ever need it.
I'm an unabashed Postgres shill, so I agree -- Postgres is absolutely the most flexible database to build on: you can do OLAP, OLTP, partitioning and sharding (Partman/Timescale/Citus), and zheap is in the pipeline (so MySQL's usual raison d'être is gone).
Enough shilling though. One thing I wonder about in this space: are you planning on using ZFS at the bottom of any of these systems? With the recent spate of data ransoming attacks and hacks, I wonder why people "don't just" (famous first words) put ZFS down there, add some replication + offsite incremental backups, and sleep a bit easier at night.
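A rough, dry-run sketch of that "don't just" workflow: periodic snapshots, an incremental `zfs send -i` of the delta between two snapshots piped over SSH into `zfs receive` on an offsite box, and simple keep-the-newest-N pruning. It only builds command strings; the dataset names, host, and `auto-` naming scheme are hypothetical, and existing tools such as sanoid/syncoid or zrepl already do this properly. Snapshots are read-only, so ransomware encrypting live files cannot rewrite them, but anyone with root on the host can still `zfs destroy` them, which is why the offsite copy matters.

```python
import shlex
from datetime import datetime, timezone


def snapshot_name(dataset: str, when: datetime) -> str:
    # Zero-padded timestamps sort lexicographically in chronological order.
    return f"{dataset}@auto-{when:%Y%m%d-%H%M%S}"


def snapshot_cmd(dataset: str, when: datetime) -> list[str]:
    return ["zfs", "snapshot", snapshot_name(dataset, when)]


def incremental_send_pipeline(prev_snap: str, cur_snap: str,
                              remote_host: str, remote_dataset: str) -> str:
    # Send only what changed between prev_snap and cur_snap; the remote must already hold prev_snap.
    send = ["zfs", "send", "-i", prev_snap, cur_snap]
    recv = ["ssh", remote_host, "zfs", "receive", remote_dataset]
    return shlex.join(send) + " | " + shlex.join(recv)


def snapshots_to_destroy(snapshots: list[str], keep: int) -> list[str]:
    # Always keep at least one snapshot so the next incremental send has a base.
    if keep < 1:
        raise ValueError("keep must be >= 1")
    ordered = sorted(snapshots)  # relies on the sortable auto-YYYYMMDD-HHMMSS names
    return ordered[:-keep]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(shlex.join(snapshot_cmd("tank/pgdata", now)))
    print(incremental_send_pipeline("tank/pgdata@auto-20210228-020000",
                                    snapshot_name("tank/pgdata", now),
                                    "backup.example", "vault/pgdata"))
```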