Cockroach performance seems to scale linearly, but single-connection performance, especially for small transactions, seems rather dismal. Some casual stress testing against a 3-node cluster on Kubernetes showed that small transactions modifying a single row could take as much as 7-8 seconds, where Postgres would take just a few milliseconds.
The documentation recommends that you batch as many updates as possible, but obviously that doesn't work for low-latency applications like web frontends that need to be able to do small, fine-grained modifications.
- Replication factor increased to 5x (rather than the 3x default)
- 8 indexes on the table being modified which also needed to be updated
- Nodes spread across North America, incurring higher RTT latency between nodes
- Relatively high contention on the data triggering client-side retries
- HDDs as the storage medium (RocksDB is optimized for SSDs)
That's surprising. I wasn't expecting CockroachDB to be really fast, given the constraints they work within. But that sounds more like a bug or config error. Unless perhaps you mean a really high number of processes trying to update the same row at the same time? Like a global counter or something?
...
Even if I like CockroachDB's pg sql more, it would be helpful to have the comparison/benchmark to show something more.
TiDB has a weird kind of variation on "read committed" where you get phantom reads (though they're not called that in the documentation, which is actually ambiguous on this point). This is a problem for apps that expect consistency.
TiDB supports READ COMMITTED isolation which is not the same as MySQL, but it is just designed for some special cases for TiDB itself and it is not recommended for external users.
TiDB has been widely adopted by many users (https://github.com/pingcap/docs/blob/master/adopters.md) in production because it supports the best features of both RDBMS and NoSQL. It is quickly evolving and iterating based on users’ requirements which are prioritized and listed on the Roadmap.
I'd really love some kind of distributed-for-performance database using the exact optimizer and query planner of SQLite plus std plugins (FTS5, JSON, transitive_closure, spatial). Something like a mix between Bloomberg's comdb2 (which uses a modified SQLite frontend) and rqlite (distributed-for-safety).
Note: You can save most of the shortcomings of CDB on the SQL client side today, but don't underestimate the time it takes to implement CDB-specific workarounds...
Why do you think the identical optimizer and query planner would work in a distributed environment, with no changes from the single server implementations?
"Note: We have not filed for official certification of our TPC-C results. However, we will post full reproduction steps in a forthcoming whitepaper."
The Oracle on SPARC cluster (at the top, 2010) performs 30.2M qualified tx/min vs the 16K tx/min in this blog post. The Oracle cluster also costs $30M, which is clearly higher than the Cockroach cluster's cost.
That said, the TPC-C benchmark is new to me. Happy to update this comment if I'm misreading the numbers.
(Edited to incorporate the reply below.)
We're focusing today on our improvements over CockroachDB 1.1, using a small-ish cluster. We'll be showing some more scalability with larger clusters in the coming weeks. If you've found CockroachDB performance slow in the past, you will be pleasantly surprised with this release!
My guess is that the benchmark setup would cost about 1m dollars to install (3 racks of commodity servers). The software is free. Naturally, Oracle aren't pushing this, when they charge 10s of millions for Oracle rack :)
This database has the potential to dethrone Spanner in a major way.
A lot of congrats and excitement, questions about who uses it in a production environment, very specific use-case questions, and of course the name.
Weird how predictable the response to one company/tech always is.
However....
If I worked at CockroachDB, and I saw the negative feedback around the name, I'd take it to heart. At the end of the day, the name is marketing for the hard work of their engineers, and marketing for the engineers that want to use this DB (remember, they need to sell it to their managers who may not be technical).
This issue can show up in unexpected ways. For example, for cloud providers like Compose (IBM company), would they be comfortable with putting "CockroachDB" on the front page? They might if it's good enough, but it's at least a consideration (i.e. another meeting, another stakeholder to convince).
Or how about an enterprise company that's going through due diligence, and when their client asks them about their tech stack do they say "CockroachDB" or do they obfuscate the name by saying "It's a high-performance distributed database". That's a crucial moment to market CockroachDB, and it could get lost. As sad as it is, saying that you're using MySQL "because Oracle" is a point of leverage for some sales people.
Is the name worth it? Asking honestly.
Enterprise pricing generally scales with the size of your company/budget and how much trouble they think you'll be as a customer.
As a rule of thumb, it starts at just above 1000 USD per unit, and goes up from there.
Many contracts are bespoke orders especially when you're dealing with a small company, so you can't have transparency since there isn't a single product.
https://blogs.msdn.microsoft.com/e7/2009/05/02/a-little-bit-...
[1] github.com/heroiclabs/nakama
A n1-highcpu-16 GCE VM costs $289.84/month. Local SSDs are added at 375GB per drive, and they cost $30/month at $0.08 per GB. I highly doubt you could fit the ~1250 warehouses (what got you the peak TPM-C) on 375GB local SSD, but I have to make assumptions here! So, now you're paying $319.84 per instance per month, or $959.52 for 3 of these instances.
At 16,150 TPC-C, you're paying roughly $0.06 per TPC-C, or, looking at it the other way, you're getting 16.83 TPC-C per dollar spent each month. Is that good? I don't know!
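For anyone who wants to redo this arithmetic with different assumptions, here is a rough sketch in Python. The prices and the throughput figure are the ones quoted above; the one-local-SSD-per-node setup is the same guess as in the comment, and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope price/performance, using the figures quoted in the comment.
# Assumption (same as the comment): one 375 GB local SSD per node is enough.

VM_MONTHLY_USD = 289.84      # n1-highcpu-16, per month
SSD_GB = 375                 # size of one local SSD
SSD_USD_PER_GB = 0.08        # per GB per month
NODES = 3
PEAK_TPMC = 16_150           # peak throughput reported in the blog post


def monthly_cluster_cost(nodes, vm_usd, ssd_gb, ssd_usd_per_gb, ssds_per_node=1):
    """Monthly cost in USD for `nodes` identical VMs with local SSDs attached."""
    per_node = vm_usd + ssds_per_node * ssd_gb * ssd_usd_per_gb
    return nodes * per_node


cost = monthly_cluster_cost(NODES, VM_MONTHLY_USD, SSD_GB, SSD_USD_PER_GB)
usd_per_tpmc = cost / PEAK_TPMC
tpmc_per_usd = PEAK_TPMC / cost

print(f"cluster cost:  ${cost:.2f}/month")
print(f"cost per tpmC: ${usd_per_tpmc:.2f}")
print(f"tpmC per $:    {tpmc_per_usd:.2f}")
```

Raising `ssds_per_node` shows how quickly the ratio moves if one drive per node turns out not to be enough for the warehouse count.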
Now, the really interesting question is, is that TPC-C/$ on CRDB 2.0 actually better than TPC-C/$ on CRDB 1.1? The answer lies in how many local SSDs you have to provision to reach that peak throughput. Peak is at ~1300 warehouses on CRDB 2.0, and ~800 warehouses on CRDB 1.1.
Does anyone with more knowledge here know how much storage you need per warehouse in the TPC-C test?
edit: Looking into it even further, I agree with the co-author's response here that TPC-C is still an appropriate metric. TPC-E is different and newer but still not as widely used.
We chose TPC-C because it's far better understood than TPC-E in 2018. We wanted to provide understandable benchmarks that can be put into context with other databases. Other databases report TPC-C numbers, so we chose to do so as well.
Transactional writes are likely the slowest thing since they need to talk to all replicas.
Would be great to see how it compares against postgres in similar scenarios.