Problems with CAP, and Yahoo’s little known NoSQL system

(dbmsmusings.blogspot.com)

76 points · jermo · 11y ago · 17 comments

15 comments · 8 top-level

shin_lao · 11y ago · 3 in thread

The author of CAP himself said that the CAP theorem gained undue popularity. It is interesting only because it models the trade-offs a partition forces on you when one happens.

But partitions are rare. And when partitions happen, you generally can't access the partitioned area, so a lot of problems disappear.

Availability isn't an on/off switch; there is a wide range of how "available" you can be and what you can do. For example, you can allow reads but disallow writes.
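A minimal sketch of the "allow reads, disallow writes" middle ground, assuming a hypothetical replica (`ReplicaStore`) that knows how many cluster members it can currently reach and treats losing a majority as being partitioned. Replication itself is not modeled; the point is only which operations stay available.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when an operation is refused while partitioned."""


@dataclass
class ReplicaStore:
    # Hypothetical replica; `reachable` counts the members it can talk to, itself included.
    cluster_size: int
    reachable: int
    data: dict = field(default_factory=dict)

    def has_majority(self) -> bool:
        return self.reachable > self.cluster_size // 2

    def read(self, key):
        # Reads are always served from local state; during a partition they may be stale.
        return self.data.get(key)

    def write(self, key, value):
        # At most one side of a partition can hold a strict majority,
        # so at most one side keeps accepting writes.
        if not self.has_majority():
            raise Unavailable("partitioned: writes disabled, reads still served")
        self.data[key] = value


store = ReplicaStore(cluster_size=5, reachable=5)
store.write("x", 1)
store.reachable = 2  # lost contact with 3 of the 5 members
print(store.read("x"))  # 1
```

Calling `store.write("x", 2)` at this point raises `Unavailable`, while reads keep working.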

Last but not least, what matters most is what happens after the partition is over, and what level of guarantees you offer regarding coherence.
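One common (and lossy) answer to the "after the partition" question is last-writer-wins reconciliation. The sketch below assumes each replica tags values with a hypothetical `(timestamp, replica_id)` version and keeps the higher version per key when the two sides merge. It is one policy among several (version vectors and CRDTs are others), not what any particular system does.

```python
from typing import Any

# Version = (timestamp, replica_id); replica_id breaks ties between equal timestamps.
Version = tuple[int, str]
Replica = dict[str, tuple[Version, Any]]


def lww_merge(a: Replica, b: Replica) -> Replica:
    """Merge two replicas' states after a partition heals, keeping the newest version per key."""
    merged = dict(a)
    for key, (version, value) in b.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged


left = {"cart": ((10, "A"), ["book"])}
right = {"cart": ((12, "B"), ["lamp"])}
print(lww_merge(left, right)["cart"][1])  # ['lamp']: the concurrent "book" write is silently lost
```

The merge is deterministic regardless of argument order, but the guarantee it offers is weak: concurrent updates are discarded rather than combined.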

yummyfajitas · 11y ago

Partitions are not rare. A partition simply means that one node is not available to another, which does not have to be caused by a network failure. For example, a stop-the-world GC pause is equivalent to a network failure: the node is unreachable for its duration. So is a deployment with downtime (e.g. a 10-second restart per node).
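The claim that a stop-the-world pause is indistinguishable from a network failure can be made concrete with a timeout-based failure detector. A minimal sketch, assuming a hypothetical `HeartbeatDetector` that is fed explicit timestamps so the example is deterministic: the detector only sees missing heartbeats, never their cause.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    timeout: float  # seconds of silence before a peer is suspected
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def suspected(self, node: str, now: float) -> bool:
        # A peer is suspected if it was never heard from or has been silent longer
        # than the timeout; a GC pause, a restart and a cut cable all look the same.
        last = self.last_seen.get(node)
        return last is None or now - last > self.timeout


detector = HeartbeatDetector(timeout=5.0)
detector.heartbeat("b", now=0.0)
# Node "b" now stalls in a 10-second GC pause (or restart), sending nothing.
print(detector.suspected("b", now=10.0))  # True
```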

Also, while you can't access the partitioned area, you might have nodes outside that area. The question is whether the non-partitioned nodes become useless to you or not.

http://kellabyte.com/2013/11/04/the-network-partitions-are-r...

https://webcache.googleusercontent.com/search?q=cache:D3htez...

shin_lao · 11y ago

Partitions are rare relative to other events, and I was quoting Eric Brewer, the author of the CAP theorem.

If you have one partition per day (which is very often) but serve 100,000 requests per second, partitions are rare.
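The arithmetic behind "rare relative to other events", using the figures from the comment (one partition per day, 100,000 requests per second); the helper name is illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def requests_per_partition(requests_per_second: int, partitions_per_day: int) -> int:
    """How many requests are served, on average, for each partition event."""
    return requests_per_second * SECONDS_PER_DAY // partitions_per_day


print(requests_per_partition(100_000, 1))  # 8640000000, i.e. about 8.6 billion
```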

10 seconds is hardly a partition, as it is below the default TCP timeout (one minute).

> Also, while you can't access the partitioned area, you might have nodes outside that area.

So these nodes can't access the partition and you don't have any conflict.

ryanobjc · 11y ago

Also, a partition in the system definition can be as little as a 'single node failure'.

I've seen plenty of systems that handle a single node failure but really get fucked when you have a network split where 50% of the nodes can talk to each other but not to the other 50%. The CAP theorem doesn't help you build a system that avoids doing very bad things when this happens.
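A majority-quorum rule is one way to make the 50/50 case safe rather than "very bad": a side may accept writes only if it holds a strict majority, so an even split leaves neither side writable instead of letting both diverge. A minimal sketch (the function name `writable_sides` is hypothetical):

```python
def writable_sides(cluster_size: int, side_sizes: list[int]) -> list[int]:
    """Indexes of partition sides that hold a strict majority and may keep accepting writes."""
    if sum(side_sizes) != cluster_size:
        raise ValueError("side sizes must add up to the cluster size")
    majority = cluster_size // 2 + 1
    return [i for i, n in enumerate(side_sizes) if n >= majority]


print(writable_sides(6, [3, 3]))  # [] even split: nobody may write
print(writable_sides(5, [3, 2]))  # [0] only the majority side stays writable
```

The first case shows the trade-off: safety is preserved by giving up write availability entirely, which is one reason majority-based systems are usually deployed with an odd number of nodes.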

ryanobjc · 11y ago · 2 in thread

Classic blog from 2010. The idea didn't catch on, however, and no one uses these terms. PNUTS is still completely closed source inside Yahoo; this is because PNUTS depends on an internal queuing system.

As for YCSB, it's commonly referenced and easily available on GitHub. It hasn't changed much lately, and now, with tools like Jepsen that focus on correctness as well as performance, YCSB is no longer the preferred testing tool.

anonetal · 11y ago

I agree, and I wish the concepts here were more widely adopted. Many people seem to equate the low-consistency modes (offered by Dynamo, Cassandra, etc.) with CAP and with guaranteeing availability in the presence of network partitions. In practice, network partitions seem much less of a concern, and the low-consistency modes are really needed to get reasonable performance/latency instead.

ryanobjc · 11y ago

The concept described here is either (a) too complex or (b) not complete enough to provide a full, rich mental model.

CAP was highly successful because it fits an existing meme: 3 things, you can only get 2 of them. It's simple, and easy to apply in a trivial fashion.

But the CAP theorem doesn't have enough "meat" as an engineering analysis, and it doesn't guide your system design. Yes, there's a proof, but it doesn't tell you how to balance the 3 concepts or how to directly compare system designs.

The thing is that as a descriptive model of how an entire distributed database works, CAP just ain't enough. That's why blog posts like https://aphyr.com/posts/313-strong-consistency-models and concepts like 'linearizability' are very useful.

aaa667 · 11y ago · 2 in thread

I don't understand what is meant by "this means that the roles of the A and C in CAP are asymmetric" - could someone explain this to me?

jfoutz · 11y ago

You can be available, or you can be consistent. The stronger the consistency guarantee you demand, the lower your availability.

Imagine a small database with a million copies on a million machines. When I write something, I'm going to have to wait a bit while all million machines acknowledge that the write is complete. If I have a lower threshold, say 1, the write completes really fast, but the million machines aren't consistent.
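The million-replica example reduces to one observation: if a write returns after `w` acknowledgments, its latency is the `w`-th fastest acknowledgment. A minimal sketch with hypothetical, randomly drawn per-replica latencies (1,000 replicas rather than a million, to keep it quick); an unreachable replica is modeled as an acknowledgment that never arrives.

```python
import math
import random


def write_latency(ack_latencies_ms: list[float], w: int) -> float:
    """Time until w acknowledgments have arrived, i.e. the w-th smallest latency."""
    if not 1 <= w <= len(ack_latencies_ms):
        raise ValueError("w must be between 1 and the number of replicas")
    return sorted(ack_latencies_ms)[w - 1]


random.seed(0)
replicas = [random.expovariate(1 / 5.0) for _ in range(1_000)]  # mean 5 ms, assumed
print(write_latency(replicas, 1))              # fast: only the quickest replica matters
print(write_latency(replicas, len(replicas)))  # slow: waits for the slowest replica

replicas[0] = math.inf  # one replica cut off by a partition
print(write_latency(replicas, len(replicas)))  # inf: "wait for all" never completes
```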

Real systems balance consistency and availability.

fred256 · 11y ago

If I understand the author correctly: a CP system sacrifices Availability only during a Partition, whereas an AP system tends to sacrifice Consistency all the time.
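The "sacrifices consistency all the time" half can be seen without any partition at all. A minimal sketch, assuming a hypothetical leader/follower pair with asynchronous replication: the write is acknowledged before the follower has it, so a low-latency read from the follower can return stale data while every node is up.

```python
from collections import deque


class AsyncReplicatedPair:
    """Hypothetical leader/follower pair; replication runs in the background."""

    def __init__(self) -> None:
        self.leader: dict = {}
        self.follower: dict = {}
        self.backlog: deque = deque()  # writes acknowledged but not yet applied on the follower

    def write(self, key, value) -> None:
        self.leader[key] = value
        self.backlog.append((key, value))  # acknowledged now, replicated later

    def replicate_one(self) -> None:
        if self.backlog:
            key, value = self.backlog.popleft()
            self.follower[key] = value

    def read_follower(self, key):
        # Fast (e.g. nearby) read; may miss writes still in the backlog.
        return self.follower.get(key)

    def read_leader(self, key):
        # Up-to-date read; in a real deployment this may cost an extra round trip.
        return self.leader.get(key)


pair = AsyncReplicatedPair()
pair.write("x", 1)
print(pair.read_follower("x"), pair.read_leader("x"))  # None 1
```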

alexnewman · 11y ago

Having implemented and used multiple consistent coordination systems (Raft+ in c5, which I helped write; single-decree Paxos via DConE at WanDISCO; and timeline-consistent implementations via HBase), I have to say the academics miss that the devil is in the implementation details. We all focus on high-level things like CAP (although FLP is what real hard-core academics care about), when system details like how you aggregate fsyncs, system pauses (especially from GC), and how you integrate your coordination system into the larger system play a much larger role in overall system latency. The coordination posture is a minor detail compared to GC issues. Now, I know what you are going to say: "He's not an academic." He ran a real DB company. I totally disagree. The DB community in general focuses on the wrong thing. That's true in Hadoop, it's true with Cassandra, and I would bet that it's true at Google as well.
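Aggregating fsyncs (often called group commit) is one of the implementation details mentioned. A minimal, single-threaded sketch with a hypothetical `GroupCommitLog` that issues one `os.fsync` per batch of records instead of one per record. A real implementation would batch concurrent writers and acknowledge each record only after the fsync covering it returns, since anything still buffered can be lost in a crash.

```python
import os
import tempfile


class GroupCommitLog:
    """Hypothetical append-only log that makes batches of records durable with one fsync each."""

    def __init__(self, path: str, batch_size: int) -> None:
        self.f = open(path, "ab")
        self.batch_size = batch_size
        self.pending = 0  # records written but not yet fsynced
        self.fsyncs = 0

    def append(self, record: bytes) -> None:
        self.f.write(record + b"\n")
        self.pending += 1
        if self.pending >= self.batch_size:
            self.sync()

    def sync(self) -> None:
        if self.pending:
            self.f.flush()             # push Python's buffer to the OS
            os.fsync(self.f.fileno())  # ask the OS to push it to stable storage
            self.fsyncs += 1
            self.pending = 0

    def close(self) -> None:
        self.sync()
        self.f.close()


with tempfile.TemporaryDirectory() as tmp:
    for batch in (1, 10):
        log = GroupCommitLog(os.path.join(tmp, f"wal-{batch}"), batch_size=batch)
        for i in range(100):
            log.append(str(i).encode())
        log.close()
        print(batch, log.fsyncs)  # 1 100, then 10 10
```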

hiphipjorge · 11y ago

A classic! If you had to take away one thing from this article, it would be that the CAP theorem (while useful) doesn't take latency into consideration, which might be even more important than partition tolerance. Yes, partition tolerance is inevitable, but you have to deal with latency every single time!

int19h · 11y ago

The way I've heard the CAP theorem used practically at a high level is to frame the question as follows:

What happens in the case of a (logical) network partition? - an AP system continues taking requests and provides eventual consistency, while a CP system waits for the partition to go away, or says come back later.
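That framing maps directly onto a request handler. A minimal sketch with hypothetical names, using HTTP-style status codes purely for illustration: both modes behave identically when healthy; during a partition the AP mode accepts the write locally and queues it for later reconciliation, while the CP mode refuses with 503 ("come back later").

```python
from enum import Enum


class Mode(Enum):
    AP = "AP"
    CP = "CP"


def handle_write(mode: Mode, partitioned: bool, local: dict, to_reconcile: list, key, value) -> int:
    """Apply a write according to the mode and return an HTTP-style status code."""
    if not partitioned:
        local[key] = value
        return 200
    if mode is Mode.AP:
        local[key] = value                 # stay available...
        to_reconcile.append((key, value))  # ...and converge after the partition heals
        return 202
    return 503  # CP: refuse rather than risk divergent copies


store, queue = {}, []
print(handle_write(Mode.CP, True, store, queue, "k", 1))  # 503
print(handle_write(Mode.AP, True, store, queue, "k", 1))  # 202
```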

explosion · 11y ago

I like the author's concept of PACELC, though it seems a bit implementation-specific.
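PACELC splits the question in two: if there is a Partition, choose Availability or Consistency; Else, choose Latency or Consistency. A minimal sketch of that two-axis classification (the `Pacelc` class is hypothetical); the example labels describe the CP and AP styles contrasted earlier in the thread, not specific products.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pacelc:
    if_partitioned: str  # "A" or "C": what the system keeps during a partition
    else_prefers: str    # "L" or "C": what it keeps during normal operation

    def __post_init__(self) -> None:
        if self.if_partitioned not in ("A", "C") or self.else_prefers not in ("L", "C"):
            raise ValueError("expected P in {A, C} and E in {L, C}")

    def label(self) -> str:
        return f"P{self.if_partitioned}/E{self.else_prefers}"

    def gives_up_consistency_without_partition(self) -> bool:
        return self.else_prefers == "L"


# A low-latency, eventually consistent store vs. one that is consistent all the time.
print(Pacelc("A", "L").label())  # PA/EL
print(Pacelc("C", "C").label())  # PC/EC
```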

jchrisa · 11y ago

Title should say (2010).
