CockroachDB: A Scalable, Geo-Replicated, Transactional Datastore

github.com

77 points · sokrates · 11y ago · 55 comments

jacquesm 11y ago

That's a funny name for a database. At least you'll know that whoever uses it does not use it for its buzzword value.

How will this handle partitioning of the network? The readme has a lot of info about bits of the far-flung cluster failing but nothing about how it would deal with the whole cluster being chopped up into roughly equal halves. That's one of the harder problems to deal with for solutions aimed at this space.

simpsond 11y ago

Cockroaches are resilient and hard to kill.

jacquesm 11y ago

I'm aware of that.

warfangle 11y ago

Riak, when you use LevelDB behind it, certainly has indexes. As many as you want (within reason), through secondary indexes[0]. While it doesn't have joins specifically, you can link data blobs and walk the links[1]. For when that isn't quite enough, you can always perform a compiled Erlang map reduce across a given dataset[2].

I don't quite see how CockroachDB offers anything Riak doesn't.

Riak, while not offering true locking transactions (it doesn't look like CockroachDB does either - imagine how long it would take to perform a locked transaction across sixteen data centers in as many countries, two of which have gone dark due to power outages and giant robots), offers you the option of resolving data version conflicts when you read the record[3]. (ed. Many times if doing a partial update of a record, you need to read before writing anyway. This resolves a conflict before you write to a potentially conflicted record chain. Typically this is done with a pre-commit hook. [4])

(ed.: The major differences seem to stem from the snapshotting system CDB uses to provide external consistency across data centers. This comes at the cost of a delay in write verification, which is potentially huge, especially if two clusters lose connection with each other but not with clients.

Riak, on the other hand, would still allow writes - and would resolve any conflicts when the datacenters connect again. It's a hairy problem to fix, especially in a general manner.

It all depends on what kind of data you're storing.)

0. http://docs.basho.com/riak/latest/dev/using/2i/

1. http://docs.basho.com/riak/latest/dev/using/link-walking/

2. http://docs.basho.com/riak/latest/dev/using/mapreduce/

3. http://docs.basho.com/riak/latest/theory/concepts/Vector-Clo...

4. http://docs.basho.com/riak/latest/dev/using/commit-hooks/

hendzen 11y ago

Simple, CockroachDB is a CP system, while Riak is an AP system.

If you need multi-key ACID transactions, and can tolerate potential downtime in the event that some partition loses a majority of its Raft replicas, you might want to use CockroachDB.

If high availability is a concern, and you can tolerate the occasional data conflict in the case of incomparable vector clocks due to writes accepted during a network partition, or, if your schema can be modeled with CRDTs (LWW register, PN counter, Union-Set, etc), you might want to use Riak.
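
As a rough illustration of the two ideas in this comment, the minimal Python sketch below shows how two vector clocks become incomparable when both sides of a partition accept writes, and how a PN counter merges without conflict. The `compare` function and the `PNCounter` class are hypothetical names made up for this sketch; it is not Riak's implementation or API.

```python
def compare(a, b):
    """Compare two vector clocks given as dicts of node name -> counter."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    # Neither clock dominates the other: the writes were concurrent,
    # so the application (or a CRDT) has to resolve them.
    return "concurrent"


def _merge_max(mine, theirs):
    """Element-wise maximum of two per-node count maps."""
    return {n: max(mine.get(n, 0), theirs.get(n, 0)) for n in set(mine) | set(theirs)}


class PNCounter:
    """Counter CRDT: per-node increment totals (p) and decrement totals (n)."""

    def __init__(self, p=None, n=None):
        self.p = dict(p or {})
        self.n = dict(n or {})

    def increment(self, node, amount=1):
        self.p[node] = self.p.get(node, 0) + amount

    def decrement(self, node, amount=1):
        self.n[node] = self.n.get(node, 0) + amount

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        # Taking the per-node maximum makes merge commutative and idempotent.
        return PNCounter(_merge_max(self.p, other.p), _merge_max(self.n, other.n))
```

With clocks `{"dc1": 2, "dc2": 1}` and `{"dc1": 1, "dc2": 2}`, `compare` reports `"concurrent"`, which is the case where Riak would hand the conflict back to the reader. Two `PNCounter` replicas updated on opposite sides of a partition can instead be merged in either order and arrive at the same value.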

itsnotvalid 11y ago

Multi-datacenter replication doesn't come in the free-lunch pack. To get that with Riak, you probably need to pay $6000 per node for the license. So if you have two geolocations for your data, it would be at least $12000 for a minimal setup.

Of course, for people looking at this kind of use case, money is not the major issue.

warfangle 11y ago

Very good point.

rdtsc 11y ago

What is your comment on the "No availability or weak consistency with datacenter failure" part?

Is that referring to Riak's cross-datacenter replication (an enterprise feature)? I guess for the regular (non-enterprise) case it is true, as it is not possible to specifically assign ring sections to data centers?

maaku 11y ago

Fully ACID transactions are a big deal.

dgrnbrg 11y ago

They're based on Raft--that's not a consensus protocol that's designed for multi-datacenter operations. I suspect you'll have reliability and throughput issues fairly quickly, just as you see with multi-datacenter ZooKeeper.

The solution Google uses for this kind of problem: multidatacenter transactions are rare, so they're not optimized for latency (instead for reliability), and they tend to use 2PC, as it's easier to get right with unpredictable WAN latencies.

rch 11y ago

Riak will have strongly consistent buckets in 2.0+, which pretty much takes care of the cases in which I'd need guarantees for data in this storage model.

maaku 11y ago

Consistency of single updates is vastly different than multi-write atomic transactions. The former precludes, for example, financial applications which require atomic updates of multiple balances.
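
To make the distinction concrete, here is a minimal single-process Python sketch of an all-or-nothing update of two balances. `ToyStore`, `transfer`, and `InsufficientFunds` are hypothetical names invented for the example, and a local lock stands in for whatever a distributed database would actually do; it is not CockroachDB's or Riak's API.

```python
import threading


class InsufficientFunds(Exception):
    """Raised when a transfer would drive the source balance negative."""


class ToyStore:
    """In-memory key/value store with one atomic multi-key operation."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()

    def transfer(self, src, dst, amount):
        # Both writes are staged on a copy and published together,
        # so readers never observe one balance updated without the other.
        with self._lock:
            staged = dict(self._balances)
            staged[src] -= amount
            staged[dst] += amount
            if staged[src] < 0:
                raise InsufficientFunds(src)
            self._balances = staged

    def snapshot(self):
        with self._lock:
            return dict(self._balances)
```

A store that is only consistent per key could apply the debit and lose the credit (or the reverse) if a failure lands between the two writes; here a failed transfer leaves every balance untouched.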

limsup 11y ago

I assume it's called this because a cockroach can supposedly survive a nuclear attack. But it's a bad name. It does not evoke good feelings.

notduncansmith 11y ago

Maybe "Caracha" or "CarachaDB"? As in, "La Cucaracha". Still there, but veiled enough that it won't make people uncomfortable just on hearing it.

taternuts 11y ago

I have to agree - maybe 'RoachDB' would be better

morgante 11y ago

I like RoachDB. The idea is still there, but it's a little more fun.

angersock 11y ago

Sure, but it goes down every April 20th...

jc_dntn 11y ago

I prefer "CockDB".

jjoergensen 11y ago

It reminds me of a female programmer that I once worked with. By mistake she had originally named one of our busiest databases "ClickCuntDB"

It was a database for counting the clicks on our website :-)

saraid216 11y ago

Why? Because the database runs around like a chicken with its head cut off?

oalders 11y ago

It's memorable and I think the meaning is pretty clear, but it did make me shudder as well. :)

jmspring 11y ago

I like the name. I had a good idea of the goals when I saw that bit of the title without reading the rest...

donut2d 11y ago

Have to agree. I have so many bad feelings attached to cockroaches that just seeing the name makes me sick to my stomach.

gresrun 11y ago

Trademark considerations aside, I'd vote for TwinkieDB

Andrex 11y ago

Or even better, TwinkDB.

gresrun 11y ago

Ummm... http://en.wikipedia.org/wiki/Twink_(gay_slang)

increment_i 11y ago

Gotta agree. Cringe-inducing name when you first hear it, but I imagine it would become less so as you worked with it more and more. You'd be desensitized to it after a while.

orasis 11y ago

Change the name. I get the joke, but it has an emotionally negative connotation that bosses will hate.

bdevine 11y ago

That's the first thing I thought. If anybody from the team is looking, how about something like BlattoDB[0]?

[0] http://en.m.wikipedia.org/wiki/Blattaria

notduncansmith 11y ago

Sounds a lot like "blotto", which is the last state I want my DB in.

bdevine 11y ago

True, true. But the point still stands that there are options which hint at the durability of cockroaches without evoking disgust!

Actually this all does remind me of research on the tangible effect of disgust on products -- see [0]. That work studied physical contact, but it's easy to extrapolate from there.

[0] https://faculty.fuqua.duke.edu/~gavan/bio/GJF_articles/conta...

dmarlow 11y ago

CroachDB

pnathan 11y ago

Terrible name.

logn 11y ago

CoachDB ?

aerialfish 11y ago

Agreed. Take out the 'roach' part and watch its popularity skyrocket.

dmarlow 11y ago

nice

dang 11y ago

Given that most of the comments are merely about the name, and that the author has implied that the software doesn't work [1], it seems there's little to discuss here. We're going to demote this submission [2].

1. https://twitter.com/andybons/status/472458545154494465. The answer to that question, btw, is yes. Reposts of stories that have had significant attention are treated as dupes for about a year.

2. That's not a criticism of the submitter. We want to see original work on HN. But there ought to be some substance to it, as well as to the resulting discussion.

rb2k_ 11y ago

How would one communicate with this DB?

I'd love to see some API examples.

Meai 11y ago

Also benchmarks comparing it to RethinkDB and MongoDB, and SQL examples (supposedly this is NewSQL, how much SQL does it even support?). These questions are important.

teraflop 11y ago

Here's the transactional key/value store API: http://godoc.org/github.com/cockroachdb/cockroach/kv

And an RDBMS-like layer on top of it: http://godoc.org/github.com/cockroachdb/cockroach/structured

candybar 11y ago

As for the name, which I agree is problematic as is, how about EntomoDB for entomos (insect)?

Edit: It's not problematic if success is not an objective. But if it is, choosing a name with such strong established negative connotations is not wise.

maaku 11y ago

How is the name problematic? I knew exactly what they were saying and why when I saw it. If it were me I would have shortened it to RoachDB, but that's just marketing.

enraged_camel 11y ago

>>How is the name problematic?

Most people are disgusted by cockroaches. I think that's a good enough reason to change the name, at least if you want the product to be taken seriously.

mahkoh 11y ago

Friendly reminder that "Mongo" is a very offensive word in German. A "Mongo" is a person suffering from Down syndrome. CockroachDB is a walk in the park compared to MongoDB.

iLoch 11y ago

I agree it would probably be a good idea to chop off the cock.

nawitus 11y ago

How does it handle replication and the resulting conflicts?

teraflop 11y ago

It uses strongly consistent replication, so there are no conflicts.

nawitus 11y ago

So it doesn't support "proper" replication, e.g. the kind where the databases are not perfectly connected 100% of the time? And I wonder how they can prevent conflicts due to latency... Even if there's a 50ms latency, is the other database going to wait for 50ms between every write or something?

teraflop 11y ago

Well, that's a matter of terminology. It uses quorum replication, so it can make progress as long as a majority of replicas are online and communicating. I would consider that "proper" replication in the sense of a replicated state machine.

You're right that it's different from, say, master/slave replication in an SQL database. There's no distinction between an authoritative master and a slave that provides stale data. Each machine either gives you consistent reads and writes, or is unavailable.

As far as latency goes, the gory details are in the design document. You need to talk to at least N/2 other replicas; there's no way around that without giving up consistency. But that doesn't mean you can only do one transaction every 50ms; they can be pipelined, and non-conflicting transactions can proceed simultaneously.
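
The majority arithmetic behind that can be sketched in a few lines of Python. The function names here (`majority`, `can_make_progress`, `partition_outcome`) are hypothetical and only illustrate the rule; they are not taken from the CockroachDB code or design document.

```python
def majority(n):
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def can_make_progress(n, reachable):
    """True if a group of `reachable` replicas (counting itself) has a quorum."""
    return reachable >= majority(n)


def partition_outcome(n, side_a):
    """Split n replicas into two groups and report which side keeps a quorum.

    Returns (side_a_has_quorum, side_b_has_quorum). Both can never be True.
    """
    side_b = n - side_a
    return can_make_progress(n, side_a), can_make_progress(n, side_b)
```

With 5 replicas a write needs 3 acknowledgements, that is, the coordinating replica plus `5 // 2 = 2` others. If the network splits 3/2, only the larger side keeps accepting writes; with 4 replicas split 2/2, neither side has a majority and both become unavailable until the partition heals.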
