Cassandra Hits One Million Writes Per Second on Google Compute Engine

(googlecloudplatform.blogspot.com)

81 points · bgoldy · 12y ago · 39 comments

outside1234 · 12y ago

Is anyone actually using Google Compute Engine? I haven't bumped into anyone using it and would love to hear what the real world experience is with it.

t0mas88 · 12y ago

I've tested it. Network latency is nowhere near as good as, for example, Rackspace Cloud, and tool support is very far behind Amazon and OpenStack. Basically, Google Compute Engine is something that might have had a chance years ago, but at this point it's quite pointless: it has no upsides compared to existing, far better established solutions and is behind on features, documentation, tools and user support.

And I expect that not to get better soon if nobody is using it...

ithkuil · 12y ago

> And I expect that not to get better soon if nobody is using it...

If the people running GCE see that people are not using it because it sucks at X, they will probably do their best to address X. So complaining is a good thing, especially if it contains enough information in order to understand the problem (e.g. latency from/to where? internal network or external network?)

dragonwriter · 12y ago

> And I expect that not to get better soon if nobody is using it...

If Compute Engine is Google exposing internal infrastructure as an additional product, it's quite likely that lots of people are using it (and that it is strategically important for it to be more usable) even if few people are paying for it.

patrickaljord · 12y ago

Google Compute Engine actually has many features that make it better than AWS: http://yourstory.com/2013/12/google-compute-engine-better-th...

thrownaway2424 · 12y ago

Network latency to or from what?

Florin_Andrei · 12y ago

It's a lot easier to go from zero to proficient with GCE than it is with AWS. This has nothing to do with the fact that AWS has a bigger "menu".

arielweisberg · 12y ago

I like Cassandra and I think it's something you should look at hard if what you want is an eventually consistent database because multi-DC and availability are important to the problem you are solving.

The blog post fails to make that case anywhere. I am guessing that is because it is an ad for GCE and not Cassandra.

I know these kinds of benchmarks are necessary to get and keep your widget on the map, but they always annoy me because I know N other ways to arrive at the same or better numbers with a different set of tradeoffs. Demonstrating scale out is not great for differentiation because it says nothing about the total cost or complexity of use.

Why is it that only GCE + Cassandra could have done a million of these particular kind of writes in a cost effective way?

I also feel like they are fudging how elasticity works. It's not instant. Bringing up a cluster of a given size is not the same as adding a node at a time to a running cluster to arrive at that size and actually receiving the full benefit of the additional capacity. From the wording it sounds like the entire cluster was brought up at once.

Overall my beef is that the blog post fails to inform the reader and is mediocre as a benchmark of anything other than GCE. And let's not even start on the complete lack of random reads. Does GCE offer SSD instances yet?

ivansmf · 12y ago

I ran those tests. One of the challenges in benchmarking is to pick the workload, so I chose one our customers do more often. For IO testing I prefer to run FIO tests, and for networking iperf, but unless you do this for a living it is hard to relate to microbenchmarks. It would also leave out the Java settings.

I left the tests running for an hour because at some point Cassandra gets into compactions, which are CPU and read intensive. The engine memory maps the files, so the IO subsystem backing the storage sees a lot of random IOs as page faults kick in. Streaming IOs get in during fsyncs. Again, easier to see using FIO.

The cluster was brought up at once, and I added the ramp up time on the chart so folks could see it.

I had three goals for this test:

- Show that low latency is possible with remote storage and proper capacity planning. This is why I used quorum commit, which forces the client to wait for at least two nodes to commit.
- Share the settings I used. If you open the GIST and download the tarball you'll find all the changes I made to cassandra.yaml and cassandra-env.sh. This benefits our customers directly because it gives them a starting point.
- Recommend that customers look at all samples, not only the middle 80% the stress tool reports. Otherwise the cluster looks much better than it is.
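The "at least two nodes" figure is what the usual majority rule for a quorum gives at a replication factor of 3. A minimal sketch, assuming that replication factor (the comment above does not state it) and using a made-up helper name:

```python
def quorum_size(replication_factor: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


# Assumption: replication factor 3, which is consistent with
# "wait for at least two nodes to commit".
for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum_size(rf)}")
```

With a replication factor of 3, one replica can be slow or down and a quorum write still completes.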

Thank you for the comments! I can assure you I read them, and I'll incorporate suggestions as possible.

arielweisberg · 12y ago

Thanks, I missed the link to the tarball in the GIST. Now that I can see the workload, I see that it is 100 million keys for the entire cluster of 330 nodes?

That is 8 terabytes (n1-standard-8) of RAM for 18.6 * RF gigabytes of data or am I wrong? The entire thing should fit in the page cache assuming compaction keeps up. There are no deletes so no tombstones. On an overwrite workload I don't know what the space amplification will be and whether you will actually run out of RAM for caching.
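The size of that gap is easy to put a number on. A rough sketch using only the figures quoted above, plus an assumed replication factor of 3 (not stated in the thread); it ignores JVM heap, compaction overhead and anything else competing for memory:

```python
def cache_headroom(total_ram_gb: float, dataset_gb: float, replication_factor: int) -> float:
    """Ratio of cluster RAM to the replicated data size."""
    replicated_gb = dataset_gb * replication_factor
    return total_ram_gb / replicated_gb


TOTAL_RAM_GB = 8_000  # "8 terabytes" from the comment, taken as decimal GB
DATASET_GB = 18.6     # unreplicated data size from the comment
ASSUMED_RF = 3        # assumption: the replication factor is not given

ratio = cache_headroom(TOTAL_RAM_GB, DATASET_GB, ASSUMED_RF)
print(f"replicated data: {DATASET_GB * ASSUMED_RF:.1f} GB")
print(f"cluster RAM is roughly {ratio:.0f}x the replicated data")
```

Even if a large share of each node's memory goes to the heap, the remaining page cache would still dwarf the data set under these assumptions.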

90% of people who read a benchmark aren't going to look at it the way I do. They don't have a cost model that says how fast things should be and how much they should cost. I do, and when things are fitting in memory I have a different set of expectations. If I am looking at the wrong instance type please let me know.

For the workload you described Cassandra shouldn't be doing random IO. I would expect there to be three + N streams of sequential IO: the write-ahead log, memtable flushing, compaction output, and N streams reading tables for compaction.

All the write IO can be deferred and rescheduled heavily because fsyncs are infrequent. Reads for compaction are done by background tasks and shouldn't affect foreground latency.

If read ahead is not working for compaction (and killing disk throughput) that may be something that needs to be addressed. Compaction should be requesting IOs large enough to amortize the cost of seeking. Page faults in memory mapped files don't stop read ahead and I think that the kernel will even detect sequential access and double read ahead. For a workload like this with no random reads you could configure the kernel to read ahead 2-4 megabytes and the page cache would probably absorb it fine.
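On Linux, block-device read-ahead is commonly set with `blockdev --setra`, which takes its value in 512-byte sectors. A small sketch that converts the 2-4 megabyte range above into that unit and prints the commands without running them; the device path is a hypothetical placeholder:

```python
SECTOR_BYTES = 512  # blockdev --setra counts read-ahead in 512-byte sectors


def readahead_sectors(megabytes: int) -> int:
    """Convert a read-ahead size in MiB to 512-byte sectors."""
    return megabytes * 1024 * 1024 // SECTOR_BYTES


DEVICE = "/dev/sdb"  # hypothetical data disk, not taken from the benchmark

for mb in (2, 4):
    # Only prints the command; nothing is executed here.
    print(f"blockdev --setra {readahead_sectors(mb)} {DEVICE}")
```

Whether a value this large helps depends on the workload; it suits sequential compaction reads and would waste page cache under random reads.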

For the workload you described the only fsyncs would be the log (every 10 seconds) and memtable flushes/compaction finishing and that would normally be so infrequent as to not move the needle on overall IO capacity (although obviously it consumes sequential IO). You set trickle fsync so we are still talking about 10 megabyte writes.

Granted, there are many things I don't know about Cassandra. I've plumbed a lot of it, but it isn't my day job. Using a 24 gigabyte heap and a 600 megabyte young generation is questionable to me. I think Cassandra can do better, and I also think there are several tools that would do the same job with 10-20x fewer nodes by exploiting the fact that they never have to do random reads from disk.

_delirium · 12y ago

> The cluster was brought up at once, and I added the ramp up time on the chart so folks could see it.

Can any person really bring up a cluster of this size? On AWS, you need special permission to launch more than ~20 instances of one type, and it's granted only after you make a business case for it, which they won't grant to regular peons. With my regular Google account that doesn't have any kind of special approval, can I really launch 330 instances within a few minutes?

ivansmf · 12y ago

Here is the GIST link: https://gist.github.com/ivansmf/6ec2197b69d1b7b26153

primitivesuave · 12y ago

In the end these are just eye-popping benchmarks. What really drives people to your platform is developer friendliness and the speed with which you can get started. Google App Engine has this, but there's not as much information floating around the media and blogosphere as there is for Heroku and DigitalOcean.

I use Google App Engine for auto scaling an API and it works brilliantly. Super easy to set up and develop on. But recently I started preferring DigitalOcean simply because there is a community that is constantly posting tutorials and answering questions. To me, that's more valuable than the distant prospect of handling 1M writes/second.

Erwin · 12y ago

FYI, the Digital Ocean tutorial writers get paid $50 to write them: https://www.digitalocean.com/company/blog/get-paid-to-write-...

Looks like a great way for DO to both build a sense of community and also appear high in Google rankings when you search for something. I googled some Linux question earlier today, and ended up on a DO page -- refreshing their brand in my mind.

jeremydw · 12y ago

While this doesn't directly pertain to your comment, perhaps it's somewhat related to onboarding with GAE...

I've been developing on GAE since it was released in 2008 and recently became "Google Cloud Platform Certified".

I'm considering offering a "Google App Engine for Startups" class/workshop (at something like General Assembly). The class would primarily include details about the platform's architecture and best practices for building high-scalability apps (like, say, Snapchat) so your app can "scale without thinking twice." Do you think there'd be some interest in a class like this?

jfim · 12y ago

From organizing past events, I've found it's much easier to just do it and have it flop (nobody's going to notice if it does) than to try to find out whether there's interest. Ask your target venue if you can do it, make sure it has some visibility (i.e. make sure it's in the program), set up a slide deck or two and do it. Good luck!

oamoruwa · 12y ago

To the point about what would attract users to a platform, offering a developer friendly environment and quick ramp up time, I certainly agree.

The 1M writes/second would be a capability aimed more specifically at users/developers interested in existing platforms such as AWS's High Performance Computing platforms and the like.

t0mas88 · 12y ago

The "one million" number grabs some attention, but this isn't that special in my opinion. Doing 3333 writes/sec per node is not that hard with Cassandra; actually it can be much faster if you use a setup with fast local storage and, for example, split the data disk from the commit log. The Google article reads as if they used network storage and 1 volume per node; both are bad ideas for Cassandra, as documented by Datastax.
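The per-node figure is just the headline number divided by the node count. A quick sketch of that arithmetic; 330 is the cluster size mentioned elsewhere in the thread, while 300 is an assumption implied by the 3333 figure rather than something the thread states:

```python
TOTAL_WRITES_PER_SEC = 1_000_000


def writes_per_node(total_writes_per_sec: int, nodes: int) -> float:
    """Average write rate each node has to sustain."""
    return total_writes_per_sec / nodes


# 300 is an assumed count of data nodes; 330 is the full cluster size
# quoted in the thread.
for nodes in (300, 330):
    rate = writes_per_node(TOTAL_WRITES_PER_SEC, nodes)
    print(f"{nodes} nodes: about {rate:.0f} writes/sec per node")
```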

jasonlotito · 12y ago

Keep in mind, it's not "One Million Writes Per Second," it's "One Million Writes Per Second on Google Compute Engine" with "Google Compute Engine" being the key point to the article.

The "one million writes per second" for Cassandra has been written about before (in this case, on AWS): http://techblog.netflix.com/2011/11/benchmarking-cassandra-s...

wodzu · 12y ago

It is worth noting that GCE is more expensive now than AWS was back in 2011.

According to the Netflix article, the AWS experiment ran at a cost of $561 per 2h, that is, ~$280 per hour. Perhaps they did not fully utilize the cluster in those 2h, in which case we should instead double the 1h test that performed 500k inserts per second; the cost would then be $182*2 = ~$365 per hour.

The GCE test ran at a cost of $330 per hour. Give or take a few dollars, if anything it's surprising that GCE can do at roughly the same cost what AWS was capable of 2+ years ago.
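The comparison above amounts to normalizing each run to an hourly cost at one million writes per second. A minimal sketch of that normalization using the dollar figures quoted in the comment; treating the 2h AWS run as a 1M writes/sec run and scaling cost linearly with throughput are both simplifying assumptions:

```python
def hourly_cost_at_1m_wps(cost_usd: float, hours: float, writes_per_sec: int) -> float:
    """Cost per hour, scaled linearly to a 1M writes/sec rate."""
    return (cost_usd / hours) * (1_000_000 / writes_per_sec)


# (label, total cost in USD, duration in hours, writes per second)
runs = [
    ("AWS 2011, 2h run (assumed ~1M/s)", 561, 2, 1_000_000),
    ("AWS 2011, 1h run at 500k/s", 182, 1, 500_000),
    ("GCE, 1h run", 330, 1, 1_000_000),
]

for label, cost, hours, rate in runs:
    print(f"{label}: ${hourly_cost_at_1m_wps(cost, hours, rate):.2f} per hour")
```

Linear scaling is generous to the smaller run, since doubling a cluster rarely doubles throughput exactly.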

All that said, the GCE guys put in a great effort. I wonder, though, how much speed you can squeeze from AWS, and at what cost, now that AWS is sporting SSD disks.

thrownaway2424 · 12y ago

That post doesn't mention anything about tail latency, while the GCE thing does point to P95 latency < 100ms consistently, which is nice.
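A tail percentile and a trimmed average can tell very different stories about the same run. A small sketch with made-up latencies (not data from either benchmark) comparing a middle-80% mean with a nearest-rank P95:

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def middle_80_mean(samples):
    """Mean after dropping the lowest and highest 10% of samples."""
    ordered = sorted(samples)
    cut = len(ordered) // 10
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


# Made-up latencies in milliseconds: 90 fast requests and 10 slow ones.
latencies_ms = [5] * 90 + [250] * 10

print("middle-80% mean:", middle_80_mean(latencies_ms), "ms")
print("P50:", percentile(latencies_ms, 50), "ms")
print("P95:", percentile(latencies_ms, 95), "ms")
```

Here the trimmed mean discards every slow request, while P95 lands squarely on them.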

roncohen · 12y ago

Did anyone actually care to see if that data can be read back? :)

https://dev.mysql.com/doc/refman/5.0/en/blackhole-storage-en...

hibikir · 12y ago

Unix admins have been using a far better technology for decades: store in /dev/null, extract from /dev/random. The write speeds are just as fast, but you will eventually get your results back. It is said that it'd be quicker to extract the next Game of Thrones novel from this system than to wait for George R R Martin to finish writing it.

ivansmf · 12y ago

This is hilarious - it would have saved me days of work. Here is the GIST with my step by step. https://gist.github.com/ivansmf/6ec2197b69d1b7b26153

userbinator · 12y ago

> The BLACKHOLE storage engine supports all kinds of indexes.

Best part of that page.

capkutay · 12y ago

Can anyone comment on Cassandra's performance versus HBase? HBase adds the complexity of dealing with the whole HStack, and clustering can be a pain if you have no need to use HDFS/Zookeeper in the first place. Cassandra seems nice because it's a single platform and each node is an equal member of the cluster, with no need to designate an HMaster, data nodes, etc. I was just wondering if Cassandra is widely regarded as less performant.

I know the true answer lies in my exact use cases and weeks of initial testing, but it would be nice to hear someone's opinion first.

coolsunglasses · 12y ago

Honestly, just google "HBase vs. Cassandra" and go from there.

Just keep in mind that between the two, Cassandra has improved a lot more than HBase has.

HBase can be an easy choice if you already have a Hadoop cluster and want to roll the results of Map-Reduce jobs into HBase keys.

The DataStax crew has a decent stack for turning batch/OLAP jobs into queryable keys.

If you have no need of that, want tunable consistency, favor write availability over read performance, then Cassandra might be a fit.

Just uh, don't pretend Cassandra clusters are necessarily trivial to manage just because they're homogenous.

IMHO: put off moving to any of these technologies as long as possible.

espeed · 12y ago

MapR M7 Tables removes much of the complexity and layers of HBase (http://www.mapr.com/products/m7).

nightshowerer · 12y ago

Similar but more detailed post from Netflix about 2 and a half years ago: http://techblog.netflix.com/2011/11/benchmarking-cassandra-s...

fooyc · 12y ago

That's only 3000 writes per VM per second.

Now I would like to see how many reads it can do at the same time. Cassandra does mostly-sequential writes; however, most reads do a couple of disk seeks.

oamoruwa · 12y ago

The 1M writes/second capability is a valuable benchmark for analyzing continuously variable, real-time information, similar to CFD analysis or high-throughput computational analysis, where you end up dealing with multivariate data that is continuously in flux.

vithlani · 12y ago

These cloud-y benchmarks are a joke.

1. What is the point of benchmarking so many writes? Instead, give us the cost of write-read and a few use cases: analysis, data mining, etc. And what of the bandwidth costs?

2. Say you are running a moderately intensive application that needs to be up 10 hours a day. $330/hour x 10 = $3300. Multiply that by a few days and where does that leave you?

Suddenly having your own infrastructure with kit like this: http://www.fusionio.com/products in it is not such a bad idea! This also means you don't have to use any funny NoSQL products, and can use standard programming, non-cloudy infrastructure and a decent RDBMS setup that non-Silicon-Valley, non-hipster humans have a chance of understanding and reliably maintaining.
