For each and every thing that Jason talked about...upgrading Cassandra, moving off EBS, embarking on self-heal and auto-scale projects...what took the reader a few seconds to read and cognise undoubtedly represented hours and hours of work on the part of the Reddit admins.
I guess it's just the nature of the human mind. I don't think I could ever fully appreciate the amount of work that goes into any project unless I've been through it myself (and even then, the brain is awesome at minimizing the memory of pain). So Reddit admins, if you're reading this, while I certainly can't fully appreciate the amount of labor and life-force you've dedicated to the site, I honestly do appreciate it, and I wish you guys nothing but success in the future!
I got to work with Riak a lot while I was at DotCloud, but the speed issue was pretty frustrating (it can be painfully slow).
Cassandra has enabled Reddit to manage a highly scalable distributed data store with a tiny staff. This is not to say it has been trouble free, but it has enabled them to do something that would have been infeasible without pioneers in this space (Cassandra, Riak, Voldemort, etc) making these tools available.
That said, they may be freaked out based on their growth curve and simply thinking ahead.
Balancing your cluster requires a little bit more handholding, and if something goes wrong or you fuck it up, it can be pretty challenging. But most of the time it's pretty painless.
There are a lot of other warts, though: the data model is slightly weird, secondary indexing is slow, and eventual consistency is hard to wrap your head around. But it doesn't require much effort to run and operate a large cluster, and if that's important to you and your application, you should check it out.
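A toy sketch of that last point, in plain Python with nothing to do with Cassandra's actual implementation: replicas accept writes independently and converge later via last-write-wins timestamps, so a read against the wrong replica can return stale data until the replicas have gossiped.

```python
class Replica:
    """Toy replica: stores (timestamp, value) per key, last-write-wins."""
    def __init__(self):
        self.data = {}

    def write(self, key, value, ts):
        # Accept the write only if it's newer than what we already have.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(replicas):
    """Gossip every key to every replica; timestamps resolve conflicts."""
    for r in replicas:
        for key, (ts, value) in list(r.data.items()):
            for other in replicas:
                other.write(key, value, ts)


a, b = Replica(), Replica()
a.write("karma", 10, ts=1)   # the write lands on replica a only
stale = b.read("karma")      # None: b hasn't seen the write yet
anti_entropy([a, b])
fresh = b.read("karma")      # 10 once the replicas have converged
```

That window between `stale` and `fresh` is exactly what your application code has to be prepared for.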
The NoSQL space is pretty interesting, but there is no clear winner; each of the competing solutions has its own niche and its own specialties, so it's impossible to give general recommendations right now.
The two phases we've seen are:
1/ It's flexible and it works! Problem solved!
2/ The 21st century called; they want their performance back.
The problem with phase 2 is that you may not be able to solve it by throwing more computing power at it.
Unfortunately if you really need map-reduce, at the moment I don't know what to recommend. Riak isn't better performance-wise and our product doesn't support map-reduce (yet).
However, if you don't need map-reduce, I definitely recommend not using Cassandra. There are a lot of non-relational databases out there that are an order of magnitude faster.
However I have the gut feeling we're far from squeezing out all the juice from today's hardware.
Godspeed, reddit. You're on the right track.
I did the same for my site last year and it was great.
This is one of the reasons why I haven't moved my Postgres databases to EnterpriseDB or Heroku: they use EBS.
So our approach is to RAID-10 four local volumes together. We then replicate to at least 3 slaves, all of which are configured the same and can become master in the event of a failover.
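Roughly what the array setup looks like; the device names, filesystem, and mount point are assumptions and will differ per instance type:

```
# Stripe+mirror four ephemeral disks into one RAID-10 array:
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde
mkfs.xfs /dev/md0
mount /dev/md0 /var/lib/postgresql
```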
We use WAL-E[0] to ship WAL logs to S3. WAL-E is totally awesome. Love it!
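For anyone curious what that looks like: WAL-E hooks into Postgres's archive_command, and full base backups are pushed with backup-push. A sketch; the envdir path and data directory follow WAL-E's README conventions and may differ in your setup:

```
# postgresql.conf: ship each completed WAL segment to S3
wal_level = archive
archive_mode = on
archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'
archive_timeout = 60

# from cron or by hand: push a full base backup
envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.1/main
```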
Please send feedback or patches; I'm happy to work with anyone who has an itch in mind.
If one has a lot of data, EBS becomes much more attractive because swapping the disks in the case of a common failure (instance goes away) is so much faster than having to actually duplicate the data at the time, presuming no standby. Although a double-failure of ephemerals seems unlikely and the damage is hopefully mitigated by continuous archiving, the time to replay logs in a large and busy database can be punishing. I think there is a lot of room for optimization in wal-e's wal-fetching procedure (pipelining and parallelism come to mind).
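This is not wal-e's actual code, just a sketch of the pipelining/parallelism idea: keep several segment downloads in flight concurrently while still handing segments to recovery in order (the `fetch_segment` stand-in is hypothetical).

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_segment(name):
    # Stand-in for downloading one WAL segment from S3.
    return f"contents-of-{name}"

def prefetching_fetch(segments, depth=4):
    """Yield (name, data) in order while up to `depth` downloads overlap."""
    with ThreadPoolExecutor(max_workers=depth) as pool:
        futures = [pool.submit(fetch_segment, s) for s in segments]
        for name, fut in zip(segments, futures):
            # In-order delivery: recovery replays this segment while the
            # pool keeps fetching the ones behind it.
            yield name, fut.result()

segments = [f"{i:024X}" for i in range(8)]  # fake WAL segment names
fetched = list(prefetching_fetch(segments))
```

Since WAL replay consumes segments strictly in order, prefetching hides the per-segment S3 round-trip latency without changing the replay logic at all.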
Secondly, EBS seek times are pretty good: one can, in aggregate, control a lot more disk heads via EBS. The latency is a bit noisy, but last I checked (not recently) it was considerably better than what RAID-0 on ephemerals would allow on some instance types.
Thirdly, EBS volumes share one's slice of the network interface on the physical machine. That means larger instance sizes see fewer noisy-neighbor effects and get more bandwidth overall, while RAID 1/1+0 are going to be punishing, since mirroring doubles the write traffic over that same interface. I'm reasonably sure (but not 100% sure) that mdadm is not smart enough to let a disk with decayed performance "fall behind", i.e. demote it from the array and prefer its mirrored partner. Overall, use RAID-0 and archiving instead.
When an EBS volume suffers a crash/remirroring event they will get slow, though, and if you are particularly performance sensitive that would be a good time to switch to a standby that possesses an independent copy of the data.
[0]: http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs...
I'm not exactly sure how the failover system works in Postgres; the last time I set up replication on Postgres it would only ship a WAL segment after it was fully written, but I know they have a much more fine-grained system now.
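The fine-grained system is streaming replication (PostgreSQL 9.0+), which ships WAL records over a normal connection as they are generated rather than waiting for a full 16 MB segment. A minimal sketch; the hostname and user are made up:

```
# postgresql.conf on the master:
wal_level = hot_standby
max_wal_senders = 3

# recovery.conf on the standby (9.x):
standby_mode = 'on'
primary_conninfo = 'host=db-master port=5432 user=replicator'
```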
If you use SQL Server you can add a 3rd monitoring (witness) server, and your connections fail over to the new master pretty much automatically as long as you add the 2nd server to your connection string. The setup with a 3rd server can create some very strange failure modes, though.
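On the client side that's just the `Failover Partner` connection-string keyword; the server and database names here are made up:

```
Server=sql-primary;Failover Partner=sql-mirror;Database=app;Integrated Security=True;
```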
8760 is the number of hours in a year.
(8760 * $0.24 * 170) + (8760 * $0.12 * 70) = $430,992/yr in hourly fees
($1,820 * 170) + ($910 * 70) = $373,100/yr in reservation fees
$373,100 + $430,992 = $804,092/yr; $804,092 / 12 months = $67,007.67/mo
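The arithmetic above, redone in a few lines of Python (the per-hour rates, reservation fees, and instance counts are taken straight from the comment):

```python
HOURS_PER_YEAR = 8760  # hours in a year, as noted above

# 170 instances at $0.24/hr plus 70 instances at $0.12/hr
hourly = HOURS_PER_YEAR * 0.24 * 170 + HOURS_PER_YEAR * 0.12 * 70
# reservation fees: $1,820 x 170 plus $910 x 70
reserved = 1820 * 170 + 910 * 70
monthly = (hourly + reserved) / 12

print(f"hourly:   ${hourly:,.0f}/yr")
print(f"reserved: ${reserved:,.0f}/yr")
print(f"monthly:  ${monthly:,.2f}/mo")
```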
Reference for last years calculations: http://www.reddit.com/r/blog/comments/ctz7c/your_gold_dollar...
The internet is truly a wondrous thing.
Supposing that the cost increased linearly with the number of users (which sounds like a bad hypothesis, but it's a start), the cost at the end of 2011 could be around $1M/year... That's impressive, but nowhere near the $300K/month proposed by rdouble.
So I would say that the monthly cost of reddit's infrastructure is around $90K. Which is really impressive.
1: http://www.reddit.com/r/blog/comments/ctz7c/your_gold_dollar...
https://www.facebook.com/notes/facebook-engineering/building-timeline-scaling-up-to-hold-your-life-story/10150468255628920

I couldn't imagine why.
2 TB? OMG, that's almost a decent-sized SQL Server instance. Yeah, it should take about an hour or two to replicate. I'm assuming they have 10 GbE on their DB server.