story

Don't blame Amazon for your lack of redundancy (opens in new tab)

cloudability.com

86 pointsmatellis15y ago21 comments

21 comments

But mostly it’s because the cloud has been so darned reliable.

Or because the cloud has offered such a good value proposition for cheap and easy scaling with demand.

And anyway, it's unfair to rail on people who "should have had a redundancy plan" when the service they pay money for is one with a redundancy service included in it (availability zones) which has unexpectedly also failed.

Our point stands, for engineers to consider all likely scenarios when building redundancy and not assume anyone – even Amazon – can provide 100.0% uptime.

Your point appears to be "If you don't have 100% uptime then it's all your fault and you should have planned for it you lazy idiot, everyone should blame you. Also you can never have 100% uptime so people should stop blaming Amazon.". Do you have more of a point than that?

matellisOP15y ago

Our point is it's unfair to blame Amazon 100% when there were viable and affordable ways round this outage. Some of our customers were in US East and had off-site backups, scripted server deployments and were able to get back online pretty quickly.

jodrellblank15y ago

How were they able to get back online? Deploying to another AWS site or to another cloud provider? Presumably they didn't own enough servers themselves?

At what point did they make the call that the outage was too serious and they would lose all data since the last backup and start migrating? Had they pre-planned for it, or was it ad-hoc? Will they stay where they are now and use Amazon US East as their failover or migrate back in due course? Or rearchitect to handle this in future?

(thanks, re: name).

Is it just luck that the problem which happened was one they prepared for instead of one they didn't?

matellisOP15y ago

Cool nick btw.

maratd15y ago

Cloud = Virtualization. Creating redundancy within the same virtual ecosystem is idiotic, as the article points out. Any hardware or virtualization software failure would throw your redundancy out the window. You need to have your virtual eggs in very different baskets, in different geographical locations, with different providers.

matellisOP15y ago

+1 We agree.

mtogo15y ago

There's an upvote button for that.

zzleeper15y ago

As the author of the article, I think they are allowed this type of replies...

1 more reply

itswindy15y ago

Your own servers go down too. And not every site can afford triple redundancy. Sometimes you just stay down for a few hours.

matellisOP15y ago

+1 Sometimes it's not cost effective to do all the layers. One of our clients is going to lose $2500 to $3000 today, which is less than they would have spent on avoiding the issue in the first place.

cabalamat15y ago

But do blame Amazon for Amazon's lack of redundancy.

fvbock15y ago

i agree.

AZs - according to Amazon, "are distinct locations that are engineered to be insulated from failures in other Availability Zones."

that did not really seem to work.

if you're deployed in one zone and shit hit's the fan: "your fault". if you assume amazon does as advertised and live in several AZ and these go down apparently more or less at the same time: 'amazons fault'...

i read that amazon plans to post a 'postmortem' on this... i'd be really eager to know how AZs are actually designed/sperated. not to be able to point fingers (maybe just a little bit), but to just _know_ where i am deploying stuff to...

matellisOP15y ago

This is fair comment, especially given that they actually emphasized this multi-AZ redundancy.

The genesis of the article was the press implying that to use the cloud your only choice was to trust AWS provided 100% up-time, and this is a position we disagree with.

MichaelApproved15y ago

All hosts are vulnerable to outages, be it in the cloud, typical shared hosting or self managed racks in a data center.

Shit is going to happen with your host. To say it's a problem with the cloud is unfair.

stormental15y ago

The zeitgeist is that the cloud equals redundacy yet that's not necessarily the case. You still need to prep for contingencies...

dpcan15y ago

It is redundant to a degree, but there are varying layers of redundancy.

In this case, it looks like most users of AWS are in need of more geographic redundancy, but in terms of localized data redundancy (term?), it appears AWS is a pretty solid solution.

stormental15y ago

That's totally right: it's a pretty solid solution and frankly, even with the outage, it's still better than the alternatives based on cost and difficulty of configuration.

matellisOP15y ago

Completely agree.

pippy15y ago

Isn't the primary benefit in cloud servies reliability? If one of the largest cloud services has an Achilles heel, doesn't that defeat its purpose?

I was under the impression if one of the clusters were to be unavailable, the nearest mirror would resume responsibility. This should include distribution services as well.

bilmeswe15y ago

AWS is great for running a redundant cache or backup solution.

j / k navigate · click thread line to collapse

21 comments

jodrellblank15y ago

But mostly it’s because the cloud has been so darned reliable.

Or because the cloud has offered such a good value proposition for cheap and easy scaling with demand.

Our point stands, for engineers to consider all likely scenarios when building redundancy and not assume anyone – even Amazon – can provide 100.0% uptime.

matellisOP15y ago

jodrellblank15y ago

How were they able to get back online? Deploying to another AWS site or to another cloud provider? Presumably they didn't own enough servers themselves?

(thanks, re: name).

Is it just luck that the problem which happened was one they prepared for instead of one they didn't?

matellisOP15y ago

Cool nick btw.

maratd15y ago

matellisOP15y ago

+1 We agree.

mtogo15y ago

There's an upvote button for that.

zzleeper15y ago

As the author of the article, I think they are allowed this type of replies...

1 more reply

itswindy15y ago

Your own servers go down too. And not every site can afford triple redundancy. Sometimes you just stay down for a few hours.

matellisOP15y ago

+1 Sometimes it's not cost effective to do all the layers. One of our clients is going to lose $2500 to $3000 today, which is less than they would have spent on avoiding the issue in the first place.

cabalamat15y ago

But do blame Amazon for Amazon's lack of redundancy.

fvbock15y ago

i agree.

AZs - according to Amazon, "are distinct locations that are engineered to be insulated from failures in other Availability Zones."

that did not really seem to work.

matellisOP15y ago

This is fair comment, especially given that they actually emphasized this multi-AZ redundancy.

The genesis of the article was the press implying that to use the cloud your only choice was to trust AWS provided 100% up-time, and this is a position we disagree with.

MichaelApproved15y ago

All hosts are vulnerable to outages, be it in the cloud, typical shared hosting or self managed racks in a data center.

Shit is going to happen with your host. To say it's a problem with the cloud is unfair.

stormental15y ago

The zeitgeist is that the cloud equals redundacy yet that's not necessarily the case. You still need to prep for contingencies...

dpcan15y ago

It is redundant to a degree, but there are varying layers of redundancy.

In this case, it looks like most users of AWS are in need of more geographic redundancy, but in terms of localized data redundancy (term?), it appears AWS is a pretty solid solution.

stormental15y ago

That's totally right: it's a pretty solid solution and frankly, even with the outage, it's still better than the alternatives based on cost and difficulty of configuration.

matellisOP15y ago

Completely agree.

pippy15y ago

Isn't the primary benefit in cloud servies reliability? If one of the largest cloud services has an Achilles heel, doesn't that defeat its purpose?

I was under the impression if one of the clusters were to be unavailable, the nearest mirror would resume responsibility. This should include distribution services as well.

bilmeswe15y ago

AWS is great for running a redundant cache or backup solution.

j / k navigate · click thread line to collapse