Coffee and Design for Failure

(somic.org)

11 points · somic · 15y ago · 8 comments

8 comments · 5 top-level

sklivvz1971 · 15y ago · 1 in thread

I don't think the analogy is correct: suppose that John needed to buy life-saving medicines. Would his course of action be reasonable and rational? No, because he would be dead. On mission-critical systems you make damn well sure in advance that you have a backup plan (or more than one if needed). John should have bought an extra set of medicines, and AWS should have had better backup plans. It seems to me that if it's taking so long to bring the systems up, something there was not really well thought out.

somic (OP) · 15y ago

The post was not directed at AWS. A "normal accident" (one that can't be foreseen, usually due to a confluence of factors and bad luck) happened and they are dealing with it.

My post was directed at folks who said "web sites that went down as a result of such an unprecedented EC2 problem made an engineering mistake by not building to be able to withstand it."

Again - I am emphasizing "engineering mistake", not business mistake or funding mistake or resource allocation mistake.

My point is there are things you rationally protect against. But at some point, putting up defenses against more and more bad things stops being rational.

For different systems this point (where it stops being rational) is different.

ctdonath · 15y ago · 1 in thread

Context, please: what's "the Judgment Day Outage"?

(I presume knowing that would explain the purpose of the gedankenexperiment.)

archgoon · 15y ago

Amazon's EC2 failures. Since April 21st, 2011 is the date that the fictional Skynet (from Terminator) seizes control of the world's computing infrastructure, some have been amused by the timing coincidence.

cmurdock · 15y ago · 1 in thread

John doesn't make money based on whether or not he gets a cup of coffee, so this whole question is worthless in my opinion.

archgoon · 15y ago

It is not overly difficult to transform this story from one about one form of utility (Starbucks coffee) into one about needing to get to a location to perform work (another form of utility).

alanh · 15y ago

Interesting that such a question-filled post doesn’t have comments enabled. (Not that I’m criticizing it for this reason.)

I guess I’m not sure how relevant the parable is. It doesn’t cost much to have coffee and a pot at your house (you will use them anyway, and it doesn’t cost anything day-to-day to just keep them there), but this isn’t true for, say, having a backup system just waiting to take over in the event of catastrophic failure.

Then again, the Netflix strategy (3 clusters at 60% utilization → 2 clusters at 90%) is superficially similar to usually having an option of homebrew, walking, and driving for coffee, and sometimes just being forced to pick one option. Superficially.
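The arithmetic behind that parenthetical can be sketched roughly as follows. This is only an illustration, not anything Netflix actually runs: the function name is made up, and it assumes identical clusters with the load of a failed cluster spread evenly over the survivors.

```python
def utilization_after_failures(clusters: int, utilization: float, failed: int) -> float:
    """Per-cluster utilization after `failed` clusters drop out.

    Hypothetical helper: assumes identical clusters and that the total
    load is redistributed evenly across the surviving clusters.
    """
    survivors = clusters - failed
    if survivors <= 0:
        raise ValueError("no clusters left to carry the load")
    # Total load, measured in units of one cluster's full capacity.
    total_load = clusters * utilization
    return total_load / survivors


# 3 clusters at 60% carry 1.8 clusters' worth of load; 2 survivors run at 90%.
print(round(utilization_after_failures(3, 0.60, 1), 2))  # 0.9

# Losing a second cluster pushes the last one past 100%, i.e. overload.
print(utilization_after_failures(3, 0.60, 2) > 1.0)  # True
```

So the spare 40% on each cluster is what pays for surviving exactly one cluster failure; under these assumptions it buys nothing against a second one.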

isak2 · 15y ago

For all John Doe knew, those events (road closed, rain + strong wind) were unlikely to happen, so planning for them would not have been rational. He did bring an umbrella when he saw a cloudy sky, which demonstrates good planning (even though it didn't work out in this particular case).
