Better HN

0 points · dmethz · 5y ago

Unless I'm misunderstanding something, the system did not perform as documented. It should have scaled, and it didn't.

When a critical piece of infrastructure fails under massive load, I'm not sure it'll help much to politely tell your engineers they fucked up by not anticipating it.

You learn lessons. Both Slack and AWS seem to have learnt lessons here.



sokoloff · 5y ago

I agree with much of what you say, but if you change it to "It's Amazon's fault, not ours", that's where I diverge.

Slack did fuck up here, as evidenced by the outage, and you seem to at least partially agree, since you say Slack learned a lesson. Further, I think that "understanding how your system scales up from a low baseline to a high level of utilization (such as Black Friday/Cyber Monday for e-commerce, special event launches, or a Super Bowl ad landing page)" is a standard, "par for the course" cloud engineering topic to be on top of nowadays.
