I agree with much of what you say, but if you change it to "It's Amazon's fault, not ours", that's where I diverge.
Slack did fuck up here, as evidenced by the outage, and you seem to at least partially agree, given that Slack learned a lesson. Further, I think that "understanding how your system scales up from a low baseline to a high level of utilization (such as Black Friday/Cyber Monday for e-commerce, special event launches, or a Super Bowl ad landing page)" is a standard, "par for the course" cloud engineering topic to be on top of nowadays.