Most of what I'm saying is:
(1) Looking at individual point failures and saying "if you'd just fixed that you wouldn't have had an incident" is counterproductive; like Mr. Oogie-Boogie, every big distributed system is made of bugs. In fact, that's true of literally every complex system, which is part of the subtext behind Cook[1].
(2) I think people are much too quick to key in on the word "config" and just assume that it's morally indistinguishable from source code, which is rarely true in large systems like this (might it have been here? I don't know). So my eyes twitch like Louise Belcher's when people say "config? you should have had a staged rollout process!" Depends on what you're calling "config"!