In a failure case, it should remove the failing config, not all of them.
Pretty hard thing to miss if you test for it with any level of basic unit test or similar.
Second bug: canary failure should prevent further propogation of the bad config.
A little more difficult to test with automated tests due to requiring a connection. It sounds like this was in fact tested, but the usage between the two bits of software was not tested. A good integration test would have caught this. But I wouldn't call that required. I would at least however think it was required that the use case of that particular code to be at least manually checked because, you know it's a feature for disaster prevention / recovery.
There was enough information to deduce this pretty easily. Although they did tend to glaze over it in the write-up, almost purposefully.
For all those spouting that this was a good postmortem, not really, it's a good covering of ones ass, a good spin, sidestepping the real root cause.
What has slas and "here take credits" got to do with a postmortem?
I'm not really sure why I got downvoted for this. The post mortem was good but it wasn't something I'd aim to strive for. I like gcloud and I'll keep using it but I find the response to this thing a little bit hard to swallow.