Nah, my CI process was solid. This was proven in the field over the course of years.
> I don't mean to be pedantic... you need phased rollout
You don't need to be pedantic, but better to ask the question rather than assume that was all that I did. =) You have to realize that what I built, worked flawlessly. It wasn't easy either, took a lot of trial and error.
I did have a CIDR based rollout. I could specify down to the individual box that it would run a specific version. Or I could write "latest" to always keep certain boxes running on the latest build. This was another part of my testing, but ended up not being fully necessary because I had enough automated testing in CI that "latest" always worked.
> but it contains an update to the updater which bricks the updater?
This happened, so I wrote a lot of test code to make sure that would never happen again. My CI would catch that since I was E2E testing that it could actually run the upgrade process.
Once I implemented all of this, I never had a single failure and would routinely, several times a day, deploy to the entire cluster, over the course of a couple years.
It was all eventually consistent as I could also control the "check for update" frequency as well.