> or some fun bare metal + PXE combination
This is actually what I implemented for our hypervisor tier, and it’s not as scary as it sounds. I could legit rebuild our entire stack down to the metal in about 3 hours.
Kick off a new hypervisor version and the inactive side PXE boots all of its nodes, installs and configures a Proxmox cluster, attaches itself to our Ceph cluster, and then either hot-migrates all the VMs or kicks off a full deploy that rebuilds all the infra (Consul, Rabbit, Redis, LDAP, Elastic, PowerDNS, etc.) along with the app servers. The hardest part (which really isn’t that hard) is maintaining the clusters across the blue/green sides.
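
As a rough illustration of that flow, here is a toy Python sketch. It is not the actual tooling: `Side`, `rebuild`, and `cutover` are hypothetical names, and each step only records what real automation (PXE, Proxmox install, Ceph attach, migration or deploy) would do.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Side:
    """One half of the blue/green hypervisor tier (hypothetical model)."""
    name: str
    nodes: List[str]
    log: List[str] = field(default_factory=list)


def inactive_side(active: str) -> str:
    """Return the name of the side that is not currently serving."""
    return "green" if active == "blue" else "blue"


def rebuild(side: Side, version: str, migrate_vms: bool) -> List[str]:
    """Record the rebuild steps for one side, in order."""
    steps = [
        f"PXE boot {len(side.nodes)} nodes with hypervisor {version}",
        "install and configure Proxmox cluster",
        "attach to the shared Ceph cluster",
    ]
    if migrate_vms:
        # Option 1: move the running VMs over from the active side.
        steps.append("hot-migrate VMs from the active side")
    else:
        # Option 2: rebuild everything from scratch on the new side.
        steps.append("full deploy: infra services, then app servers")
    for step in steps:
        side.log.append(f"[{side.name}] {step}")
    return side.log


def cutover(active: str, sides: Dict[str, Side], version: str,
            migrate_vms: bool = True) -> str:
    """Rebuild the inactive side and return its name as the new active side."""
    target = sides[inactive_side(active)]
    rebuild(target, version, migrate_vms)
    return target.name


if __name__ == "__main__":
    sides = {
        "blue": Side("blue", ["b1", "b2"]),
        "green": Side("green", ["g1", "g2"]),
    }
    new_active = cutover("blue", sides, "v2", migrate_vms=False)
    print("\n".join(sides[new_active].log))
```

The point of the sketch is only the shape: the active side is never touched until the inactive side has been rebuilt end to end, and the last step is a choice between migrating VMs and redeploying everything.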
With this setup, our only mutable infrastructure was our Ceph cluster (because replacing OSDs takes unacceptably long) and our DB (for performance, the writers lived on dedicated servers, while the read replicas lived on the VMs).