We host a number of our customers' database systems in us-east-1.
What worked well for us (https://aiven.io):
- Architecturally relying on only a few cloud provider services (we only need VMs, disk, object storage)
- Upfront investment in being able to move services from one region to another without downtime
- Pre-existing tooling for easily (manually) reconfiguring backup destinations on the fly
- Not running everything on just AWS
What did not work so well:
- Backups did not reroute automatically: they should switch to a secondary backup site after N consecutive failures
- Alert spam: we need more aggregation
- New failure mode: extremely slow EBS access. Some affected VMs were kinda working, but very slowly, so we need to create a separate alert trigger for this
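
To make the backup rerouting item concrete, here is a minimal sketch of failing over after N consecutive failures. This is not our actual tooling: the BackupRouter class, the destination names, and the default of N=3 are all hypothetical, and the upload itself is passed in as a callable so the sketch stays independent of any particular object storage API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class BackupRouter:
    """Hypothetical helper: switch backup destination after N consecutive failures."""

    destinations: Sequence[str]        # ordered: primary first, then secondaries
    max_consecutive_failures: int = 3  # the "N" from the list above; 3 is an arbitrary guess
    _active: int = 0                   # index of the destination currently in use
    _failures: int = 0                 # consecutive failures on the active destination

    @property
    def active(self) -> str:
        return self.destinations[self._active]

    def record_success(self) -> None:
        # Any success resets the streak, so only consecutive failures count.
        self._failures = 0

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.max_consecutive_failures:
            # Move to the next destination, wrapping around at the end of the list.
            self._active = (self._active + 1) % len(self.destinations)
            self._failures = 0

    def upload(self, do_upload: Callable[[str], None]) -> bool:
        """Try one upload to the active destination; return True on success."""
        try:
            do_upload(self.active)
        except Exception:
            self.record_failure()
            return False
        self.record_success()
        return True
```

Note that this sketch never fails back to the primary on its own. Whether that should happen automatically or stay a manual decision is a separate design choice.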