People are just unaware, and probably making bad calls in the name of being "portable".
I'm curious who these web companies are.
Use something like Lambda and you get multi-AZ for free.
https://docs.aws.amazon.com/lambda/latest/dg/security-resili...
DynamoDB is another service that wouldn't be impacted, as it is multi-AZ.
Getting Postgres RDS multi-region would require an extra couple of lines in your CDK, but it is fairly straightforward.
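For reference, a minimal sketch of what "a couple of lines" might look like with aws-cdk-lib v2 in TypeScript. It only shows the Multi-AZ flag on a single RDS Postgres instance; a cross-region replica is a separate setup and is not shown here. The stack name, construct IDs, and Postgres version are placeholders picked for illustration, not taken from any real deployment.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as rds from 'aws-cdk-lib/aws-rds';

const app = new App();
// 'DbStack', 'Vpc' and 'Postgres' are hypothetical IDs for this sketch.
const stack = new Stack(app, 'DbStack');

// The VPC must span at least two AZs so a standby has somewhere to live.
const vpc = new ec2.Vpc(stack, 'Vpc', { maxAzs: 2 });

new rds.DatabaseInstance(stack, 'Postgres', {
  engine: rds.DatabaseInstanceEngine.postgres({
    version: rds.PostgresEngineVersion.VER_15, // assumed version
  }),
  vpc,
  // The one line that matters here: provision a standby in a second AZ.
  multiAz: true,
});

app.synth();
```

If this synthesizes as expected, the generated `AWS::RDS::DBInstance` resource should carry `MultiAZ: true`. Whether failover then behaves well during a real incident is a separate question.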
At least one cluster had a node on “affected” hardware (per AWS). Aurora failed to fail over properly and the cluster ended up in a weird error state, requiring intervention from AWS. We could not write to the DB at all. This took several hours to resolve.
All that to say that it’s never straightforward. In today’s event, it was pure luck of the draw as to whether a multi-AZ Aurora cluster was going to have >60 seconds of pain.
That SaaS has been running Aurora for years and has never experienced anything similar. I was very surprised when I heard the cluster was in a non-customer-fixable state and required manual intervention. I’ve shilled Aurora hard. Now I’m unsure.
Thank goodness they had an enterprise support deal or who knows if they’d still have issues now.
Want GKE to run multi-zone, or Spanner to run multi-region? Just check a box (and insert coin).
Not all systems require high availability. Some systems are A-OK with downtime. Sometimes, I'm perfectly fine with eventual consistency. You really do have to look at the use cases and requirements before making sweeping statements.