You could just distribute your workloads using... a queue, and not have this problem, or have to pay for and maintain backup equipment, etc.
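To make that concrete, here's a minimal sketch of the queue pattern using only Python's standard library. This is an illustration, not a production design: a real deployment would swap the in-process queue for a durable broker (Redis, RabbitMQ, SQS, and the like), but the shape of the solution is the same. Workers pull from a shared queue, so losing a worker degrades throughput instead of causing an outage, and there is no failover orchestration to build or maintain.

    # Sketch: distribute jobs across workers via a shared queue.
    # If one worker dies, the others keep draining the queue.
    import queue
    import threading

    jobs = queue.Queue()

    def worker(worker_id):
        while True:
            job = jobs.get()
            if job is None:          # sentinel: shut this worker down
                jobs.task_done()
                return
            print(f"worker {worker_id} processed job {job}")
            jobs.task_done()

    # Any number of workers can pull from the same queue.
    workers = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
    for w in workers:
        w.start()

    for job in range(10):
        jobs.put(job)
    jobs.join()                      # wait until every job is processed

    for _ in workers:
        jobs.put(None)               # one sentinel per worker
    for w in workers:
        w.join()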
The point here is that 99% of companies are not in that scenario, so they should not emulate the very expensive distributed architectures used by Google and a few other companies that ARE in that scenario.
For almost all companies on the smaller side, the correct move is to take the occasional downtime, because the tiny revenue loss will be much smaller than the large and ongoing costs of building and maintaining a complex distributed system.
From the post directly above: “Most businesses…”
The thread above is specifically discussing businesses which won't lose a significant amount of money if they go down for a few minutes. They also postulate that most businesses fall into this category, which I'm inclined to agree with.
In your typical seed, series A, or series B SaaS startup, this is most often not the case. At the same time, these are the companies that fueled the proliferation of microservice-based architectures, often with a single point of failure in the message queue or in the cluster orchestration. They shifted easy-to-fix problems into hard-to-fix problems.
Loads of software issues, of course.
I know this is just an anecdote, but I'm pretty certain reliability has increased by one or two orders of magnitude since the 90s.
When I worked with small firms that used Kubernetes, we had more issues with our Kubernetes code than with machines failing. The solution to the theoretical problem was the cause of real issues, and it was expensive to keep fixing them.