Netflix designed their stuff from the ground up to fail over. Large monolithic corporations that have inherited systems from other companies they've bought or merged with have challenges you won't see at many places that have benefited from the 30 years of lessons learned at these companies.
No, it can't. Any loss of customer-facing functionality is a critical outage ("World Problem" in company terminology). There are a relatively small number of customers, but the terminal is critical to the operations of those who buy it. The terminal going down for eight hours would be a world-wide headline in the financial press.
A Tier 1 test that simulates loss of a datacenter takes a cluster in one DC virtually offline. This takes an entire subset of services in that DC completely offline. The test is coordinated with the teams who own the services to ensure their services fail over correctly. Any service disruption during the failover is a test failure. If it passes, the customers don't even know it happened. The goal is to be able to lose an entire DC and have the terminal customers not realize it until they hear about it on the news.
Do you know what Bloomberg does? It powers equities trading markets around the world, 24/7. It isn't just news.
Chaos engineering and AWS weren't real things when they started building the company. And the system they have now doesn't much resemble what it once was.
The truth of the matter is they invested more in their infrastructure, but that's because their business plan required them to grow on the back of technological advances. Banks, it seems, do not. Or maybe they do, and some of these startup banks will usurp them.
But I’m guessing Wells Fargo just doesn’t have a reason to care.
You can bail out of a test at the first sign of trouble. When a real outage hits, there’s no telling how long it will take to recover.