But then again now we consider physical systems, say a spaceship, which require critical capabilities and operational regimes, and ask if fallback fault management is really a 'bad idea'.
I read that as we work really hard to engineer crystalline fault lines vertically through our stack so the system has a nice clean single plane of fracture.
Given their track record of reliability and the unsubstantiated claims in the article, I can't even. In the real world, all the actions that have absolutely saved a system was an occurrence of fallback.
Having branch free code, one way to fail is nice from a reasoning perspective, and reasoning was more than one of the points brought up in the article. But reasoning is a goal that is different than reliability. I can use a reliable automatic transmission without reasoning about it.
Fallback fixes issues that failover doesn't. Rather put out a piece that encourages someone to not do something (sometimes this is important granted), encouraging folks to use immutability would be a larger global positive.
Immutability really does change everything.
https://cacm.acm.org/magazines/2016/1/195722-immutability-ch...
You should nip overloads in the bud, and not propagate them. Have backpressure be at the protocol level, and every node only deals with its neighbors.
In fact, I would go so far as to say that the main reason for these failures is because we have monolitic, global addressing systems like DNS or IP routing tables, which let me send spam email to anyone, or DDOS a site from many machines at once. It’s totally discontinuous.
What a good distributed system should have is be continuous in distributing capabilities. Each node can grant capabilities only to trusted neighbors, and revoke any that have been misused. Neighbors can then delegate some capabilities to others, or — if the node wants — forward an invitation to them, to become a neighbor.
That would also solve all the issues about “real names policy”, and other crap like that. It shouldn’t matter whether you are “the real” Bill Gates or not. Your email shouldn’t be accessible to the whole world.
And websites would also be stored using a FileCoin-type market, which recruits more machines as more readers SPEND MONEY using micropayments to access the files.
Right now micropayments aren’t feasible, so instead we essentially have the publishers pay for hosting and collect micropayments via subscriptions and bundles.
I think the conclusion in the article ("don't do fallback") is misguided. Fallback code is sketchy, but sometimes it is worth it to take the time to write well-audited, well-tested fallback code to ensure a system which has high availability requirements can survive dependencies which are less reliable.
Their very example -- airport notice boards -- is an example of someplace where fallback is needed. The thesis of the piece is that management of fallbacks is complicated and painful and thus increase the scope of failure, as you observed.
In other words: fallback is often but not always required, and if you can plan to avoid it it may be better for you, depending on your application.
The flight control systems of civil aircraft like the A320 has failback modes to handle hardware failures such as a failed angle-of-attack sensor
https://a320podcast.libsyn.com/flight-control-laws
The 737 MAX crashed because it didn't have fallback modes.
Engine Control Units in automobiles also have fallback modes. You shouldn't get stuck just because an oxygen sensor failed, even though that means the car will have trouble balancing clean emissions, performance and fuel efficiency.
Depends on context obviously but IME as a controls engineer, what you want is a failsafe, not a fallback.
AWS calls a fallback when you "use a different mechanism to achieve the same result." Failsafes are all about returning the system to a stable and controllable state - if you can salvage the result that's great, but if it takes flaring off $10,000,000 worth of distillate to stabilize the system that's fine too.