I wonder how much we spend babysitting slop (directly, or indirectly through SaaS) vs. how much it would cost to invest in engineering for automated, reliable deployment and resilient software.
Highly polished distributed file systems, databases, and orchestration tools that incorporate automatic replication and failover would be a great start.
Don’t just say “it’s hard.” Of course it is. But is it cheaper to do this or to babysit rickety piles of junk?
Another possible way out is AI sysadmins. I wonder how far we are from AIs that can admin a cluster, including upgrades and disaster recovery.