What does this mean? Do they steal company resources for themselves, or just configure things incompetently?
Over Christmas, everything died, and the brilliant sysadmin was on holiday. Nobody could get things going again for many days, so their entire SaaS business was down. They lost a lot of business and trust as a result.
The sysadmin is now gone and they are back on AWS.
You're forming a larger dependency on a team lead and on a custom system that becomes a liability: new people who join the organization don't want to adopt an abandoned, poorly understood project.
> entire SaaS business
> [ Unmentioned - Single Point of Failure Service dependent on a single admin ]
If you are fully accounting for vacation, training, sleep, etc., then you need a minimum of 5 admins for mission-critical services. Now, you can engineer around this to reduce your staffing requirement, but I wouldn't recommend ever going under 2, because accidents happen.
This business seems to have been one below that, without the engineering, and I would point to management, not the brilliant admin, as the problem.
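For what it's worth, the "minimum of 5" figure can be sanity-checked with rough arithmetic. This is only a sketch of one way to get there, not necessarily how the number above was derived: the 40-hour week and the 85% availability factor (time left after vacation, training, and sick leave) are assumptions, and `admins_needed` is a made-up helper name.

```python
import math

# One person on duty at all times means covering every hour of the week.
HOURS_PER_WEEK = 24 * 7  # 168

def admins_needed(weekly_hours_per_admin=40.0, availability=0.85):
    """Minimum headcount to keep one admin on duty around the clock.

    weekly_hours_per_admin: assumed contracted hours per admin per week.
    availability: assumed fraction of that time the admin is actually
        present after vacation, training, and sick leave (hypothetical).
    """
    effective_hours = weekly_hours_per_admin * availability
    return math.ceil(HOURS_PER_WEEK / effective_hours)

print(admins_needed())                   # with the assumed defaults
print(admins_needed(availability=1.0))   # nobody ever takes leave
```

Even with nobody ever taking leave, 168 hours divided by 40 is more than 4, so you still round up to 5. Getting below that means on-call rotations and automation doing part of the job, which is the "engineer around this" part.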
This story has nothing to do with AWS or on-prem.
It's a story about incompetent management allowing a single human point of failure. If they don't change that, they'll have the same problem wherever they go.
But if you want something as reliable as what I can get by spending 30 seconds writing some Terraform, it will take an entire infra team to set up and maintain it, not to mention an entire procurement process and a new supply chain to integrate, just for a basic multi-AZ setup (probably without things like backups, and still without basic features the cloud gives you automatically).