A) Make it "MORE complex": go model predictive, for example, as suggested by OP (or whatever). Now that your PID is a gain-scheduled, asymmetric, dead-zoned beast, maybe the gap between PID and more complex systems seems less daunting.
B) Make it simpler! Effectively make it bang-bang (or pure P) with a deadzone. Leave some performance on the table, but gain the confidence that less will probably go wrong.
C) Double down on PIDs. Gain scheduling is fun! You can figure out how to constrain your system, carve out regions of LTI goodness, and be confident in your transitions.
These are all valid solutions. As a lazy engineer, I think B) should be the first choice of any business. And honestly, I think that's where a lot of real businesses ended up.
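To make B) concrete, here is a minimal sketch of a bang-bang scaler with a deadzone. Everything in it is an assumption for illustration: the `scale_step` name, the utilization thresholds, the fixed step size, and the server limits are made up, not taken from any real autoscaler.

```python
def scale_step(utilization, servers, low=0.40, high=0.70,
               step=1, min_servers=1, max_servers=100):
    """Return the server count after one control tick.

    Hypothetical parameters: `low`/`high` bound the deadzone on average
    utilization (0.0 to 1.0); `step` is the fixed number of servers added
    or removed per tick.
    """
    if utilization > high:
        # Too hot: add a fixed number of servers.
        servers += step
    elif utilization < low:
        # Too cold: remove a fixed number of servers.
        servers -= step
    # Inside the deadzone [low, high] the count is left unchanged.

    # Clamp so the result always stays within the allowed range.
    return max(min_servers, min(max_servers, servers))
```

The whole behavior fits in your head: above `high` it adds, below `low` it removes, in between it does nothing, and the clamp bounds the worst case. For example, `scale_step(0.9, 10)` gives 11 and `scale_step(0.5, 10)` leaves the count at 10.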
A mechanism that scales a server farm cannot be too surprising to the engineers who work on it. They need to understand, more or less, what it does so they can figure out what is going on and deal with problems.
This is even more important in corner cases. What is the startup behavior going to be? What happens when a large number of servers go down? Etc. These behaviors may be encountered very rarely, but engineers must be able to predict and reason about them.