Now the systems are stable, but human workers eventually get sick, leave, or die.
Raising pay has diminishing returns. You can't stop workers from losing interest and leaving, getting sick, or dying by throwing more money at them.
The article describes achieving stability through a distributed system, so that the unexpected death of one rack doesn't affect service availability. The same can be done for human workers who unexpectedly stop working: having multiple workers do the same job improves stability.
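To put rough numbers on the redundancy argument, here is a minimal sketch. It assumes each worker is unavailable independently with the same probability, which is a simplification (a pandemic can take out several people at once), and the 5% figure and the function name are made up for illustration.

```python
def chance_someone_available(p_unavailable: float, workers: int) -> float:
    """Probability that at least one of `workers` people is available.

    Assumption (hypothetical model): each worker is unavailable
    independently with the same probability `p_unavailable`.
    """
    if not 0.0 <= p_unavailable <= 1.0:
        raise ValueError("p_unavailable must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Everyone is out at once with probability p^n; otherwise someone is there.
    return 1.0 - p_unavailable ** workers


if __name__ == "__main__":
    # Illustrative 5% chance that any given worker is out at a given time.
    for n in (1, 2, 3):
        print(n, round(chance_someone_available(0.05, n), 6))
```

Under these assumptions, each extra person who knows the system shrinks the chance that nobody is around by the same factor, which is the same reasoning behind redundant racks.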
Sure, it's inefficient in terms of money. But the alternative is that one important employee catches COVID-19 and dies, and the knowledge of the system is lost with them. Documentation doesn't solve this, because you need manual operation available right now, not a few months later once the replacement workers have learned the system from the documents.