(For the benefit of non-Googlers/Xooglers: borg is a lower-level tool mostly used when everything else has gone wrong and borgcfg is a higher-level, more routine tool. These days people often layer things on top of that as well, because we love piling up abstraction layers. This approach is completely successful because abstraction layers never leak and solve every problem without making anything hard to debug at all. /s)
In my ideal world, even the lowest layer a human ever uses would do safety checks by default. E.g., imagine if the job specification included "query this safety check service on change" and the borg tool (which already has to query the existing job on a cancel/rm command) discovered that and honored it. Most people/jobs would use a safety check that refuses to take down a job unless the load balancer reports that the job is drained in all relevant services. The safety check service could also specify a confirmation prompt (similar to what Rachel is advocating), and that prompt could be customized (e.g., showing qps or percent of global capacity rather than just number of tasks). The safety check would be effective no matter what layer you use, and there'd be no good reason to configure one that causes prompt fatigue. The outage rossjudson described (and I know he's not the only one who has done exactly this!) would have been avoided.
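To make that concrete, here's a rough sketch in Python. None of this is real borg or borgcfg API: `JobSpec`, `Verdict`, `drained_check`, and `remove_job` are names I made up for illustration, and the load balancer's answer is passed in as a plain set rather than queried from anything. It only shows the shape of the idea: the check travels with the job, and the lowest-level removal path honors it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""
    confirm_prompt: Optional[str] = None  # shown to the human if set


@dataclass
class JobSpec:
    name: str
    # Hypothetical field: the check this job asks every tool to consult on change.
    safety_check: Optional[Callable[["JobSpec"], Verdict]] = None


def drained_check(undrained_services, percent_of_global_capacity):
    """Build a check that blocks removal while any service still routes to the job.

    undrained_services stands in for what a load balancer would report.
    """
    def check(job):
        if undrained_services:
            names = ", ".join(sorted(undrained_services))
            return Verdict(False, reason=f"{job.name} is not drained in: {names}")
        prompt = (f"Remove {job.name} "
                  f"({percent_of_global_capacity}% of global capacity)?")
        return Verdict(True, confirm_prompt=prompt)
    return check


def remove_job(job, confirm=lambda prompt: False):
    """Lowest-layer removal path: honors the job's own safety check by default."""
    if job.safety_check is None:
        return True  # the job opted into no check
    verdict = job.safety_check(job)
    if not verdict.allowed:
        return False  # refused; verdict.reason says why
    if verdict.confirm_prompt is not None:
        return confirm(verdict.confirm_prompt)
    return True
```

Because the check lives in the job spec rather than in any one tool, a higher layer that ends up calling `remove_job` gets the same behavior for free. The `confirm` callback defaults to "no", so a non-interactive caller has to opt in explicitly.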