I feel like your comment answers itself: if you have the money to run a datacenter with thousands of A100 GPUs (or equivalent), the cost of the electricity is negligible to you, and definitely worth paying to train a SOTA model with your spare compute.
Is it really spare compute? Is demand from others so low that these systems are truly idle? And does running internal tasks on them artificially make demand look high?