For these massive, expensive-to-train AI models the differences hit harder. At the kernel level, where the rubber meets the road, the teams training them are going to wring every last dollar of performance out of the chips by writing hand-optimized kernels, highly customized to each chip's architecture and performance characteristics. It may go deeper than that too, with the detailed architecture of the models themselves tweaked to perform best on a specific chip.
So, the bottom line is that you can't just take a model "compiled to run on TPUs" and train it on NVIDIA chips just because you have spare capacity there.
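To make that concrete, here is a toy sketch of the idea, not how any real framework does it. Everything in it is made up for illustration: the `KERNELS` registry, the `run_op` helper, the backend names, and the tile size of 128 are all assumptions. The point is only that a tuned kernel is registered per chip, and a chip nobody wrote a kernel for has nothing to dispatch to.

```python
# Hypothetical sketch: the registry, backend names, and tile size are illustrative only.
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


def tiled_matmul(a: Matrix, b: Matrix, tile: int) -> Matrix:
    """Blocked matrix multiply; `tile` stands in for a chip-specific tuning choice."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    # Walk the output and the shared dimension in tile-sized blocks.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = out[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        out[i][j] = acc
    return out


# Kernels are keyed on (operation, backend). Only a "tpu" variant exists here,
# with a tile size assumed to suit that (imaginary) chip.
KERNELS: Dict[Tuple[str, str], Callable[[Matrix, Matrix], Matrix]] = {
    ("matmul", "tpu"): lambda a, b: tiled_matmul(a, b, tile=128),
}


def run_op(op: str, backend: str, a: Matrix, b: Matrix) -> Matrix:
    """Dispatch to the kernel tuned for `backend`, or fail if none was written."""
    try:
        kernel = KERNELS[(op, backend)]
    except KeyError:
        raise NotImplementedError(
            f"no {op!r} kernel tuned for backend {backend!r}"
        ) from None
    return kernel(a, b)
```

Calling `run_op("matmul", "tpu", a, b)` works, while `run_op("matmul", "gpu", a, b)` raises `NotImplementedError` until someone writes and tunes a GPU variant. In a real stack the gap is more likely to show up as a slow fallback path than a hard error, but the porting work is the same.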