I would love for you to expound, I found it interesting that you qualified your "should you bother, no" with "unless you are doing inference at scale". But in the previous paragraph you explained why you can get better performance with GPUs.
So is there some advantage of TPU, assuming there was SWE/API parity between GPUs?