Well, there you go. For one, TensorFlow is not a generic framework like CUDA is, so you lose a whole bunch of the configurability you have with CUDA. For example, even though there is a raw FFT function, there doesn't appear to be a way to do more complicated FFT-based operations, such as overlap-save convolution. This is trivial to do on a GPU, and is built into the library. The raw functions it provides are not direct access to the hardware and memory subsystem; they are a small subset of the total problem space. And certainly if you are saying that running something on a TPU's CPU cores is in any way going to compete with a GPU, then I don't know what to tell you.
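For anyone unfamiliar with overlap-save: it's a way to do long linear convolutions with fixed-size FFTs. You chop the input into overlapping blocks, multiply each block's FFT by the filter's FFT, inverse-transform, and throw away the first `len(h) - 1` outputs of every block because they're corrupted by circular wrap-around. Below is a minimal plain-Python sketch of the technique, assuming a power-of-two block size (`n_fft=16` is an arbitrary choice). It is not CUDA, cuFFT, or TensorFlow code, and the function names are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(X):
    """Inverse FFT via the conjugation trick: ifft(X) = conj(fft(conj(X))) / n."""
    n = len(X)
    y = fft([v.conjugate() for v in X])
    return [v.conjugate() / n for v in y]


def overlap_save(x, h, n_fft=16):
    """Full linear convolution of x with filter h using overlap-save blocks."""
    m = len(h)
    if m < 1 or n_fft < m or n_fft & (n_fft - 1):
        raise ValueError("need a non-empty h and a power-of-two n_fft >= len(h)")
    step = n_fft - (m - 1)  # new output samples produced per block
    H = fft(list(h) + [0.0] * (n_fft - m))  # filter spectrum, computed once
    out_len = len(x) + m - 1
    # m-1 leading zeros act as the initial "saved" overlap; the tail padding
    # guarantees every block slice is exactly n_fft samples long.
    padded = [0.0] * (m - 1) + list(x) + [0.0] * (m - 1 + n_fft)
    y = []
    pos = 0
    while len(y) < out_len:
        block = padded[pos:pos + n_fft]
        Y = ifft([a * b for a, b in zip(fft(block), H)])
        # The first m-1 outputs wrap around circularly, so discard them.
        y.extend(v.real for v in Y[m - 1:])
        pos += step
    return y[:out_len]


def direct_convolve(x, h):
    """Reference O(len(x) * len(h)) linear convolution for checking results."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y
```

A quick sanity check is to compare against direct convolution, e.g. `overlap_save([float(i % 7) for i in range(50)], [0.5, -1.0, 0.25, 2.0])` should agree with `direct_convolve` on the same inputs to within floating-point error. On a GPU the per-block FFTs would typically be batched rather than looped, which is where the speedup comes from.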
You did not give an example of something GPUs can't do. All you said was that TPUs are faster for a specific function in your case.