You'll need to link me to a specific implementation that you want me to port over, not just name-drop some random algorithm. Got a link to a GitHub repo?
If your point is "There isn't a preexisting operation for overlap-save FFT" then... yes, sure, that's true. There's also no preexisting operation for any of the hundreds of other signal-processing algorithms you might want to run. But they can all be implemented efficiently.
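To make "can be implemented" concrete, here is a rough sketch of overlap-save built from nothing but a gather, batched FFTs, and an elementwise multiply. It is written in NumPy so anyone can run it; I have not benchmarked it on a TPU, and the names `overlap_save` and `block_len` are my own choices for illustration, not anything from an existing library.

```python
import numpy as np

def overlap_save(x, h, block_len=256):
    """Linear convolution of x with FIR taps h via overlap-save.

    Illustrative sketch only: all blocks are gathered up front and
    transformed in a single batched FFT call.
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    m = len(h)
    if block_len < m:
        raise ValueError("block_len must be at least len(h)")

    step = block_len - (m - 1)      # valid output samples produced per block
    n_out = len(x) + m - 1          # length of the full linear convolution
    n_blocks = -(-n_out // step)    # ceiling division

    # m-1 leading zeros act as history for the first block; trailing zeros
    # make every block full length.
    padded = np.zeros((m - 1) + n_blocks * step)
    padded[m - 1 : m - 1 + len(x)] = x

    # Gather overlapping blocks in one shot: shape (n_blocks, block_len).
    idx = step * np.arange(n_blocks)[:, None] + np.arange(block_len)[None, :]
    blocks = padded[idx]

    # Circular convolution of every block with h, done in the frequency domain.
    H = np.fft.rfft(h, n=block_len)
    y = np.fft.irfft(np.fft.rfft(blocks, axis=1) * H, n=block_len, axis=1)

    # The first m-1 samples of each block are wraparound garbage; drop them.
    return y[:, m - 1 :].reshape(-1)[:n_out]
```

Nothing in there is data-dependent control flow, which is why I'd expect the same structure to carry over to an array framework, though that part is an expectation on my end and not something I've measured.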
Yet it remains a fact that a TPU cannot run certain workloads without offloading to the CPU (making them orders of magnitude slower), and that's somehow okay?
I think this is the crux of the issue: you're saying X can't be done, I'm saying X can be done, so please link to a specific code example. Emphasis on "specific" and "code".