TPUs are available on GCP, but working with them has been frustrating.
In my experience, it takes non-trivial effort to convert training scripts, and then some weird, unexpected bug takes a while to track down; I've heard similar things from peers in academia.
Additionally, at least in early 2023, PyTorch suffered a substantial throughput reduction on TPUs, so you'd probably need to use JAX (or, god forbid, TF) for efficiency's sake.
Granted, I’ve heard some of the PyTorch XLA issues have since been improved.
Regardless, the H100 currently significantly outperforms TPUv4 in throughput on transformer workloads. We'll see what TPUv5 looks like, to be fair, but it's not a given that Google or Amazon can outpace Nvidia in manufacturing and chip design when this is Nvidia's core product, and Nvidia also has amazing engineers plus a large open-source community building around CUDA.