TorchTPU: Running PyTorch Natively on TPUs at Google Scale (opens in new tab)

(developers.googleblog.com)

206 pointsmji2mo ago18 comments

18 comments

18 comments · 8 top-level

isusmelj2mo ago· 5 in thread

Is it just me, or does it feel like everyone now uses AI to write any kind of blog?

These parts here somehow trigger me:

- Enter TorchTPU. As an engineering team, our mandate was to build a stack that leads with usability, portability, and excellent performance.

- Engineering the TorchTPU Stack: The Technical Reality

- Eager First: Flexibility Without Compromise

- The breakthrough, however, is our fused eager mode.

- The Road Ahead: 2026 and Beyond

I have mixed feelings about this. On one hand, we all seem to be using the same tools and converging to the same style. On the other hand, if we all use the same models with the same system prompts, we might lose a lot of creativity and diversity in online content.

pjmlp2mo ago

More like, everyone now has KPIs and OKRs attached to AI for their quartal appraisals.

If you don't, I envy you.

thevillagechief2mo ago

This right here is exactly right. If it hasn't come for you yet, it's coming.

PunchTornado2mo ago

it's just sad

amelius2mo ago

Yeah but everyone also uses AI to summarize blogs for them.

halJordan2mo ago

I think HN has moved from outright, unabashed llm hate, to a hyperactive cynicism so you can be the HN guy who caught out llm malfeasance. Give it three more years to cook. And I'll go a step further. This audience who claims to hate the unreal/false llm screed is the same set of people who caused tech news to go from Siracusa reviewing macos and Anand reviewing gpu's to car reviews on ars and anandtech being dead. If a tenth of the people who claim to hate shit llm articles were true, Ars and Anandtech wouldn't be dead

Reubend2mo ago· 2 in thread

Sounds good, but my main question is: is this a fork, or a new backend they're building in (like MPS)?

yu3zhou42mo ago

They write that they use PrivateUse1, so it’s a custom out-of-tree backend

musebox352mo ago

I attended the related session at Next’26 yesterday. From my understanding it is a new backend and they will release the torch tpu source on github in one or two months. It will not support all ops initially but they are moving fast. Still for a while torchax is mature enough to run torch models on tpus by translating to jax.

sergiopreira2mo ago· 1 in thread

So Google is basically admitting PyTorch/XLA on TPUs didn't work — TorchTPU looks like them rebuilding what should have worked day one. Its hard to run production ML on a toolchain engineers can't trust, no matter how fast the silicon is.

guipsp2mo ago

This does still look like it uses XLA, fwiw.

immanuwell2mo ago· 1 in thread

pitch basically boils down to 'just change one line and it works' which sounds too good to be true, but if they actually pull it off at 100k-chip scale, that's genuinely a big deal

xrikcus2mo ago

The pitch is "just change one line and it works". It is not "just change one line and you will get peak performance on the TPU".

From the text of the blog post: "Portability doesn't eliminate hardware realities, so TorchTPU facilitates a tiered workflow: establish correct execution first, then use our upcoming deep-dive guidelines to identify and refactor suboptimal architectures, or to inject custom kernels, for optimal hardware utilization."

MASNeo2mo ago· 1 in thread

Now all that’s missing is an actual chip that can be purchased. Any ideas?

sterlind2mo ago

I'm thinking of picking up some used Gaudis from eBay. they're pretty TPU-like. but other than oddball hardware like that it's just the GPU duopoly and proprietary bespoke stuff the hyperscalers have made for themselves.

shit, maybe China will start selling Huawei Ascend chips internationally.

yu3zhou42mo ago

Adding a support for new hardware to PyTorch is actually quite convenient. I did that with WebGPU using the same PrivateUse1 mechanism TorchTPU used. Every hardware has its own slot and identifier, and when you want to add a support for a new one without merging it into PyTorch, PrivateUse1 works essentially like plug-in slot

https://github.com/jmaczan/torch-webgpu

in-silico2mo ago

This is great to see.

I did trained some research models using the existing PyTorch/XLA on TPUs, and it was a mess of undocumented behavior and bugs (silently hanging after 8 hours of training!).

If anyone is trying to use PyTorch on TPU before TorchTPU is released, you can check out the training pipeline that I ended up building to support my research: https://github.com/aklein4/easy-torch-tpu

noracists2mo ago

Very excited for this.

j / k navigate · click thread line to collapse

18 comments

18 comments · 8 top-level

isusmelj2mo ago· 5 in thread

Is it just me, or does it feel like everyone now uses AI to write any kind of blog?

These parts here somehow trigger me:

- Enter TorchTPU. As an engineering team, our mandate was to build a stack that leads with usability, portability, and excellent performance.

- Engineering the TorchTPU Stack: The Technical Reality

- Eager First: Flexibility Without Compromise

- The breakthrough, however, is our fused eager mode.

- The Road Ahead: 2026 and Beyond

pjmlp2mo ago

More like, everyone now has KPIs and OKRs attached to AI for their quartal appraisals.

If you don't, I envy you.

thevillagechief2mo ago

This right here is exactly right. If it hasn't come for you yet, it's coming.

PunchTornado2mo ago

it's just sad

amelius2mo ago

Yeah but everyone also uses AI to summarize blogs for them.

halJordan2mo ago

Reubend2mo ago· 2 in thread

Sounds good, but my main question is: is this a fork, or a new backend they're building in (like MPS)?

yu3zhou42mo ago

They write that they use PrivateUse1, so it’s a custom out-of-tree backend

musebox352mo ago

sergiopreira2mo ago· 1 in thread

guipsp2mo ago

This does still look like it uses XLA, fwiw.

immanuwell2mo ago· 1 in thread

pitch basically boils down to 'just change one line and it works' which sounds too good to be true, but if they actually pull it off at 100k-chip scale, that's genuinely a big deal

xrikcus2mo ago

The pitch is "just change one line and it works". It is not "just change one line and you will get peak performance on the TPU".

MASNeo2mo ago· 1 in thread

Now all that’s missing is an actual chip that can be purchased. Any ideas?

sterlind2mo ago

shit, maybe China will start selling Huawei Ascend chips internationally.

yu3zhou42mo ago

https://github.com/jmaczan/torch-webgpu

in-silico2mo ago

This is great to see.

I did trained some research models using the existing PyTorch/XLA on TPUs, and it was a mess of undocumented behavior and bugs (silently hanging after 8 hours of training!).

noracists2mo ago

Very excited for this.

j / k navigate · click thread line to collapse