TPUs are available on GCP, but working with them has been frustrating.
In my experience, it takes non-trivial effort to convert training scripts, and then some weird, unexpected bug takes a while to track down; I've heard similar things from peers in academia.
Additionally, at least in early 2023, PyTorch suffered a substantial throughput reduction on TPUs, so you'd probably need to use JAX (or, god forbid, TF) for efficiency's sake.
Granted, I’ve heard some of the PyTorch XLA issues have since been improved.
Regardless, the H100 currently significantly outperforms TPUv4 in throughput on transformer workloads. We'll see what TPUv5 looks like, to be fair, but it's not a given that Google or Amazon can outpace Nvidia in manufacturing and chip design when this is Nvidia's core product, and Nvidia also has amazing engineers plus a large open-source community building around CUDA.