GPT-2 was trained on TPUs. (There are explicit references to TPUs in the source code: https://github.com/openai/gpt-2/blob/0574c5708b094bfa0b0f6df...)
GPT-3 was trained on a GPU cluster, probably because of Microsoft's billion-dollar investment in OpenAI (paid largely in Azure cloud credits), not because GPUs were the best choice.
To be fair, TPUv4 is not out yet, and it might catch up using the latest processes (7nm TSMC or 8nm Samsung).
For MLPerf 0.7, it's true that Google's software isn't available to the public yet. That's because they're in the middle of transitioning to Jax (and by extension, Pytorch). Once that transition is complete and the result is available to the public, you'll probably end up learning TPU programming one way or another, since there's no other practical way to, e.g., train a GAN on millions of photos.
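For anyone curious what "TPU programming" via Jax actually looks like, here's a rough sketch (purely illustrative, not anything Google has published): the same NumPy-style code gets compiled by XLA and replicated across the local TPU cores with pmap. The toy model and the 0.01 learning rate are placeholders.

  import jax
  import jax.numpy as jnp

  print(jax.devices())  # on a Cloud TPU VM this lists the TPU cores, e.g. 8 of them

  def loss_fn(params, x, y):
      # toy linear model; a real GAN generator/discriminator would go here
      pred = x @ params["w"] + params["b"]
      return jnp.mean((pred - y) ** 2)

  @jax.pmap  # replicate the step across all local TPU cores (data parallelism)
  def train_step(params, x, y):
      grads = jax.grad(loss_fn)(params, x, y)
      # plain SGD update; learning rate is arbitrary for this sketch
      return jax.tree_util.tree_map(lambda p, g: p - 0.01 * g, params, grads)

  n_dev = jax.local_device_count()
  # replicate the params and shard a batch across devices (leading axis = device axis)
  params = {"w": jnp.zeros((4, 1)), "b": jnp.zeros((1,))}
  params = jax.device_put_replicated(params, jax.local_devices())
  x = jnp.ones((n_dev, 32, 4))
  y = jnp.ones((n_dev, 32, 1))
  params = train_step(params, x, y)

The same script runs unchanged on CPU or GPU; only the device list differs, which is the whole point of the Jax/XLA stack.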
You'd think people would be happy that there are realistic alternatives to nVidia's monopoly on AI training, rather than rushing to defend it...
Wait, what? Why would transition to Jax imply transition to Pytorch?
Pointing this out is not aggressive.