A new GPU backend for the TVM stack

(tvmlang.org)

70 points · ziheng · 8y ago · 28 comments


15 comments · 3 top-level

muli · 8y ago · 8 in thread

Some quick calculations about TFLOPS per dollar on GEMM.

Since NVIDIA's Volta consumer card is not out yet, I used the Titan Xp as the reference card. I grabbed prices from Wikipedia and assumed TVM reaches 65% of peak performance on Vega and 90% of peak on the Titan Xp:

Radeon RX Vega 64: 12.6 TFLOPS * 65% / $499 ≈ 0.0164 TFLOPS/$
Pascal Titan Xp: 12 TFLOPS * 90% / $1200 = 0.009 TFLOPS/$

So Vega comes out well ahead here, at roughly 1.8x the TFLOPS per dollar.
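The same arithmetic as a small Python sketch. The peak TFLOPS figures, prices and efficiency factors (65% of peak on Vega, 90% on the Titan Xp) are the assumptions from the comment above, not measured TVM results.

```python
# Per-card inputs from the comment above: (peak FP32 TFLOPS, assumed fraction
# of peak reached on GEMM, price in USD). These are assumptions, not benchmarks.
CARDS = {
    "Radeon RX Vega 64": (12.6, 0.65, 499),
    "Pascal Titan Xp": (12.0, 0.90, 1200),
}


def tflops_per_dollar(peak_tflops: float, efficiency: float, price_usd: float) -> float:
    """Achieved GEMM throughput (peak * efficiency) divided by card price."""
    return peak_tflops * efficiency / price_usd


if __name__ == "__main__":
    results = {name: tflops_per_dollar(*spec) for name, spec in CARDS.items()}
    for name, value in results.items():
        print(f"{name}: {value:.4f} TFLOPS/$")
    ratio = results["Radeon RX Vega 64"] / results["Pascal Titan Xp"]
    print(f"Vega 64 advantage: {ratio:.2f}x")
```

The conclusion is only as good as the efficiency guesses: change the 0.65 or 0.90 and the ratio moves proportionally.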

jamilbk · 8y ago

Another important factor is that Vega supports double-rate FP16 operations[1], and some ML frameworks are already beginning to optimize for that[2]. That's roughly 21 TFLOPS of FP16 training compute for ~$400 on the RX Vega 56.

[1]: https://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-...
[2]: https://github.com/plaidml/plaidml/issues/29

dharma1 · 8y ago

Another thing to note is that, in theory, you can train with half precision without sacrificing much accuracy. There isn't much support for it yet (for AMD, anyway, from what I've seen), but on paper the Vega 64 has double-rate half precision (25 TFLOPS).

Among Nvidia cards, only the Tesla P100 and Volta have 2x (or faster) mixed-precision throughput (much faster in Volta's case, thanks to its Tensor Cores), but they sit in a very different price bracket.
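Where the "on paper" numbers come from: theoretical peak is shader count × boost clock × 2 FLOPs per fused multiply-add, doubled again for Vega's packed FP16. A minimal sketch, assuming AMD's reference specs (4096 stream processors at 1546 MHz for the air-cooled Vega 64, 3584 at 1471 MHz for the Vega 56); sustained clocks in practice vary.

```python
def peak_tflops(shaders: int, boost_clock_mhz: float, fp16_rate: int = 1) -> float:
    """Theoretical peak TFLOPS.

    Each shader retires one fused multiply-add (2 FLOPs) per clock; fp16_rate=2
    models Vega's packed (double-rate) FP16 math.
    """
    flops_per_clock = shaders * 2 * fp16_rate
    return flops_per_clock * boost_clock_mhz * 1e6 / 1e12


# Assumed reference specs: (stream processors, boost clock in MHz).
VEGA_64 = (4096, 1546)
VEGA_56 = (3584, 1471)

for name, (shaders, clock) in {"Vega 64": VEGA_64, "Vega 56": VEGA_56}.items():
    fp32 = peak_tflops(shaders, clock)
    fp16 = peak_tflops(shaders, clock, fp16_rate=2)
    print(f"{name}: {fp32:.1f} TFLOPS FP32, {fp16:.1f} TFLOPS FP16")
```

This gives about 12.7 / 25.3 TFLOPS (FP32 / FP16) for the Vega 64 and 10.5 / 21.1 TFLOPS for the Vega 56.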

dragontamer · 8y ago

I'd assume that video RAM is important, though. The Titan Xp has 12GB of RAM, while the $500 Vega 64 has only 8GB.

A better comparison is probably the Vega Frontier Edition, with 16GB of RAM for $1000. If you're doing heavy compute, you're probably gonna need a ton of RAM to go with it.

microcolonel · 8y ago

Vega has half-precision floats as well, though. With a seemingly negligible loss in precision, those, combined with its HBCC (transparent DMA to main memory), should more than make up for the Vega 64's smaller memory, and all the more so for the Frontier Edition.

dharma1 · 8y ago

Vega is meant to have something called HBCC, which effectively lets you use system RAM or even an SSD as GPU memory for VERY large datasets. I have no idea whether it would be fast enough for deep learning workloads over PCIe.

https://www.pcper.com/news/General-Tech/AMDs-HBCC-you-and-me
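A rough sense of the gap HBCC would have to hide: a PCIe 3.0 x16 link versus the on-card HBM2 bandwidth quoted later in the thread (480 GB/s for the Vega Frontier Edition). The sketch assumes PCIe 3.0's 8 GT/s per lane and 128b/130b line encoding, counts one direction only, and ignores packet overhead, so real transfers will be slower.

```python
PCIE3_GT_PER_S_PER_LANE = 8.0   # PCIe 3.0 signalling rate per lane
PCIE3_ENCODING = 128 / 130      # 128b/130b line-encoding efficiency
HBM2_GB_PER_S = 480.0           # Vega FE figure quoted in the thread


def pcie3_gb_per_s(lanes: int = 16) -> float:
    """Usable one-direction PCIe 3.0 bandwidth in GB/s (line encoding only)."""
    gbit_per_s = PCIE3_GT_PER_S_PER_LANE * PCIE3_ENCODING * lanes
    return gbit_per_s / 8  # bits -> bytes


link = pcie3_gb_per_s()
print(f"PCIe 3.0 x16: {link:.2f} GB/s; HBM2 is ~{HBM2_GB_PER_S / link:.0f}x faster")
```

So data paged in over PCIe arrives roughly 30x slower than data already in HBM2; HBCC would likely help most when the spilled part of the working set is touched rarely.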

jamilbk · 8y ago

That's a fair assumption I think. It will be interesting to see whether Vega's high-bandwidth cache controller (HBCC) will help nullify this difference if implemented in ML frameworks.

TomV1971 · 8y ago

AMD positions the $1000 Vega FE against the $1200 Titan Xp.

The $700 1080 Ti has almost the same performance as the Titan Xp, so why not compare those?

It changes conclusions considerably...
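Redoing the per-dollar arithmetic with the 1080 Ti and Vega FE included. This sketch reuses muli's efficiency assumptions (65% of peak on AMD, 90% on Nvidia) and adds peak FP32 figures that are not in the thread: about 11.3 TFLOPS for the GTX 1080 Ti and 13.1 TFLOPS for the Vega Frontier Edition, based on the vendors' reference boost clocks.

```python
# (peak FP32 TFLOPS, assumed GEMM efficiency, price in USD); assumptions only.
CARDS = {
    "Radeon RX Vega 64": (12.6, 0.65, 499),
    "GeForce GTX 1080 Ti": (11.3, 0.90, 700),
    "Pascal Titan Xp": (12.0, 0.90, 1200),
    "Radeon Vega Frontier Edition": (13.1, 0.65, 999),
}

# Sort cards by achieved TFLOPS per dollar, best first.
ranking = sorted(
    ((peak * eff / price, name) for name, (peak, eff, price) in CARDS.items()),
    reverse=True,
)
for value, name in ranking:
    print(f"{name:30s} {value:.4f} TFLOPS/$")
```

With these inputs the 1080 Ti lands much closer to the Vega 64 (about 0.0145 vs 0.0164 TFLOPS/$), and the Vega FE actually trails the Titan Xp, so which cards you pair up does change the conclusion.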

dragontamer · 8y ago

Depends on the situation, which is why I think Vega Frontier Edition ($1000) is the apt comparison against the Titan XP ($1200).

The 1080 Ti has less RAM and NO support for fast (double-rate) 16-bit packed arithmetic. Ultimately, the 1080 Ti is a graphics card designed to dominate video games, and NVidia cuts out other features that gamers don't care about.

With regards to the "Compute" sector, it's Vega Frontier ($1000) vs Titan XP ($1200) at the low end, at least. NVidia Tesla chips ($4000 to $7000) constitute the higher end.


0xbear · 8y ago · 4 in thread

How’s the perf per dollar? It’s not enough to “bring” it to AMD, it must be competitive as well.

dragontamer · 8y ago

Hmm, with the "Tensor Cores" of NVidia's next-generation Volta coming in, I'd bet that NVidia cards will be faster in machine learning tasks.

But then again: AMD Vega 56 / 64 have HBM2 and are under $1000. IIRC, the Vega Frontier Edition is $999 and has 16GB of HBM2 at 480GB/s theoretical bandwidth.

NVidia also has an offering with high-speed HBM2 RAM: the Tesla P100, but it's way more expensive: $7000 each.

I dunno if there are major benefits of HBM2 over GDDR5X, however. I'm just listing off numbers here. The Titan Xp apparently gets more bandwidth from its GDDR5X RAM, for example, although the Titan Xp is still more expensive than the Vega 64.

------------

If a problem is constrained by global memory bandwidth, it might be better to run it on an AMD Vega 64. After all, you can afford roughly 7 Vega Frontier Editions for the price of one NVidia P100.

Obviously, this very much depends on your workload.
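The bandwidth-per-dollar side of this argument, as a sketch. The prices and the Vega FE's 480 GB/s come from the comment; the P100's 732 GB/s is Nvidia's published figure for the 16 GB HBM2 model and is an added assumption.

```python
# (memory bandwidth in GB/s, price in USD)
VEGA_FE = (480.0, 999)
TESLA_P100 = (732.0, 7000)  # assumed: 16 GB HBM2 variant

vega_cards = TESLA_P100[1] // VEGA_FE[1]   # whole Vega FE cards per P100 budget
aggregate_bw = vega_cards * VEGA_FE[0]     # summed across independent cards


def gb_per_s_per_dollar(card: tuple[float, int]) -> float:
    bandwidth, price = card
    return bandwidth / price


print(f"{vega_cards} Vega FEs for one P100: {aggregate_bw:.0f} GB/s total")
print(f"Vega FE {gb_per_s_per_dollar(VEGA_FE):.3f} vs "
      f"P100 {gb_per_s_per_dollar(TESLA_P100):.3f} GB/s per $")
```

The aggregate figure only helps if the workload splits cleanly across cards, since each card still sees only its own 480 GB/s.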

dharma1 · 8y ago

Yep, Nvidia is quoting 125 TFLOPs mixed precision on V100, boosted by Tensor Cores.

Vega 64 can in theory do 25 TFLOPs half precision.

But as you say there's a large price difference too.

For the market segment that needs 1-8 GPU rigs for ML on a low budget, AMD could kill it if they invested in software support and kernel optimisation.

For servers and large scale training, unless AMD has some ML specialised cores in the pipeline, Nvidia Volta and Google TPUs have a serious lead.


make3 · 8y ago

The post says they've only focused on coverage so far, not perf, so I expect it's not good yet.

"The current support on ROCm focuses on the functionality coverage. We have already seen promising performance results by simply adopting existing TVM schedules for CUDA backend. For example, you can try running the gemm test script in the TVM repository and see the result. For two types of cards we tested, the current gemm recipe for square matrix multiplication (not yet specifically optimized for AMD GPUs) already achieves 60% to 65% of peak performance. This is already a promising start, as it is very hard to optimize performance to get to peak and we did not yet apply AMD GPU specific optimizations. We are starting to look at performance optimization and we expect more improvement to come"
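"Percent of peak" for a square GEMM is usually computed by counting 2·n³ floating-point operations (n³ multiply-adds) and dividing the achieved rate by the card's theoretical peak. A minimal sketch of that bookkeeping; it is not the TVM test script itself, and the timing below is a made-up illustrative value, not a TVM measurement.

```python
def gemm_fraction_of_peak(n: int, seconds: float, peak_tflops: float) -> float:
    """Fraction of theoretical peak reached by an n x n x n matrix multiply."""
    flops = 2 * n ** 3                      # n^3 multiply-adds, 2 FLOPs each
    achieved_tflops = flops / seconds / 1e12
    return achieved_tflops / peak_tflops


n = 4096
hypothetical_seconds = 0.0168               # illustrative timing, not measured
fraction = gemm_fraction_of_peak(n, hypothetical_seconds, peak_tflops=12.6)
print(f"{fraction:.0%} of peak")
```

With these made-up inputs a 12.6 TFLOPS card would be at about 65% of peak, the top of the range quoted in the post.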

0xbear · 8y ago

One thing I don’t get is why AMD is not paying attention to the deep learning market. They could easily turn things around within a year with a team of 10 solid engineers, yet they seem to be choosing not to. I’m pretty sure the chips are capable of delivering the goods, it’s just a pain to get to the goods right now.


railgun2space · 8y ago

New usage for RX480/580 miner?
