Since NVIDIA's Volta consumer card is not out yet, I used the Titan Xp as the reference card. I took prices from Wikipedia and assumed TVM reaches 65% of peak performance on Vega and 90% on the Titan Xp:
Radeon RX Vega 64: 12.6 TFLOPS * 65% / $499 ≈ 0.0164 TFLOPS/$
Pascal Titan Xp: 12 TFLOPS * 90% / $1200 = 0.009 TFLOPS/$
So Vega comes out well ahead here, at roughly 1.8x the performance per dollar.
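The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the calculation: the 65% and 90% efficiency factors are the assumptions stated above, not measured TVM results, and the `tflops_per_dollar` helper is a hypothetical name introduced for illustration.

```python
# Performance-per-dollar estimate using the figures quoted above.
# ASSUMPTION: 65% (Vega) and 90% (Titan Xp) are assumed TVM efficiencies,
# not measurements.

def tflops_per_dollar(peak_tflops: float, efficiency: float, price_usd: float) -> float:
    """Effective TFLOPS delivered per US dollar of card price."""
    return peak_tflops * efficiency / price_usd

cards = {
    "Radeon RX Vega 64": tflops_per_dollar(12.6, 0.65, 499),
    "Pascal Titan Xp": tflops_per_dollar(12.0, 0.90, 1200),
}

for name, value in cards.items():
    print(f"{name}: {value:.4f} TFLOPS/$")

ratio = cards["Radeon RX Vega 64"] / cards["Pascal Titan Xp"]
print(f"Vega advantage: {ratio:.2f}x")
```

With $499 as the Vega price, this gives about 0.0164 TFLOPS/$ for Vega and 0.009 TFLOPS/$ for the Titan Xp, a ratio of about 1.82x.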
Among Nvidia cards, only Tesla and Volta have mixed-precision ops running at 2x speed or more (in Volta's case much faster, thanks to the Tensor Cores), but they are in a very different price bracket.
A better comparison is probably Vega Frontier with 16GB of RAM for $1000. If you're doing heavy compute, you're probably gonna need a ton of RAM to go with it.
https://www.pcper.com/news/General-Tech/AMDs-HBCC-you-and-me
The $700 1080 Ti has almost the same performance as the Titan Xp, so why not compare those?
It changes conclusions considerably...
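To see how much the conclusion shifts, here is a rough sketch that adds the $700 1080 Ti to the same calculation. Two inputs are assumptions not given in this thread: a peak of about 11.3 TFLOPS FP32 for the 1080 Ti (its reference boost-clock spec), and the same 90% TVM efficiency assumed earlier for the Titan Xp.

```python
# Hypothetical extension of the perf-per-dollar estimate to the GTX 1080 Ti.
# ASSUMPTIONS: ~11.3 TFLOPS FP32 peak for the 1080 Ti and the same 90%
# efficiency assumed for the Titan Xp; Vega keeps its assumed 65%.

def tflops_per_dollar(peak_tflops: float, efficiency: float, price_usd: float) -> float:
    """Effective TFLOPS delivered per US dollar of card price."""
    return peak_tflops * efficiency / price_usd

vega_64 = tflops_per_dollar(12.6, 0.65, 499)
titan_xp = tflops_per_dollar(12.0, 0.90, 1200)
gtx_1080_ti = tflops_per_dollar(11.3, 0.90, 700)

print(f"Vega 64 vs Titan Xp:    {vega_64 / titan_xp:.2f}x")
print(f"Vega 64 vs GTX 1080 Ti: {vega_64 / gtx_1080_ti:.2f}x")
```

Under these assumptions, Vega's per-dollar lead shrinks from about 1.8x to about 1.1x.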
The 1080 Ti has less RAM and no usable support for packed 16-bit arithmetic (FP16 runs at only a small fraction of its FP32 rate). Ultimately, the 1080 Ti is a graphics card designed to dominate video games, and NVidia cuts out features that gamers don't care about.
As for the "Compute" sector, it's Vega Frontier ($1000) vs. Titan Xp ($1200) at the low end, at least. NVidia Tesla chips ($4000 to $7000) make up the high end.
But then again, AMD's Vega 56 and 64 have HBM2 and cost under $1000. IIRC, the Vega Frontier Edition is $999 and has 16GB of HBM2 with 480GB/s of theoretical bandwidth.
NVidia also has an offering with high-speed HBM2: the Tesla P100, but it's far more expensive, at $7000 each.
I don't know whether HBM2 has major benefits over GDDR5X, though; I'm just listing numbers here. For example, the Titan Xp apparently gets more bandwidth from its GDDR5X, although it is still more expensive than the Vega 64.
------------
If a problem is constrained by global memory bandwidth, it might be better to run it on an AMD Vega 64. After all, you can afford roughly 7 Vega Frontier Editions for the price of one NVidia P100.
Obviously, this very much depends on your workload.
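For a bandwidth-bound workload, the relevant figure is bandwidth per dollar rather than FLOPS per dollar. The sketch below uses the Vega Frontier Edition numbers ($999, 480 GB/s) and the P100 price ($7000) quoted above. The P100's 732 GB/s is an assumption based on its 16GB HBM2 spec, not a figure from this thread.

```python
# Rough memory-bandwidth-per-dollar comparison for bandwidth-bound workloads.
# ASSUMPTION: 732 GB/s for the Tesla P100 (16GB HBM2 spec); the other figures
# come from the comments in this thread.

cards = {
    "Vega Frontier Edition": {"bandwidth_gbs": 480.0, "price_usd": 999.0},
    "Tesla P100": {"bandwidth_gbs": 732.0, "price_usd": 7000.0},
}

for name, spec in cards.items():
    per_dollar = spec["bandwidth_gbs"] / spec["price_usd"]
    print(f"{name}: {per_dollar:.3f} GB/s per $")

# How many Vega FE cards fit in one P100 budget, and their combined bandwidth.
vega = cards["Vega Frontier Edition"]
count = int(cards["Tesla P100"]["price_usd"] // vega["price_usd"])
aggregate = count * vega["bandwidth_gbs"]
print(f"{count} Vega FE cards per P100 budget, {aggregate:.0f} GB/s combined")
```

The combined figure only matters if the workload actually splits cleanly across cards. A single problem that must live on one GPU sees just the per-card bandwidth.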
Vega 64 can in theory do 25 TFLOPS at half precision.
But as you say there's a large price difference too.
For the market segment that needs 1-8 GPU rigs for ML on a low budget, AMD could kill it if they invested in software support and kernel optimisation.
For servers and large-scale training, unless AMD has some ML-specialised cores in the pipeline, Nvidia Volta and Google TPUs have a serious lead.
"The current support on ROCm focuses on the functionality coverage. We have already seen promising performance results by simply adopting existing TVM schedules for CUDA backend. For example, you can try running the gemm test script in the TVM repository and see the result. For two types of cards we tested, the current gemm recipe for square matrix multiplication (not yet specifically optimized for AMD GPUs) already achieves 60% to 65% of peak performance. This is already a promising start, as it is very hard to optimize performance to get to peak and we did not yet apply AMD GPU specific optimizations. We are starting to look at performance optimization and we expect more improvement to come"
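For context on the "60% to 65% of peak" numbers: percent of peak for a square GEMM is usually computed by counting 2 * n**3 floating-point operations (n**3 multiply-adds), dividing by the measured runtime, and comparing against the card's theoretical peak. The sketch below only does that arithmetic. It does not call TVM, and the helper names are hypothetical. As a worked example, it shows the runtime that 65% of a 12.6 TFLOPS card would imply for n = 4096.

```python
# Sketch of the usual "percent of peak" calculation for a square GEMM.
# Helper names are hypothetical; timings would come from your own benchmark.

def gemm_flops(n: int) -> int:
    """FLOPs for C = A @ B with square n x n matrices (n**3 multiply-adds)."""
    return 2 * n ** 3

def fraction_of_peak(n: int, seconds: float, peak_tflops: float) -> float:
    """Achieved throughput divided by theoretical peak."""
    achieved_tflops = gemm_flops(n) / seconds / 1e12
    return achieved_tflops / peak_tflops

def runtime_at_fraction(n: int, fraction: float, peak_tflops: float) -> float:
    """Runtime in seconds that corresponds to a given fraction of peak."""
    return gemm_flops(n) / (fraction * peak_tflops * 1e12)

# Worked example: what 65% of a 12.6 TFLOPS card means for n = 4096.
t = runtime_at_fraction(4096, 0.65, 12.6)
print(f"n=4096 at 65% of 12.6 TFLOPS: {t * 1e3:.1f} ms")
print(f"check: {fraction_of_peak(4096, t, 12.6):.2f} of peak")
```

That corresponds to roughly 16.8 ms per 4096 x 4096 multiply.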