AMD did approximately nothing with ROCm.
Investing $10-20m of developer time into making ROCm work reliably and easily would have paid for itself 100x.
I love when outsiders throw around random-ass takes like this. Just curious: how'd you come up with this number? Is it backed by literally any thought/data/roadmap?
Let's do some rough back-of-the-envelope calculations: at roughly $200k per engineer-year, 20MM is 100 engineers working for 1 year. Or maybe it's 5 years of work for 20 engineers? Which one of those perspectives (if any!) sounds to you like a reasonable assessment of the gap between AMD and NVIDIA?
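The staffing arithmetic can be made explicit with a tiny calculator. This is a sketch only: the $200k per engineer-year figure is an assumption backed out of "20MM is 100 engineers working for 1 year", not a sourced salary number, and the function names are hypothetical.

```python
# ASSUMPTION: ~$200k per engineer-year, fully loaded. This rate is inferred
# from "20MM is 100 engineers working for 1 year", not from real payroll data.
COST_PER_ENGINEER_YEAR = 200_000


def engineer_years(budget_usd: float, cost_per_year: float = COST_PER_ENGINEER_YEAR) -> float:
    """Return how many engineer-years a budget pays for."""
    return budget_usd / cost_per_year


def staffing_options(budget_usd: float, team_sizes: list[int]) -> dict[int, float]:
    """For each team size, return how many years the budget keeps that team funded."""
    total = engineer_years(budget_usd)
    return {team: total / team for team in team_sizes}


if __name__ == "__main__":
    budget = 20_000_000
    print(f"{engineer_years(budget):.0f} engineer-years")
    for team, years in staffing_options(budget, [100, 20]).items():
        print(f"{team} engineers for {years:.0f} year(s)")
```

Both framings in the comment are the same 100 engineer-years; the calculator just shows how that fixed pool can be sliced into team size versus duration.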
A quick reminder before you answer: whatever you think is actually involved in improving ROCm, unless you work on ROCm, you're almost certainly not considering an entire iceberg of complexity (runtime/driver/firmware).
Let's put it another way: forget AMD investing, I'll invest in you since you're so confident. I'll give you 20MM as a high-interest, non-dischargeable loan (say 8%) and all the runtime/driver/firmware source for AMDGPU. Up for it? All you have to do is improve ROCm until it's competitive with CUDA, and then you can take home a huge slice of the TAM (total addressable market) and you'll be rich. Easy, right?
Cutting to the chase: you're off by at least two orders of magnitude on your goofy estimate; the real numbers are probably closer to 200MM invested every year for 10 years. And you still wouldn't be caught up, because NVIDIA wouldn't spend those 10 years resting on its laurels, waiting for you to catch up!
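Checking the "two orders of magnitude" claim is a one-liner on the numbers in the thread. A minimal sketch: the figures are the commenters' rough estimates, not measured data, and the helper names are hypothetical.

```python
import math

# Estimates quoted in the thread; none of these are measured figures.
PROPOSED_LOW_USD = 10_000_000
PROPOSED_HIGH_USD = 20_000_000
COUNTER_PER_YEAR_USD = 200_000_000
COUNTER_YEARS = 10


def total_investment(per_year_usd: int, years: int) -> int:
    """Total spend for a constant yearly budget."""
    return per_year_usd * years


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Base-10 log of the ratio, i.e. how many powers of ten apart the two values are."""
    return math.log10(larger / smaller)


if __name__ == "__main__":
    counter = total_investment(COUNTER_PER_YEAR_USD, COUNTER_YEARS)
    for proposed in (PROPOSED_LOW_USD, PROPOSED_HIGH_USD):
        ratio = counter / proposed
        print(f"${counter:,} vs ${proposed:,}: {ratio:.0f}x "
              f"({orders_of_magnitude(counter, proposed):.2f} orders of magnitude)")
```

$2B over a decade is 100x the $20m figure and 200x the $10m figure, so "at least two orders of magnitude" holds for the whole $10-20m range.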
It's a multiple of what the TinyGrad (https://tinygrad.org/#tinybox) startup raised in capital. So $10-20m is absolutely reasonable, especially when you factor in an established HR department with a hiring pipeline, an established IT department, offices, etc.
The 100x multiplier is also easy to justify, given the stock prices of NVIDIA and AMD.
> A quick reminder before you answer: whatever you think is actually involved in improving ROCm, unless you work on ROCm, you're almost certainly not considering an entire iceberg of complexity (runtime/driver/firmware).
Oh, I do. I've been following open-source AMD driver development for the last two decades.
And I maintain that the total investment AMD needed to make to rival NVIDIA in market cap would have been around that number.
> Cutting to the chase: you're off by at least two orders of magnitude on your goofy estimate; the real numbers are probably closer to 200MM invested every year for 10 years.
For an entirely new company starting from scratch? Reasonable. But AMD is not a new company, and it is already doing most of the work needed.
As far as I know (and again, I work in the field of AI compilers), we're still a ways off from complete end-to-end generation of highly optimized kernels. If you want it to go fast, you need to write it by hand [1], and then test and validate it.
Moreover, chipmakers are constantly adding new features (NVIDIA's Tensor Cores, for example), so the compiler is always playing catch-up, and at some point an engineer (likely a team of them) has to sit down and think, "What's the best way to exploit this hardware functionality for software performance?" Then they have to test and validate that, and then either write a kernel or try to encode that know-how into a compiler.
Multiply this by the number of kernels in a typical suite, and... yeah.
And that was my point about the herculean effort required on modern chips. Assembly language isn't just the old "add register 1 and 2 and dump the result in R3" anymore. It's "use this instruction to access memory in this way, so that the data is in a compatible format for the next instruction" and "oh yeah, make sure your memory synchronization primitives keep the whole thing coherent." Good luck!
Even going one step up into a higher-level language, you have to know how the kernel gets compiled to make it worthwhile. Again, it is trivial to write a correct OpenCL matrix multiply, but it's never going to be the highest-performing one. You have to know the hardware intimately. This is where co-designing the software with the hardware is very important. Basically every AI chipmaker of any importance does this, including startups like Groq and Cerebras.
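To make the "correct vs. fast" gap concrete, here is a sketch in plain Python (not OpenCL) of the textbook naive triple loop next to a loop-tiled version of the same product. Pure Python won't show any speedup; the point is that the tiled version restructures the loops so each step works on small blocks that, on real hardware, would be staged in fast on-chip memory. The tile size of 2 is an arbitrary assumption; real kernels pick tile shapes to fit the specific chip's registers, shared/local memory, and matrix instructions, which is exactly the hardware knowledge the comment is talking about.

```python
Matrix = list[list[int]]


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Textbook triple loop: easy to get right, ignores the memory hierarchy."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def matmul_tiled(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    """Same product, computed one tile x tile block at a time (loop tiling)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Inner loops touch only one block of A, B, and C; min() handles
                # edge tiles when the dimensions aren't multiples of `tile`.
                for i in range(i0, min(i0 + tile, n)):
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + tile, m)):
                            c[i][j] += a_ip * b[p][j]
    return c


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(matmul_naive(a, b))  # [[19, 22], [43, 50]]
    print(matmul_tiled(a, b))  # [[19, 22], [43, 50]]
```

Both functions return identical results; everything that makes the tiled shape pay off on a GPU (tile sizes, memory placement, synchronization between threads) is left out here and is where the per-chip engineering effort goes.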
[1] A lot of kernels share basic patterns, so it's not as hard as it sounds, but it definitely requires engineering effort to get the design right.
I see it less as an engineering problem and more as a market problem. AMD's stuff has existed; it's the market that doesn't see the point of it, and at this point even feature parity, or CUDA compatibility for that matter, won't make a huge dent. People will just keep using what they know and what they're recommended.
It’s more amazing to me that NVDA is so intensely inflated by this LLM hype wave. I find it genuinely scary to think about what’s going to happen when 95+% of AI slopware startups fold. Nvidia won’t be the only company financially impacted. Our entire economy runs on fads.