For whatever reason, people just delete these tools from their minds and then claim that CUDA still gives Nvidia a monopoly.
And yet the popcorn gallery still says "there's no [realistic] alternative to CUDA." Methinks the real issue is that CUDA is the best software solution for Nvidia GPUs, the alternative hardware vendors aren't seen as viable competitors for hardware reasons, and people attribute that failure to software.
Is there?
10 years ago, I burned about 6 months of project time slogging through AMD / OpenCL bugs before realizing that I was being an absolute idiot and that the green tax was far cheaper than the time I was wasting. If you asked AMD, they would tell you that OpenCL was ready for new applications and that support for old applications was right around the corner. This was incorrect on both counts. Disastrously so, if you trusted them. I learned not to trust them. Over the years they kept making the same false promises and failing to deliver, burning generation after generation of grad students and HPC experts and filling the industry with once-burned-twice-shy received wisdom.
When NVDA pumped and AMD didn't, AMD presumably could no longer deny the inadequacy of their offerings and launched an effort to fix their shit. I'm sure it will eventually bear fruit. But is their shit actually fixed? And who can answer that, keeping in mind that they have proven time and time and time and time again that they can't be trusted to answer the question themselves?
80% margins won't last forever, but the trust deficit that has to be closed first shouldn't be understated.
It certainly seems like there's a "nobody ever got fired for buying Nvidia" dynamic going on. We've seen this mentality repeatedly in other areas of the industry: that's why the phrase is a snowclone.
Eventually, someone is going to use non-Nvidia GPU accelerators and get a big enough cost or performance win that industry attitudes will change.
On paper, yes. But how many of them actually work? Every couple of years AMD puts out a press release saying they're getting serious this time and will fully support their thing, and then a couple of people try it and it doesn't work (or maybe the basic hello world test works, but anything else is too buggy), and they give up.
Intel finally seem to have got their act together a bit with oneAPI, but they've languished for years in this area.
At least for Intel, that is just not true. Intel's DPC++ is as open as it gets. It implements a Khronos standard (SYCL), most of the development happens in public on GitHub, it's permissively licensed, and it has a viable backend infrastructure (with implementations for both CUDA and HIP). There's also now a UXL Foundation with the goal of creating an "open standard accelerator software ecosystem".
Do you have experience with SYCL? My experience with OpenCL was that it's a real PITA to work with. What CUDA gets right is making it direct and minimal to start running GPGPU kernels: write the code, compile with nvcc, done.
OpenCL had a weird dance you had to perform just to get a kernel running. Find the OpenCL device using a magic filesystem token. Ask the device politely if it wants to OpenCL. Send over the kernel source as a string blob to be compiled. Run the kernel. A ton of ceremony, and even then you couldn't be guaranteed it would work, because AMD, Intel, and Nvidia were all spotty in how well they supported it.
SYCL seems promising, but the ecosystem is a little intimidating. It doesn't seem (and I could be wrong here) that there is a de facto SYCL compiler, and the goals of the various SYCL compilers are fairly diverse.
No, I bought an Nvidia card and just use CUDA.
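For contrast with the OpenCL steps above, here is a minimal sketch of the CUDA workflow described earlier: host and device code in one `.cu` file, compiled ahead of time by nvcc, with no platform/device lookup and no kernel source string handed to a runtime compiler. It assumes an installed CUDA toolkit and an Nvidia GPU; the names `vec_add`, `add_on_gpu`, and `CUDA_CHECK` are hypothetical, chosen only for illustration.

```cuda
// vec_add.cu: build with `nvcc vec_add.cu -o vec_add`, then run `./vec_add`.
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a readable message if a runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                         cudaGetErrorString(err_));                       \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Device code lives in the same file as host code; nvcc compiles both.
// Each thread adds one pair of elements.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may have more threads than remaining elements
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: copy inputs to the GPU, launch, copy the result back.
void add_on_gpu(const float* a, const float* b, float* c, int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *da = nullptr, *db = nullptr, *dc = nullptr;
    CUDA_CHECK(cudaMalloc(&da, bytes));
    CUDA_CHECK(cudaMalloc(&db, bytes));
    CUDA_CHECK(cudaMalloc(&dc, bytes));
    CUDA_CHECK(cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice));

    // No device enumeration or runtime source compilation: the default
    // device is used implicitly and the kernel was already compiled by nvcc.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(da, db, dc, n);
    CUDA_CHECK(cudaGetLastError());  // catches launch-configuration errors

    // This device-to-host copy waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(da));
    CUDA_CHECK(cudaFree(db));
    CUDA_CHECK(cudaFree(dc));
}

int main() {
    const int n = 1000;
    std::vector<float> a(n), b(n), c(n);
    for (int i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = 2.0f * i;
    }
    add_on_gpu(a.data(), b.data(), c.data(), n);

    // Verify every element on the host: i + 2i == 3i (exact for these small integers).
    for (int i = 0; i < n; ++i) {
        if (c[i] != 3.0f * i) {
            std::fprintf(stderr, "mismatch at %d\n", i);
            return EXIT_FAILURE;
        }
    }
    std::printf("c[999] = %.1f\n", c[999]);  // prints c[999] = 2997.0
    return 0;
}
```

The triple-chevron launch `vec_add<<<blocks, threads>>>` is a CUDA language extension. The usual OpenCL equivalent instead builds a program object at runtime (`clCreateProgramWithSource` plus `clBuildProgram`), creates a kernel object, sets each argument with `clSetKernelArg`, and only then enqueues it, which is exactly the ceremony the comment is complaining about.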
> OpenCL had just a weird dance to perform to get a kernel running...
Yeah, but if you step back and think big picture, that entire list probably isn't the problem. Programmers have a predictable response to that sort of silliness: build a library over it and abstract it away. The sheer number of frameworks out there is awe-inspiring.
I gave up on OpenCL on AMD cards. It wasn't the long, complex process that got me; it was the unavoidable crashes along the way. I suspect that is a more significant issue than I realised at the time (when I assumed it was just me), because it goes a long way towards explaining AMD's pariah-like status in the machine learning world. The situation is more one-sided than a well-optimised library alone can explain. I've personally seen more success implementing machine learning frameworks on AMD CPUs than on AMD's GPUs, and that is a remarkable thing. That said, I assume the state of play in 2024 has changed a lot since I was actively investigating the situation.
I don't think CUDA is the problem here; math libraries are commodity software that give a relatively marginal edge. The lack of CUDA is probably a symptom of deeper hardware problems once people stray off an explicitly graphical workflow. If the hardware worked to spec, I expect someone would just build a non-optimised CUDA clone and we'd all move on. But AMD did build a CUDA clone, and it didn't work, for me at least, and the buzz suggests something is still going wrong with AMD's GPGPU efforts.