I'm tired of people shilling things they don't understand.
Second-rate libraries like OpenCL had industry buy-in because they were open. They went through standards committees and cooperated with the rest of the industry (even Nvidia) to hear out everyone's needs. Lattner gave up on appealing to that crowd the moment he told Khronos to pound sand. Nobody should be wondering why Apple or Nvidia won't touch Mojo with a thirty-nine and a half foot pole.
CUDA Tile was designed precisely to give Python parity in writing CUDA kernels, acknowledging Python's relevance while offering a path where researchers don't need to mess with C++.
It was announced at this year's GTC.
Nvidia has no reason to use Mojo.
The Julia and Python GPU JITs work great on Windows, and many people only get Windows systems by default at work.
When is the Year of NPUs on Linux?
1) Install Linux
2) Summon Chris Lattner to play you a sad song on the world's smallest violin in honor of the Windows devs that refuse to install WSL.
What about that outcome?
I lost count at five or six. Define your acronyms on first use, people.
Stop carrying water for poor documentation practice.
Just say to the AI, "Explain THIS".
Get better at computers and stop needing to be spoon-fed information, people!
How close was I?
They have had about 15 years to move beyond C99, stone-age workflows for compiling GLSL and C99 with their drivers, no library ecosystem, and printf debugging.
Eventually some of the issues were fixed, once they saw that only hardliners would put up with such a development experience, but by then it was too late.
oneAPI builds on top of SYCL and is basically Intel's CUDA. It is already the second attempt at C++ for OpenCL; the effort during OpenCL 2.x worked so well that OpenCL 3.0 is basically a reboot back to OpenCL 1.2.
Also, even SYCL only got a proper kick-off after Codeplay came up with its implementation; nowadays they sell oneAPI support and tooling, after being acquired by Intel.
We don't have to wait for singular companies or foundations to fix ecosystem problems. Only the means of coordination are needed. https://prizeforge.com isn't there yet, but it is already capable of bootstrapping its own development. Matching funds, joining the team, or contributing on MuTate will all make the ball pick up speed faster.
The Tile dialect is pretty much independent of the Nvidia ecosystem, so all it takes is one good set of MLIR transform passes to take anything on the CUDA stack that compiles to Tile and run it outside the Nvidia ecosystem prison.
So if anything, this is actually a massive opportunity to escape vendor lock-in if it catches on in the CUDA ecosystem.
Google leads XLA & IREE, with awesome intermediate representations used by lots of hardware platforms, backing really excellent JAX & PyTorch implementations, and with layout & optimization tools that folks can share: they've really built an amazing community.
There's still so much room for planning/scheduling, so much hardware we have yet to target. RISC-V has really interesting vector instructions, for example, and it seems like there's so much exploration / work to do to better leverage that.
Nvidia has partners everywhere now. NVLink is used by Intel, AWS Trainium, and others. And yesterday, the exclusive license that Nvidia paid Groq for?! Seeing how and when CUDA Tile emerges will be interesting. Moving from fabric partnerships, up up up the stack.
Ah, and Nsight also supports debugging Python CUDA Tile code.
https://developer.nvidia.com/blog/simplify-gpu-programming-w...
This is nicely illustrated by this recent article:
Non-exclusive license, actually.
IREE hasn't been at Google for more than 2 years.
We’d all prefer cross-platform programming, but if you’re going to do platform-specific, I prefer open source to closed source.
Thank you NVIDIA!