CUDA Tile Open Sourced

(github.com)

201 points · JonChesterfield · 6mo ago · 106 comments

79 comments · 10 top-level

xmorse6mo ago· 20 in thread

Writing this in Mojo would have been so much easier

3abiton6mo ago

It's barely gaining adoption though. The lack of buzz is a chicken-and-egg issue for Mojo. I fiddled with it briefly (mainly to get it working with some of my Python scripts), and it was surprisingly easy. It'll shoot up one day for sure if Lattner doesn't give up on it early.

ronsor6mo ago

Isn't the compiler still closed source? I and many other ML devs have no interest in a closed-source compiler. We have enough proprietary things from NVIDIA.

0x696C69616mo ago

Yeah, the mojo pitch is so good, but I don't think anyone has an appetite for the potential fuckery that comes with a closed source platform.

3abiton6mo ago

Yes, but Lattner has said multiple times that it's closed until it matures (he apparently did this with LLVM and Swift too). So not unusual. His open source target is the end of 2026. In all fairness, I have zero doubts that he will deliver.

boredatoms6mo ago

I feel like it's in AMD/Intel/G’s interest to pile a load of effort into (an open source) Mojo

ipsum26mo ago

Mojo is not open source and would not get close to the performance of cuTile.

I'm tired of people shilling things they don't understand.

almostgotcaught6mo ago

it's all over this thread (and every single other hn thread about GPU/ML compilers) - people quoting random buzzword/clickbait takes.

bigyabai6mo ago

Use-cases like this are why Mojo isn't used in production, ever. What does Nvidia gain from switching to a proprietary frontend for a compiler backend they're already using? It's a legal headache.

Second-rate libraries like OpenCL had industry buy-in because they were open. They went through standards committees and cooperated with the rest of the industry (even Nvidia) to hear out everyone's needs. Lattner gave up on appealing to that crowd the moment he told Khronos to pound sand. Nobody should be wondering why Apple or Nvidia won't touch Mojo with a thirty-nine and a half foot pole.

xmorse6mo ago

Kernels now written in Mojo were all handwritten in MLIR, like in this repo. They made a full language because that's not scalable; a sane language is totally worth it. Nvidia will probably end up buying them in a few years.

pjmlp6mo ago

NVidia is perfectly fine with C++ and Python JIT.

CUDA Tile was exactly designed to give parity to Python in writing CUDA kernels, acknowledging the relevance of Python, while offering a path researchers don't need to mess with C++.

It was announced at this year's GTC.

NVidia has no reason to use Mojo.
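
For readers who have not seen the tile style of kernel writing, here is a rough sketch in plain Python. It is not the cuTile API, and `matmul_tiled` and its `tile` parameter are hypothetical names made up for this illustration. The assumption is only the general idea: the author describes work on whole tiles of the output, and mapping those tiles onto threads is left to the compiler.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply two matrices (lists of lists) one output tile at a time.

    Conceptual sketch only: each (i0, j0) pair stands in for one
    tile-level program instance. A real tile compiler would decide how
    that tile maps onto hardware threads; here it is just nested loops.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    c = [[0] * n for _ in range(m)]

    for i0 in range(0, m, tile):          # tile row of the output
        for j0 in range(0, n, tile):      # tile column of the output
            for k0 in range(0, k, tile):  # walk the reduction dimension in tiles
                # Accumulate the partial product of one A tile and one B tile.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_tiled(a, b, tile=2))
```

The result does not depend on `tile`; the tile size only changes how the work is grouped, which is the knob a tile-level compiler gets to tune.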

bigyabai6mo ago

I don't think Nvidia would acquire Mojo when the Triton compiler is open source, optimized for Nvidia hardware and considered an industry standard.

saagarjha6mo ago

Nobody is writing MLIR by hand, what are you on about? There are so many MLIR frontends

oedemis6mo ago

How does Mojo with MAX optimize the process?

itsthecourier6mo ago

What about a forty-foot pole? Would it be viable?

llmslave26mo ago

I really want Mojo to take off. Maybe in a few years. The lack of an stdlib holds it back more than they think, and since their focus is narrow atm it's not useful for the vast majority of work.

pjmlp6mo ago

It would help if they were not so focused on macOS and Linux.

Julia, Python GPU JITs work great on Windows, and many people only get Windows systems as default at work.

saagarjha6mo ago

Approximately nobody writing high performance code for AI training is using Windows. Why should they target it?

pjmlp6mo ago

As desktop, and sometimes that is the only thing available.

When is the Year of NPUs on Linux?

bigyabai6mo ago

I've commissioned a board of MENSA members to devise a workaround for this issue; they've identified two potential solutions.

1) Install Linux

2) Summon Chris Lattner to play you a sad song on the world's smallest violin in honor of the Windows devs that refuse to install WSL.

pjmlp6mo ago

I'll go with this: customers keep using CUDA with Python and Julia and ignore that Chris Lattner's company exists, while Mojo repeats the history of Swift for TensorFlow.

What about that outcome?

CamperBob26mo ago· 17 in thread

Fun game: see how many clicks it takes you to learn what MLIR stands for.

I lost count at five or six. Define your acronyms on first use, people.

ipnon6mo ago

GPU programming definitely is not beginner friendly. There's a much higher learning curve than most open source projects. To learn basic Python you need to know about definitions and loops and variables, but to learn CUDA kernels you need to know maybe an order of magnitude more concepts to write anything useful. It's just not worth the time to cater to people who don't RTFM, the README would be twice as long and be redundant to the target audience of the library.

CamperBob26mo ago

That's the whole problem. I had to "R" multiple "FMs" before one of them bothered to define the acronym.

Stop carrying water for poor documentation practice.

ipnon6mo ago

It's kind of like if the Django README explained how SQL works, the structure of HTTP requests, best practices for HTML, and so on. If you don't know what MLIR is, you might not be the target audience for this library. Nvidia in general doesn't prioritize developer experience as much as companies like Meta do for open source projects like React.

__patchbit__6mo ago

Use the AI prompt to pinprick learn.

Just say to the AI, "Explain THIS".

saagarjha6mo ago

This is a GitHub repo for compiler engineers.

CamperBob26mo ago

Cool. This is a site for hackers of all stripes.

bigyabai6mo ago

I don't give "finance hackers" or "growth hackers" the time of day. Many hackers are held in utter contempt, and often for a very good reason.

saagarjha6mo ago

Yes, so given that you clearly had trouble figuring out what it was, maybe you could have shared with the class?

fragmede6mo ago

I did it in three. I selected it in your comment, and then had to hit "more" to get to the menu to ask Google about it, which brought me to https://www.google.com/search?q=MLIR which says: MLIR is an open-source compiler infrastructure project developed as a sub-project of the LLVM project. Hopefully

Get better at computers and stop needing to be spoon-fed information, people!

reactordev6mo ago

In this day and age, asking questions about what something is is a minefield of “just ask AI” and “You should know this”. Let’s stop putting down people who ask questions and root out those that have shitty answers.

ThrowawayTestr6mo ago

Google is nearly 30 years old

fragmede6mo ago

I get why it feels frustrating when someone snaps "just google it." Nobody likes feeling dumb. That said, there’s a meaningful difference between asking a genuine question and demanding that every discussion be padded to accommodate readers who won’t even type four letters into a search bar. Expecting complete spoon-feeding in technical threads isn’t curiosity; it’s a refusal to engage. Learning requires participation.

iaebsdfsh6mo ago

From Wikipedia: The name "Multi-Level Intermediate Representation" reflects the system’s ability to model computations at various abstraction levels and progressively lower them toward machine code.

poita666mo ago

And yet you didn’t tell us what it stands for, just what it is. The person you’re responding to was specifically talking about finding out what it stands for

rswail6mo ago

Based on the use of LLVM I guessed "Machine Learning Intermediate Representation"?

How close was I?

roughly6mo ago

The ol’ TMA problem.

piskov6mo ago

If only there was a chat-based app that you could ask questions to.

fooblaster6mo ago· 14 in thread

Let's see if developers sleepwalk into another trap to keep us locked into nvidia's hardware for the next decade.

pjmlp6mo ago

It is up to AMD, Intel and Khronos to offer APIs and tools that are actually nice to use.

They have had about 15 years to move beyond C99, stone age workflows to compile GLSL and C99 with their drivers, no libraries ecosystem, and printf debugging.

Eventually some of the issues were fixed, after they started seeing that only hardliners would put up with such a development experience, and by then it was too late.

tester7566mo ago

Isn't there OneAPI with its huge ecosystem of tools, debuggers, etc?

pjmlp6mo ago

Yes, that is part of "it was too late".

oneAPI builds on top of SYCL and is basically Intel's CUDA. SYCL is already the second attempt to have C++ in OpenCL; the first, during OpenCL 2.x, worked so well that OpenCL 3.0 is basically a reboot back to OpenCL 1.0.

Also, even SYCL only got a proper kick-off after Codeplay came up with its implementation; nowadays they sell oneAPI support and tooling, after being acquired by Intel.

the__alchemist6mo ago

IMO it's not Nvidia's fault the competing APIs are high friction.

flyingcoder6mo ago

AMD screwed up so badly.

fooblaster6mo ago

That is true, but that doesn't mean Nvidia is not engaging in engineering to intentionally kneecap competition. Triton and other languages like that are a huge threat, and cuTile is a means to combat that threat and prevent a hardware abstraction layer.

positron266mo ago

Hundreds of thousands of developers with access to a global communication network were not stopped by AMD. Why act like dependents or wait for some bright star of consensus unless the intent is really about getting the work for free?

We don't have to wait for singular companies or foundations to fix ecosystem problems. Only the means of coordination are needed. https://prizeforge.com isn't there yet, but it is already capable of bootstrapping its own development. Matching funds, joining the team, or contributing on MuTate will all make the ball pick up speed faster.

OneDeuxTriSeiGo6mo ago

CUDA Tile is an open source MLIR Dialect so it wouldn't take much to write MLIR transforms to map it from the Tile IR to TOSA or gpu + vector + some amdgpu or other specialty dialects.

The Tile dialect is pretty much independent of the nvidia ecosystem so all it takes is one good set of MLIR transform passes to run anything on the CUDA stack that compiles to tile out of the nvidia ecosystem prison.

So if anything this is actually a massive opportunity to escape vendor lock in if it catches on in the CUDA ecosystem.

saagarjha6mo ago

Yes, but why would you want to use this over the other MLIR dialects that are already cross platform?

OneDeuxTriSeiGo6mo ago

That's not really the point. The point is that Nvidia is updating a lot of their higher level CUDA tooling to integrate with and compile to Tile IR. So this gives an escape hatch for tools built on top of CUDA to deploy outside the ecosystem.

RobotToaster6mo ago

Or it's Nvidia doing an Embrace Extend Extinguish on MLIR.

trueismywork6mo ago

The TileIR license means LLVM can just fork and support it themselves as needed.

trueismywork6mo ago

TileIR is Apache licensed so AMD can implement it as well.

RicoElectrico6mo ago

Obviously they will, as with the mainframe and cloud.

jauntywundrkind6mo ago· 14 in thread

Will be interesting to see if Nvidia and others have any interest & energy getting this used by others, if there actually is an ecosystem forming around it.

Google leading XLA & IREE, with awesome intermediate representations, used by lots of hardware platforms, and backing really excellent JAX & PyTorch implementations, having tools for layout & optimization folks can share: they really built an amazing community.

There's still so much room for planning/scheduling, so much hardware we have yet to target. RISC-V has really interesting vector instructions, for example, and it seems like there's so much exploration / work to do to better leverage that.

Nvidia has partners everywhere now. NVLink is used by Intel, AWS Trainium, others. Yesterday the Groq exclusive license that Nvidia paid to give to Groq?! Seeing how and when CUDA Tiles emerges: will be interesting. Moving from fabric partnerships, up up up the stack.

pjmlp6mo ago

For NVidia it suffices that this is a Python JIT that allows programming CUDA compute kernels directly in Python instead of C++. It is yet another way in which Intel and AMD, alongside the Khronos APIs, lag behind in developer experience for GPU compute programming.

Ah, and Nsight debugging also supports Python CUDA Tiles debugging.

https://developer.nvidia.com/blog/simplify-gpu-programming-w...

Q6T46nT668w6i3m6mo ago

Slang is a fantastic developer experience.

Conscat6mo ago

I work at Nvidia, and my team is using Slang for all of our (numerous and non-trivial) kernels because its automatic differentiation type system is so nice.

pjmlp6mo ago

Especially when using the tooling from NVIDIA, who created it before offering it to Khronos as a GLSL replacement.

saagarjha6mo ago

Nsight does not have a debugger.

dahart6mo ago

What do you mean? Are you unaware of Nsight VSE? https://developer.nvidia.com/nsight-visual-studio-edition

pjmlp6mo ago

Yes it does, apparently you never used it.

Moosdijk6mo ago

> There's still so much room for planning/scheduling, so much hardware we have yet to target

this is nicely illustrated by this recent article:

https://news.ycombinator.com/item?id=46366998

saagarjha6mo ago

Wrong type of scheduling.

Moosdijk5mo ago

Thanks for correcting me. Can you point me to what I need to search for to understand the differences?

turtletontine6mo ago

On the RISC-V vector instructions, could you elaborate? Are the vector extensions substantially different from those in ARM or x86?

adgjlsfhk16mo ago

It's fairly similar to Arm's SVE2, but very different from the x86 side in that the vectors are variable length rather than fixed.
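
To make the variable-length point concrete, here is a minimal plain-Python simulation of the loop shape that style encourages. It is not real RVV or SVE2 code, and `vector_add` and `vlmax` are hypothetical names for this sketch. The assumption is that each iteration asks the hardware how many elements it can take (roughly what RISC-V's `vsetvli` does) instead of hard-coding a width, so the same code runs on machines with different vector lengths.

```python
def vector_add(a, b, vlmax):
    """Add two equal-length lists in chunks of at most `vlmax` elements.

    `vlmax` stands in for the hardware's maximum vector length, which the
    program does not know ahead of time in a length-agnostic design.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if vlmax < 1:
        raise ValueError("vlmax must be at least 1")

    n = len(a)
    out = []
    i = 0
    while i < n:
        # Ask for as many elements as remain, capped by the hardware limit.
        vl = min(n - i, vlmax)
        # One "vector instruction": process vl elements at once.
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out


xs = [1, 2, 3, 4, 5]
ys = [10, 20, 30, 40, 50]
for width in (2, 4, 16):
    print(width, vector_add(xs, ys, width))
```

There is no separate scalar tail loop: the last iteration simply gets a smaller `vl`. With fixed-width x86 extensions the width is baked into the instruction choice, so the loop and its remainder handling are written per width.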

nl6mo ago

> Groq exclusive license

non-exclusive license actually.

almostgotcaught6mo ago

> Google leading XLA & IREE

IREE hasn't been at G for >2 years.

0-_-06mo ago· 2 in thread

This is basically the Nvidia equivalent of cooperative_matrix_2 in Vulkan, which is vendor agnostic and should get much more hype than it's getting.

pjmlp5mo ago

Maybe Vulkan could provide native support for Python, C++20, and a graphical debugging experience.

It is surely not equivalent as of today.

0-_-05mo ago

Or even just pointers...

gaogao6mo ago· 1 in thread

The compiler for CUDA Tile being Blackwell only is a baffling decision. I wanted to try it out, but it's only really easy to grab H100s quickly right now. I guess maybe I'll try it out on my 5070 Ti after traveling, but am more likely to stick to an IR that targets multiple platforms, since they couldn't be bothered.

robobsolete6mo ago

I was keen to try it too, but oh well

boywitharupee6mo ago· 1 in thread

shouldn't the title be "CUDA Tile IR Open Sourced"?

OneDeuxTriSeiGo6mo ago

It's more or less the same thing. CUDA Tile is the name of the IR, cuTile is the name of the high-level DSLs.

opan6mo ago

>The CUDA Tile IR project is under the Apache License v2.0 with LLVM Exceptions

pyuser5836mo ago

I’m glad CUDA and “open source” are in the same sentence again.

We’d all prefer cross platform programming, but if you’re going to do platform specific, I prefer open source to closed source.

Thank you NVIDIA!

toolboxg1x06mo ago

NVIDIA tensor core units, where the second column in kernel optimization is producing a test suite.
