
0 points · amkkma · 4y ago · 0 comments

Glad to hear it's being worked on!

> That said, I don't think we can be faulted for that one, because I don't think anybody really has a good answer to this particular design problem.

Agreed! To be clear, if there's any implication of "fault", it was certainly not in a moral sense, nor about making poor design decisions. Julia's compiler is being asked to do many new things with semantics that necessarily predated many advances in PL.

Re kernel fusion, there's another piece here, which you may or may not have included in "array-level optimizations". Julia's "just write loops" ethos is awesome until you get to accelerators; then we're back to an "optimizer-defined sub-language", as TKF puts it. People like loops and flexibility, and Dex, FLoops.jl, Tullio, LoopVectorization.jl and KA.jl show that it's possible to retain structure and emit accelerator-able loopy code. But none of those, except for dex, has a solution for fusing kernels that rely on loops. I'm still using the concept of kernels because there's still a bit of a separation between low-level CUDA.jl code/these various DSLs and higher-level array code, even if it's not as stark as in Python or C++.

It would be really cool if, like Dex, there were a plan to fuse these sorts of structured loops as well. Dex does it by having type-level indexing and loop effects (they're actually moving to a user-defined parallel effect handler system: https://arxiv.org/abs/2110.07493). The latter can tell the compiler when it's safe to parallelize and fuse + beta-reduce loops. But that relies on structured semantics/effects and a higher-level IR than exists in Julia.
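
To make "fusing kernels that rely on loops" concrete, here is a rough sketch in plain Python. It is purely illustrative: `scale`, `add`, `unfused`, and `fused` are made-up names, not taken from Dex or any Julia package. It only shows the before/after shape of the transformation, not how a compiler would derive it.

```python
def scale(xs, a):
    # Kernel 1: a standalone loop that materializes its result.
    out = [0.0] * len(xs)
    for i in range(len(xs)):
        out[i] = a * xs[i]
    return out


def add(xs, ys):
    # Kernel 2: a second standalone loop over the same index space.
    out = [0.0] * len(xs)
    for i in range(len(xs)):
        out[i] = xs[i] + ys[i]
    return out


def unfused(xs, ys, a):
    # Two passes over memory plus a temporary array between the kernels.
    tmp = scale(xs, a)
    return add(tmp, ys)


def fused(xs, ys, a):
    # One pass, no temporary: the body of `scale` is substituted into `add`.
    # This is only valid because iteration i reads and writes index i alone,
    # i.e. the loop bodies have no effects that cross iterations.
    out = [0.0] * len(xs)
    for i in range(len(xs)):
        out[i] = a * xs[i] + ys[i]
    return out
```

The hard part is the comment in `fused`: something has to prove that the iterations are independent before the loops can be merged. As I understand it, that is what Dex's effect tracking gives the compiler, and it is hard to recover from arbitrary loops that may mutate anything.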

Not sure what a Julian solution would look like, or whether one is possible. But given the usability wins, it would be great to have in Julia as well.



celrod · 4y ago

> But none of those, except for dex, has a solution for fusing kernels that rely on loops.

The LV rewrite will. Some day, I'd like to have it target accelerators, but unlike fusion, I've not actually put any research/engineering into it so can't make any promises.

But my long-term goal is simple loops in -> optimized anything you want out. Enzyme also deserves a shout-out for being able to generate reverse-mode AD loops with mutation.
