It had. It's now at rough parity with PyTorch.
And no, it wasn't about a usability tradeoff.
It was about being more general: a more general compiler, more general code, more composable code.
Since then, the team has been optimizing that and folding compiler optimizations into the language that benefit all code. ML-style code stresses the compiler in a particular way; PyTorch handles ML's array-heavy workloads as a special case.
Julia will do the same, but it's laying the groundwork for domain-specific optimizations to live in package and user space. A different sort of philosophy.
It was about being more ambitious: laying the groundwork for a more powerful tool in general, at some short-term cost.
They could have just written a framework that baked in fp32/64/16 with CUDA kernels, tracing, and operator-overloading computational graphs, and gotten more speedup over PyTorch with better usability (in fact, Avalon.jl takes that approach).
But they didn't, and now there's a burgeoning ecosystem that does things no other framework can. It's not quite as beneficial for today's vanilla ML, which is stuck in a local optimum, but I think that's going to change: https://www.stochasticlifestyle.com/useful-algorithms-that-a...
In the meantime, places like MIT, Moderna, and NASA are reaping the benefits.