How Mojo gets a speedup over Python – Part 2

(modular.com)

94 points · CoreyFieldens · 2y ago · 64 comments

46 comments · 17 top-level

leecarraher · 2y ago · 5 in thread

35Kx speedup is not scaled speedup. Throw this naively parallelizable task at a bigger computer and get a 70Kx speedup, etc.

While I think there are tons of optimizations to be done for Python (looking at you, GIL), giving access to low-level CPU primitives is not one that I think will be broadly adopted by the Python community. That's one of the joys of Python: system-agnostic coding that looks pretty close to pseudocode. If you want speed, glue together a bunch of compiled code calls and hope the call overhead isn't too large. Or write CPU-intensive operations in Numba or Pyrex. At the end of the day, Mojo's pay-to-play programming language harkens back to the early-'90s Borland days.
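The point about scaled speedup can be made concrete with a little arithmetic. The sketch below divides a headline speedup by the number of cores the fast version used. The core counts are hypothetical, chosen only for illustration, and `per_core_speedup` is a made-up helper name, not something from the article.

```python
def per_core_speedup(total_speedup: float, cores: int) -> float:
    """Headline speedup normalized by the number of cores the fast version used."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return total_speedup / cores


# Hypothetical machines: doubling the core count doubles the headline number
# for an embarrassingly parallel task, but the per-core figure does not move.
for cores, total in [(32, 35_000), (64, 70_000)]:
    per_core = per_core_speedup(total, cores)
    print(f"{cores} cores: {total:,}x total, {per_core:,.2f}x per core")
```

Under these assumptions the per-core figure is the same on both machines, which is why a headline number that grows with the hardware says little on its own.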

ElectronCharge · 2y ago

> 35Kx speedup is not scaled speedup.

Right. However, this is a comparison versus Python and the GIL, which can’t do that at all.

> While I think there are tons of optimizations to be done for Python (looking at you, GIL), giving access to low-level CPU primitives is not one that I think will be broadly adopted by the Python community.

It doesn’t need to be, any more than writing Numba or Pyrex is done on a large scale.

> That's one of the joys of Python: system-agnostic coding that looks pretty close to pseudocode. If you want speed, glue together a bunch of compiled code calls and hope the call overhead isn't too large. Or write CPU-intensive operations in Numba or Pyrex. At the end of the day, Mojo's pay-to-play programming language harkens back to the early-'90s Borland days.

The appeal is having a high-level language that compiles to efficient machine (and GPU!) code. One can “drop down” to Python for the non-performance-intensive parts.

I think this will be much more of a draw for people coming from C++, Fortran and other older, jankier languages. It looks to hit a sweet spot for real-time embedded development VERY well, especially given Rust-like memory safety!

Mojo will also be a worthy competitor to Julia in the HPC scientific arena I think…we’ll see!

catgary · 2y ago

Have you played with Mojo? It really doesn’t feel high level.

I feel like JAX has been eating Julia’s lunch lately, making me think that there’s a real market for a small functional differentiable programming language with good Python interop - like a more polished Dex or Futhark.

1 more reply

leecarraher · 2y ago

> Right. However, this is a comparison versus Python and the GIL, which can’t do that at all.

Single-process Python does not take advantage of a multicore architecture, but neither would single-process Mojo. Embarrassingly parallel operations like Mandelbrot can trivially be written with multiprocessing (https://github.com/DipanshuSehjal/Mandelbrot-set/blob/master...) or joblib to run in parallel in otherwise vanilla Python. It would be trivial to implement this in JAX and run it on a GPU or TPU, but I wouldn't say that JAX is the reason for the speedup.
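For readers who have not seen the pattern, here is a minimal sketch of a row-parallel Mandelbrot in vanilla Python using only the standard library. It is not the code from the linked repository. The function names, the viewport (real axis from -2 to 1, imaginary axis from -1.5 to 1.5) and the default parameters are all assumptions made for illustration.

```python
from functools import partial
from multiprocessing import Pool


def escape_count(c: complex, max_iter: int) -> int:
    """Iterations of z = z*z + c before |z| exceeds 2, capped at max_iter."""
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def mandelbrot_row(y, width, height, max_iter):
    """Escape counts for one image row; each row is independent of the others."""
    imag = -1.5 + 3.0 * y / (height - 1)
    return [
        escape_count(complex(-2.0 + 3.0 * x / (width - 1), imag), max_iter)
        for x in range(width)
    ]


def render(width, height, max_iter, map_fn=map):
    """Build the image row by row; map_fn decides whether rows run serially or in parallel."""
    row = partial(mandelbrot_row, width=width, height=height, max_iter=max_iter)
    return list(map_fn(row, range(height)))


def render_parallel(width, height, max_iter, workers=None):
    """Same image, with rows distributed over a pool of worker processes."""
    with Pool(processes=workers) as pool:
        return render(width, height, max_iter, map_fn=pool.map)
```

Call `render_parallel` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Each process has its own interpreter and its own GIL, which is what lets the rows run on separate cores. The per-pixel loop is still interpreted, so this scales with core count without making any single core faster.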

1 more reply

ElectronCharge · 2y ago

> At the end of the day, mojo's pay to play programming language harkens back to the early 90's Borland days.

I didn’t address this in my other post. Modular is about to release a freely available SDK. Also, the standard library sources will be open sourced shortly. There are hints of additional open source initiatives.

Modular’s main business plan appears to be adding value in the general area of AI, AI training, and AI deployment, including by offering SaaS. That plan in no way conflicts with (and in fact encourages) an open Mojo language ecosystem.

leecarraher · 2y ago

That is good to hear. I read a post on Mojo months ago, signed up to the waitlist, and then crickets. It would seem insane to think a non-open-source, non-free compiler/interpreter could be successful these days.

dandiep · 2y ago · 5 in thread

I don't understand the play here for Modular. If this is a worthwhile improvement that is broadly applicable, won't it at some point make its way into Python, NumPy, etc.?

In Java land we had a bunch of other JVMs over the years offering better performance. Most important things got absorbed into what is now OpenJDK, and the other JVMs, if they even exist at all, are niche players.

Performance is a huge focus in Python and ML lands right now, so why would this be any different?

sebzim4500 · 2y ago

They aren't just speeding up existing Python code, they are making a superset of Python which has additional performance features.

I guess it's possible that these features will be introduced into CPython etc., but I doubt it.

jlokier · 2y ago

Just based on their website, I think selling Mojo as a faster Python-like language isn't intended to be their main product. They place a lot more emphasis on AI/ML acceleration than on Mojo, and on creating compatibility between different AI hardware acceleration systems.

I have the impression they hope vendors of AI acceleration hardware, clusters and cloud services will be their customers, to provide uniform and heavily backward-compatible cross-accelerator AI/ML APIs to those vendors' customers.

They presumably also hope that users of those services and hardware will pay for high-quality, well-researched APIs that work reliably with many different AI/ML accelerators, even if Mojo is free. That is similar to how Red Hat provides value through commercial-grade QA and sustained development for Linux on high-end hardware, which would be complicated and risky to use otherwise.

gojomo · 2y ago

If they've figured out how to deliver performance that Python might get around to in 5-10y, shouldn't they tout that, for people who might want that now?

Ultimately promoting the possibility for better performance, & current contrast, is good for prodding other languages/runtimes like Python to match these options. The "important things [get] absorbed" process you mention relies on teams making some "play for" alternatives, to create the impetus to get new things integrated.

dandiep · 2y ago

Totally, just trying to understand why this merits $100MM of VC investment. Is the market that big for this? (Honest question)

3 more replies

pjmlp · 2y ago

Plenty of OpenJDK alternatives still exist, just like there are several C and C++ compilers.

Kelteseth · 2y ago · 5 in thread

So TL;DR: Using SIMD and multithreading is faster than doing no optimization in Python. The only real comparison here is the one with no optimization applied:

> The above code produced a 90x speedup over Python and a 15x speedup over NumPy as shown in the figure below:

Am I missing something?

melling · 2y ago

Getting >10x speed up isn’t exciting enough for many people?

I’ll take it.

This is all pretty impressive if I can take my unmodified (slightly modified?) Python code and get that sort of improvement.

mathisfun123 · 2y ago

> This is all pretty impressive if I can take my unmodified (slightly modified?) Python code and get that sort of improvement.

it'll never work as smoothly as they advertise. just hands down, beyond a shadow of a doubt, their claims about supporting "unmodified" Python code are startup hype. how do i know? i could give you a bunch of technical reasons about Python as a language and CPython as the de facto implementation (thereby informing tons of code already written, re extensions) but there's a much simpler way to reason about it: because there are already >10 attempts at this and no one has been able to do it. there's no magic here that any number of dollars or brains could pull off. instead each such project picks a point on the pythonic<->performant design-space tradeoff curve and then asks/expects you to live with that choice.

and taking ^ into consideration, mojo is not that special. only thing going for it is chris lattner isn't bad at designing languages so maybe, on its own, it'll be a nice language (but it needs to be open to get any traction on its own).

2 more replies

brucethemoose2 · 2y ago

> no optimization in python

Well, isn't that most Python? If Mojo can pave over the slow interpreted bits I repeatedly dig up in Python profilers, even in well-maintained projects, with no code changes, that would be huge.

anonymouse008 · 2y ago

So does this mean Swift and Metal offer the same if not better performance enhancements? SIMD is very much a first-class citizen as a type there.

ElectronCharge · 2y ago

No, Lattner learned from Swift and is avoiding anything except zero-cost abstractions.

Also, Swift isn’t very interesting outside the Apple ecosystem, and Metal doesn’t exist outside the Apple ecosystem. Mojo has a real shot at widespread, general-purpose, language adoption!

CoreyFieldens (OP) · 2y ago · 3 in thread

I'm really interested in Mojo not for its AI applications, but as an alternative to Julia for high-performance computing. Like Julia, Mojo is also attempting to solve the two-language problem, but I like that Mojo is coming at it from a Python perspective rather than trying to create new syntax. For better or for worse, Python is absolutely dominating in the field of scientific computing, and I don't see that changing anytime soon. Being able to write optimizations at a lower level in a Python-like syntax is really appealing to me.

Furthermore, while I love Julia the language, I'm disappointed in how it really hasn't taken off in adoption by either academia or industry. The community is small, and that becomes a real pain point when it comes to tooling. Using the debugger is an awful experience, and the VS Code extension that is the recommended way to write Julia is very hit-or-miss. I think it would really benefit from a lot more funding, which doesn't actually seem to be coming. It's not a 1-to-1 comparison, but Modular has received 3 times as much funding as JuliaHub despite being much younger.

pjmlp · 2y ago

They already failed once with Swift for TensorFlow, so I am curious whether there will be some lessons learned from that effort.

For the time being, my chips are still on the Julia horse.

chrislattner · 2y ago

I was responsible for the S4TF effort at Google. In my opinion, it validated that some of the ideas are good (e.g. Graph Program Extraction is the algorithm that torch dynamo uses internally), that an efficient compiled language has benefits etc. However, I also learned that it should not be based on Swift and should not be based on TensorFlow. Other than those two things, everything is great ;-)

More on GPE if you're curious: https://llvm.org/devmtg/2018-10/slides/Hong-Lattner-SwiftFor...

1 more reply

ElectronCharge · 2y ago

I’m a huge Julia fan, you can take a look at my posting history. I love Julia’s syntax, and some of its language ideas.

…BUT…

For my personal tastes, Mojo’s lack of garbage collection, Rust-like memory safety, and attention to ahead-of-time compilation put it way ahead. The vast pool of Python developers who can easily pick it up if interested is a big plus.

Julia is aimed at a somewhat different space, but there’s also a huge overlap.

Let’s hope for good interoperability between the two, it seems fairly straightforward…

1 more reply

frakt0x90 · 2y ago · 3 in thread

At least they included numpy in this one. On their last post, after all their optimizations, numpy.matmul() produced almost the exact same throughput as their most optimized example. Would still need to dig in to see if this one has issues. Benchmarks are always such a minefield.

Certhas · 2y ago

matmul is a wrapper for BLAS. If you're faster than BLAS you're beating handwritten assembler code specialized per CPU architecture.

aidenn0 · 2y ago

But people use numpy for matrix multiplies in Python. Unless they are claiming to be 35k times faster on general-purpose code, the 35k number is absurd.

1 more reply

archgoon · 2y ago

Right, but the point is that the optimizations didn't require an entirely new language. You just take the core logic and write it in an existing language that has decades of optimizations. If you're doing math, there's likely a natural, well-defined interface that can be used, so you just call that interface from Python, which has historically always been the point of 'glue' languages :)
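The glue pattern can be illustrated with nothing but the standard library. In the sketch below, the C-implemented builtins `sum`, `map` and `operator.mul` stand in for a compiled library such as NumPy or BLAS, which is an assumption made to keep the example dependency-free. The function names are invented for illustration.

```python
import operator
import timeit


def dot_python_loop(a, b):
    """Dot product with every loop iteration executed by the interpreter."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_builtin_glue(a, b):
    """Same result, but the loop itself runs inside C-implemented builtins."""
    return sum(map(operator.mul, a, b))


if __name__ == "__main__":
    a = list(range(100_000))
    b = list(range(100_000, 0, -1))
    assert dot_python_loop(a, b) == dot_builtin_glue(a, b)
    # Timings vary by machine and interpreter version, so measure locally.
    for fn in (dot_python_loop, dot_builtin_glue):
        seconds = timeit.timeit(lambda: fn(a, b), number=20)
        print(f"{fn.__name__}: {seconds:.3f}s for 20 calls")
```

In CPython the second version usually runs faster because the per-element work no longer passes through the bytecode loop. How large the gap is depends on the workload, and a tuned numerical library would typically widen it further.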

thebigspacefuck · 2y ago · 3 in thread

They lost me with the emoji for file extension. That’s not a world I want to live in.

Shorel · 2y ago

This, while being an apparently superfluous complaint, would be important for eventual enterprise adoption.

Other languages have failed for less visible reasons.

ElectronCharge · 2y ago

You don’t have to, “.mojo” is equivalent.

queuebert · 2y ago

But I use DOS...

1 more reply

zengid · 2y ago · 2 in thread

I'm pretty excited about Mojo and have been keeping an eye on its development. I feel like the team has learned a lot from their experience, and they are taking the best from languages like Python, Rust, Swift, and Hylo (formerly known as Val), with a really nice pragmatic approach to implementing those ideas so that the language is approachable, but also very safe and fast. Once it's out, I hope someone sits down and makes a SwiftUI-like cross-platform UI library with it ;).

barnabee · 2y ago

Yeah, I've been following and am interested too.

Actually more interested in things like UIs, quick API servers, stuff like that than the AI/ML use cases. The idea of most of the ease and approachability of Python, a proper type system, and access to the entire ecosystem of Python libs in a compiled language is pretty compelling.

zengid · 2y ago

I agree, I'm excited to use it as a General Purpose language, and see how far the Autotuning feature can go for just normal old apps and servers.

dist-epoch · 2y ago · 1 in thread

Cool, but it has very little to do with Python, except some similar looking syntax.

So for a Python programmer with a performance problem, it doesn't look like a solution.

barnabee · 2y ago

They are also building in pretty serious Python interop. You should be able to at least somewhat mix the two or migrate gradually, and still use Python libs for less performance critical code (or if the libs do their performance critical stuff in C++ or whatever and are therefore fast enough).

spencerchubb · 2y ago · 1 in thread

Why is this a language superset of Python rather than a Python library? Genuinely asking and not trying to bash.

mrfox321 · 2y ago

That sounds intractable.

How would you differentiate Mojo code from vanilla Python without a ton of boilerplate at language boundaries?

deepsquirrelnet · 2y ago · 1 in thread

I don’t understand this from a goals perspective. What is an “AI compiler” - and why aren’t they comparing benchmarks with technologies more commonly used in AI?

I think I should be impressed, but I feel like I’m missing the point.

bjourne · 2y ago

I guess the point is that getting the same performance in most other languages requires hundreds of lines of code. Here they are ostensibly achieving that performance using very succinct code. That is pretty nice especially if it integrates well with Python.

pjmlp · 2y ago

Still waiting to see whether all of this will be another Swift for TensorFlow, or actually make a difference.

1 more reply

erichocean · 2y ago

Mojo needs to demonstrate Hugging Face's AI libraries with Mojo acceleration. Nothing else will have the kind of impact that would have.

Throw a half dozen engineers at it, develop a deployment plan for SD XL, profit.

You'll get a ton of open source developers working on improving the Mojo versions even further once you release it, researchers developing extensions, etc. GO TO WHERE THE DEVELOPERS ARE.

Stable Diffusion is crazy compute heavy, so if Mojo is what it's purported to be, it should be possible to get speedups.

brrrrrm · 2y ago

I just want to see real un-hyped benchmarks. Comparing random Python native code makes no sense and seems dishonest, deterring me from actually trying out the tool.

I want a Python that can statically plan underlying GPU allocations, avoids CUDA kernel dispatch overhead and enables a multi-GPU API that isn't some multiprocessing abomination.

two_handfuls · 2y ago

A Python with easy-to-use SIMD and multithreading sounds awesome!

queuebert · 2y ago

As a high-performance computing person, I'm usually I/O bound, not compute bound. I wish someone would come up with a 10x speed up for disk and network I/O.

pantsforbirds · 2y ago

Good blog post. I do wonder how it would compare to a PyCUDA implementation.

laweijfmvo · 2y ago

nit: The text says 743x but the graph (Figure 3) shows 527x
