Reflections on 2 years of CPython's JIT Compiler (opens in new tab)

(fidget-spinner.github.io)

79 pointsbratao11mo ago96 comments

96 comments

48 comments · 9 top-level

eigenspace11mo ago· 16 in thread

It turns out that if you have language semantics that make optimizations hard, making a fast optimizing compiler is hard. Who woulda thunk?

To be clear, this seems like a cool project and I dont want to be too negative about it, but i just think this was an entirely foreseeable outcome, and the amount of people excited about this JIT project when it was announced shows how poorly a lot of people understand what goes into making a language fast.

jerf11mo ago

I was active in the Python community in the 200x timeframe, and I daresay the common consensus is that language didn't matter and a sufficiently smart compiler/JIT/whatever would eventually make dynamic scripting languages as fast as C, so there was no reason to learn static languages rather than just waiting for this to happen.

It was not universal. But it was very common and at least plausibly a majority view, so this idea wasn't just some tiny minority view either.

I consider this idea falsified now, pending someone actually coming up with a JIT/compiler/whatever that achieves this goal. We've poured millions upon millions of dollars into the task and the scripting languages still are not as fast as C or static languages in general. These millions were not wasted; there were real speedups worth having, even if they are somewhat hard on RAM. But they have clearly plateaued well below "C speed" and there is currently no realistic chance of that happening anytime soon.

Some people still have not noticed that the idea has been falsified and I even occasionally run into someone who thinks Javascript actually is as fast as C in general usage. But it's not and it's not going to be.

amval11mo ago

> I was active in the Python community in the 200x timeframe, and I daresay the common consensus is that language didn't matter and a sufficiently smart compiler/JIT/whatever would eventually make dynamic scripting languages as fast as C, so there was no reason to learn static languages rather than just waiting for this to happen.

To be very pedantic, the problem is not that these are dynamic languages _per se_, but that they were designed with semantics unconcerned with performance. As such, retrofitting performance can be extremely challenging.

As a counterexample of fast and dynamic: https://julialang.org/ (of course, you pay the prize in other places)

I agree with your comment overall, though.

2 more replies

emtel11mo ago

This is all true, but there's another angle that often gets missed:

JITs are really only ideal for request-processing systems, in which a) memory is abundant b) the same code paths run over and over and over again, and c) good p99 latency is usually the bar.

In contrast, in user facing apps, you usually find that a) memory is constrained b) lots of code runs rarely or in some cases once (e.g. the whole start-up path) c) what would be considered good p99 latency for a server can translate to pretty bad levels of jank.

JITs can't do anything if the code you care about runs rarely and causes a frame skip every time you hit it, either because the JIT hasn't triggered yet due to too-few samples, or the generated code has been evicted from the JIT cache because you don't have memory to spare. And if you have code that needs to run fast _every_ time it runs, the easiest way to do that is to start with fast code already compiled and ready to execute.

We saw this play out when android moved from Dalvik (JIT) to ART (AoT compilation). Apple figured this out years earlier.

Of course it's not that there are no highly performant apps built on JIT runtimes. But it's a significant headwind.

(Most of the above applies equally to tracing GC, btw)

1 more reply

yxhuvud11mo ago

While what you say is true, there is still a huge gap between the performance of javascript (and even Ruby) and that of Python. The efforts to optimize Python are lagging behind, so there is a lot of things that still can be made faster.

2 more replies

morkalork11mo ago

I remember this from the early 2010s "compilation of a dynamic language is a superset of compilation of static languages ergo we should be able to achieve both optimizations static languages can do and more because there are opportunities that only become apparent at runtime". When really its all about the constraints you can put on the user that set you up for better optimization.

2 more replies

PaulHoule11mo ago

Javascript and Common Lisp aren't as fast as C but they are faster than Python.

1 more reply

sevensor11mo ago

Python is definitely slower for programs that do the same thing. What I see is that the users of fast languages often write programs that do the wrong thing, 30x faster than Python. No free lunch either way

beebmam11mo ago

I don’t understand the sentiment of not wanting to learn a language. LLMs make learning and understanding trivial if the user wants that. I think many of those complaining about strongly typed languages (etc) are lazy. In this new world of AI generated code, strongly typed languages are king

pjmlp11mo ago

Especially when one keeps ignoring the JITs of dynamic languages, that were in the genesis of all high end production JITs being used nowadays, tracing back to Smalltalk, Self, Lisp, Prolog.

All those languages are just as dynamic as Python, more so given the dynamically loading of code with image systems, across network, with break into debugger/condition points and redo workflows.

ngrilly11mo ago

Agreed. I'd like CPython to offer the possibility to opt in semantics that are more amenable to optimizations, similar to what Cider is enabling with their opt-in strict modules and static classes: https://github.com/facebookincubator/cinder.

pizlonator11mo ago

The semantics of Python-the-language aren’t any worse than JavaScript’s for optimization.

Something else is going on.

1 more reply

manypineapples11mo ago

pypy manages

pjmlp11mo ago

The black swan of Python JITs, mostly ignored by the community, unfortunately.

eigenspace11mo ago

Pypy is only "fast" relative to CPython. In all other metrics it's still very slow, and it comes at the cost of making interop with actually fast languages much harder and slower. It's basically useless.

almostgotcaught11mo ago

> It turns out that if you have language semantics that make optimizations hard, making a fast optimizing compiler is hard. Who woulda thunk?

Is this in the article? I don't see Python's semantics mentioned anywhere as a symptom (but I only skimmed).

> shows how poorly a lot of people understand what goes into making a language fast.

...I'm sorry but are you sure you're not one of these people? Some facts:

1. JS is just as dynamic and spaghetti as Python and I hope we're all aware that it has some of the best jits out there;

2. Conversely, C++ has many "optimizing compiler[s]" and they're not all magically great by virtue of compiling a statically typed, rigid language like C++.

o11c11mo ago

JS is absolutely not as dynamic as Python. It supports `const`ness, and uses it by default for classes and functions.

1 more reply

ecshafer11mo ago· 12 in thread

Does anyone know why for example the Ruby team is able to create JITs that are performant with comparative ease to Python? They are in many ways similar languages, but Python has 10x the developers at this point.

dfox11mo ago

Ruby in both its semantics and implementation is very close to smalltalk and does not really use the Python's object model that can be summarized as "everything is a dict with string keys". That makes all the tricks discovered over last 40 years of how to make Smalltalk and Lisp fast much more directly applicable in Ruby.

maxime_cb11mo ago

Instigator of YJIT, the CRuby JIT here.

It's easy to dismiss our efforts, but Ruby is just as dynamic if not more than Python. It's also a very difficult language to optimize. I think we could have done the same for Python. In fact the Python JIT people reached out to me when they were starting this project. They probably felt encouraged seeing our success. However they decided to ignore my advice and go with their own unproven approach.

This is probably going to be an unpopular take but building a good JIT compiler is hard and leadership matters. I started the YJIT project with 10+ years of JIT compiler experience and a team of skilled engineers, whereas AFAIK the Python JIT project was lead by a student. It was an uphill battle getting YJIT to work well at first. We needed grit and I pushed for a very data-driven approach so we could learn from our early failures and make informed decisions. Make of that what you will.

Yes Python is hard to optimize. I Still believe that a good JIT for CPython is very possible but it needs to be done right. Hire me if you want that done :)

Several talks about YJIT on YouTube for those who want to know more: https://youtu.be/X0JRhh8w_4I

3 more replies

pjmlp11mo ago

Smalltalk is highly dynamic, it keeps surprising me Python gets put into some kind of special place as excuse why people keep failing at JIT adoption.

Everything is a message, the meta classes that define object shapes can change any time some feels like it, there are methods like becomes: that completely replaces an object across all its references on the running image, break into debugger and redo after whatever was changed while into the debugger, code loaded over network,....

As if PyPy and GraalPy didn't even exist as well.

Qem11mo ago

> Python's object model that can be summarized as "everything is a dict with string keys".

Given this "it's dicts all the way down" nature of CPython, I'm curious if the recent hash table theoretical breakthrough[1] discussed here[2] a few months ago may eventually help making it much faster, given the compounding of dict upon dict?

[1] https://www.quantamagazine.org/undergraduate-upends-a-40-yea...

[2] https://news.ycombinator.com/item?id=43002511

abhorrence11mo ago

My complete _guess_ (in which I make a bunch of assumptions!) is that generally it seems like the Ruby team has been more willing to make small breaking changes, whereas it seems a lot like the Python folks have become timid in those regards after the decade of transition from 2 -> 3.

gkbrk11mo ago

Python has made many breaking changes after 2->3 as well. They don't even bother to increment the major version number any more.

I haven't checked, but I wouldn't be surprised if more Python versions contained breaking changes than not.

1 more reply

pjmlp11mo ago

Community.

Smalltalk, Self, Lisp, are highly dynamic, their JIT research are the genesis of modern JIT engines.

For some strange reason, Python community rather learns C, calls it "Python", instead of focusing why languages that are just as dynamic, have managed already a few decades ago.

kzrdude11mo ago

Python has a longtime connection to C and somehow I think Python is a dynamic language in the C tradition, not of any other community, maybe the same thing your are hinting.

Hard to put a finger on what exactly, but Python has never been so interested in purity, rather in pragmatic functionality and it ends up in a place where it gives access to C style idioms and API, see for example the os module.

Maybe it is that Python doesn't have it's own model of the world, but it provides a dynamic language facade to the C model of the world.

adgjlsfhk111mo ago

I think a major factor is C API prevalence. The python C-api is bad and widely used so it's very difficult to improve.

maxime_cb11mo ago

Ruby has the same unfortunate problem.

cuchoi11mo ago

Funding?

Seems like the development was funded by Shopify and they got a ~20% performance improvement. https://shopify.engineering/ruby-yjit-is-production-ready

A similar experience in the Python community is that Microsoft funded "Faster CPython" and they made Python 20-40% faster.

ecshafer11mo ago

The funding is one angle, but the Shopify Ruby team isn't that big (<10 people iirc). Python is used extensively at just about every tech company, and Meta, Apple, Microsoft, Alphabet, and Amazon each have at least 10x as many engineers as Shopify. This makes me think that there must be some kind of language/ecosystem reason that makes Python much harder than Ruby to optimize.

1 more reply

pizlonator11mo ago· 4 in thread

JIT and VM writer here. I’m also pretty clued in on how CPython works because I ported it to Fil-C.

I think if I was being paid to make CPython faster I’d spend at least a year changing how objects work internally. The object model innards are simply too heavy as it stands. Therefore, eliminating the kinds of overheads that JITs eliminate (the opcode dispatch, mainly) won’t help since that isn’t the thing the CPU spends much time on when running CPython (or so I would bet).

kzrdude11mo ago

Many changes of that kind have been made by the faster-cpython team I believe, Mark Shannon was rather focused on it (and had a decade of experience of that kind of tweaks to python).

But I'm trying to find/recall a blog post that detailed the different steps in shrinking the CPython object struct...

If you say that's not enough, more radical changes needed, I would understand.

pizlonator11mo ago

Gotta be careful about the tone and mindset here.

Given any arbitrarily optimized thing, it is always possible to optimize it more. And the fact that it's possible to optimize it more is not meant as a criticism of folks who did the previous optimizations.

So, I have no doubt that Mark and others have worked on exactly the thing I'm talking about and that they've gotten wins. And I have no doubt that more can be done. Also, not saying I would do a better job at it than Mark or anyone else

cs_throwaway11mo ago

Do you think it may be feasible to do this and maintain the FFI?

pizlonator11mo ago

That's the hard part!

I think that the FFI makes it super hard to do most of the optimizations I'd want to do. Maybe it makes them impossible even. The game is to find any chance for size reduction and fast path simplification that doesn't upset FFI

ggm11mo ago· 3 in thread

What fundamentals would make the jit, this specific jit faster? Because if it's demonstrably slower, it begs the question if it can be faster or is inherently slower than a decent optimisation path through a compiler.

At this point it's a great didactic tool and a passion project surely? Or, has advantages in other dimensions like runtime size, debugging, and .pyc coverage, or in thread safe code or ...

teruakohatu11mo ago

The article points out they have only begun adding optimisers to the jit compiler.

Unoptimised jit < optimised interpreter (at least in this instance)

They are working on it presumably because they think there will eventually be a speed ups in general or at least for certain popular workloads.

taeric11mo ago

The article also specifically calls out machine code generation as a separate thing. I confess that somewhat surprises me, as I would expect getting machine code generated would be a main source of speed up for a JIT? That and counter based choices on what optimizations to perform?

Still, to directly answer the first question, I would hope even if there wasn't obvious performance improvements immediately, if folks want to work on this, I see no reason not to explore it. If we are lucky, we find improvements we didn't expect.

3 more replies

pizlonator11mo ago

In JavaScript, an unoptimizing JIT (no regalloc, no optimizations that look at patterns of ops, no analysis) is faster than the interpreter because it eliminates opcode dispatch.

Adding more optimizations improves things from there.

But the point is, a JIT can be a speedup just because it isn’t an interpreter (it doesn’t dynamically dispatch ops).

gjvc11mo ago· 2 in thread

not so long ago some people were saying that pypy should be the de-facto reference implementation because of its speed

pjmlp11mo ago

At very least they deserve more acknowledgement of the community, and more adoption.

gjvc11mo ago

absolutely! perhaps their website isn't shiny enough.

serjester11mo ago· 1 in thread

This article doesn't do the best job explaining the broader picture - stability has been their number one priority up to this point.

- Most of the work has just been plumbing. Int/float unboxing, smarter register allocation, free-threaded safety land in 3.15+.

- Most JIT optimizations are currently off by default or only triggers after a few thousand hits, and skips any byte-codes that look risky (profiling hooks, rare ops, etc.).

I really recommend this talk with one of the Microsoft faster Cpython developers for more details, https://www.youtube.com/watch?v=abNY_RcO-BU

kenjin409611mo ago

Hi, author of the post here, stability indeed has been a priority. There are some points which are not exactly the case though:

> - Most of the work has just been plumbing. Int/float unboxing, smarter register allocation, free-threaded safety land in 3.15+.

The first part is true, but for the second sentence: none of that is guaranteed to land in 3.15+. We proposed to land them, that doesn't mean they will. Landing a PR in CPython is subject to maintainer time and reviewer approval, which doesn't always happen. I proposed a few optimizations for 3.14 that never landed.

> Most JIT optimizations are currently off by default or only triggers after a few thousand hits

It is indeed true we only trigger after a few thousand hits, but all optimizations that we currently have are always enabled. We don't sandbag the JIT on purpose.

bgwalter11mo ago· 1 in thread

According to the promises of the Faster CPython Team, the JIT with a >50% speedup should have happened two years ago.

Everyone knows Python is hard to optimize, that's why Mojo also gave up on generality. These claimed 20-30% speedups, apparently made by one of the chief liars who canceled Tim Peters, are not worth it. Please leave Python alone.

notatallshaw11mo ago

Two years ago was Python 3.11, my real world workloads did see a ~15-20% improvement in performance with that release.

I don't remember the Faster CPython Team claiming JIT with a >50% speedup should have happened two years ago, can you provide a source?

I do remember Mark Shannon proposed an aggressive timeline for improving performance, but I don't remember him attributing it to a JIT, and also the Faster CPython Team didn't exist when that was proposed.

> apparently made by one of the chief liars who canceled Tim Peters

Tim Peters still regularly posts on DPO so calling him "cancelled" is a choice: https://discuss.python.org/u/tim.one/activity.

Also, I really can not think who you would be referring to as part of the Faster CPython Team, of which all the former members I am aware of largely stayed out of the discussions on DPO.

throwaway03202311mo ago

I remember when pypy was only 25x slower than c python.

firesteelrain11mo ago

We have had really good success using Cython which makes many calls into the CPython interpreter and CPython Standard Libraries.

j / k navigate · click thread line to collapse

96 comments

48 comments · 9 top-level

eigenspace11mo ago· 16 in thread

It turns out that if you have language semantics that make optimizations hard, making a fast optimizing compiler is hard. Who woulda thunk?

jerf11mo ago

It was not universal. But it was very common and at least plausibly a majority view, so this idea wasn't just some tiny minority view either.

amval11mo ago

As a counterexample of fast and dynamic: https://julialang.org/ (of course, you pay the prize in other places)

I agree with your comment overall, though.

2 more replies

emtel11mo ago

This is all true, but there's another angle that often gets missed:

JITs are really only ideal for request-processing systems, in which a) memory is abundant b) the same code paths run over and over and over again, and c) good p99 latency is usually the bar.

We saw this play out when android moved from Dalvik (JIT) to ART (AoT compilation). Apple figured this out years earlier.

Of course it's not that there are no highly performant apps built on JIT runtimes. But it's a significant headwind.

(Most of the above applies equally to tracing GC, btw)

1 more reply

yxhuvud11mo ago

2 more replies

morkalork11mo ago

2 more replies

PaulHoule11mo ago

Javascript and Common Lisp aren't as fast as C but they are faster than Python.

1 more reply

sevensor11mo ago

beebmam11mo ago

pjmlp11mo ago

Especially when one keeps ignoring the JITs of dynamic languages, that were in the genesis of all high end production JITs being used nowadays, tracing back to Smalltalk, Self, Lisp, Prolog.

All those languages are just as dynamic as Python, more so given the dynamically loading of code with image systems, across network, with break into debugger/condition points and redo workflows.

ngrilly11mo ago

pizlonator11mo ago

The semantics of Python-the-language aren’t any worse than JavaScript’s for optimization.

Something else is going on.

1 more reply

manypineapples11mo ago

pypy manages

pjmlp11mo ago

The black swan of Python JITs, mostly ignored by the community, unfortunately.

eigenspace11mo ago

almostgotcaught11mo ago

> It turns out that if you have language semantics that make optimizations hard, making a fast optimizing compiler is hard. Who woulda thunk?

Is this in the article? I don't see Python's semantics mentioned anywhere as a symptom (but I only skimmed).

> shows how poorly a lot of people understand what goes into making a language fast.

...I'm sorry but are you sure you're not one of these people? Some facts:

1. JS is just as dynamic and spaghetti as Python and I hope we're all aware that it has some of the best jits out there;

2. Conversely, C++ has many "optimizing compiler[s]" and they're not all magically great by virtue of compiling a statically typed, rigid language like C++.

o11c11mo ago

JS is absolutely not as dynamic as Python. It supports `const`ness, and uses it by default for classes and functions.

1 more reply

ecshafer11mo ago· 12 in thread

dfox11mo ago

maxime_cb11mo ago

Instigator of YJIT, the CRuby JIT here.

Yes Python is hard to optimize. I Still believe that a good JIT for CPython is very possible but it needs to be done right. Hire me if you want that done :)

Several talks about YJIT on YouTube for those who want to know more: https://youtu.be/X0JRhh8w_4I

3 more replies

pjmlp11mo ago

Smalltalk is highly dynamic, it keeps surprising me Python gets put into some kind of special place as excuse why people keep failing at JIT adoption.

As if PyPy and GraalPy didn't even exist as well.

Qem11mo ago

> Python's object model that can be summarized as "everything is a dict with string keys".

[1] https://www.quantamagazine.org/undergraduate-upends-a-40-yea...

[2] https://news.ycombinator.com/item?id=43002511

abhorrence11mo ago

gkbrk11mo ago

Python has made many breaking changes after 2->3 as well. They don't even bother to increment the major version number any more.

I haven't checked, but I wouldn't be surprised if more Python versions contained breaking changes than not.

1 more reply

pjmlp11mo ago

Community.

Smalltalk, Self, Lisp, are highly dynamic, their JIT research are the genesis of modern JIT engines.

For some strange reason, Python community rather learns C, calls it "Python", instead of focusing why languages that are just as dynamic, have managed already a few decades ago.

kzrdude11mo ago

Python has a longtime connection to C and somehow I think Python is a dynamic language in the C tradition, not of any other community, maybe the same thing your are hinting.

Maybe it is that Python doesn't have it's own model of the world, but it provides a dynamic language facade to the C model of the world.

adgjlsfhk111mo ago

I think a major factor is C API prevalence. The python C-api is bad and widely used so it's very difficult to improve.

maxime_cb11mo ago

Ruby has the same unfortunate problem.

cuchoi11mo ago

Funding?

Seems like the development was funded by Shopify and they got a ~20% performance improvement. https://shopify.engineering/ruby-yjit-is-production-ready

A similar experience in the Python community is that Microsoft funded "Faster CPython" and they made Python 20-40% faster.

ecshafer11mo ago

1 more reply

pizlonator11mo ago· 4 in thread

JIT and VM writer here. I’m also pretty clued in on how CPython works because I ported it to Fil-C.

kzrdude11mo ago

Many changes of that kind have been made by the faster-cpython team I believe, Mark Shannon was rather focused on it (and had a decade of experience of that kind of tweaks to python).

But I'm trying to find/recall a blog post that detailed the different steps in shrinking the CPython object struct...

If you say that's not enough, more radical changes needed, I would understand.

pizlonator11mo ago

Gotta be careful about the tone and mindset here.

cs_throwaway11mo ago

Do you think it may be feasible to do this and maintain the FFI?

pizlonator11mo ago

That's the hard part!

ggm11mo ago· 3 in thread

At this point it's a great didactic tool and a passion project surely? Or, has advantages in other dimensions like runtime size, debugging, and .pyc coverage, or in thread safe code or ...

teruakohatu11mo ago

The article points out they have only begun adding optimisers to the jit compiler.

Unoptimised jit < optimised interpreter (at least in this instance)

They are working on it presumably because they think there will eventually be a speed ups in general or at least for certain popular workloads.

taeric11mo ago

3 more replies

pizlonator11mo ago

In JavaScript, an unoptimizing JIT (no regalloc, no optimizations that look at patterns of ops, no analysis) is faster than the interpreter because it eliminates opcode dispatch.

Adding more optimizations improves things from there.

But the point is, a JIT can be a speedup just because it isn’t an interpreter (it doesn’t dynamically dispatch ops).

gjvc11mo ago· 2 in thread

not so long ago some people were saying that pypy should be the de-facto reference implementation because of its speed

pjmlp11mo ago

At very least they deserve more acknowledgement of the community, and more adoption.

gjvc11mo ago

absolutely! perhaps their website isn't shiny enough.

serjester11mo ago· 1 in thread

This article doesn't do the best job explaining the broader picture - stability has been their number one priority up to this point.

- Most of the work has just been plumbing. Int/float unboxing, smarter register allocation, free-threaded safety land in 3.15+.

- Most JIT optimizations are currently off by default or only triggers after a few thousand hits, and skips any byte-codes that look risky (profiling hooks, rare ops, etc.).

I really recommend this talk with one of the Microsoft faster Cpython developers for more details, https://www.youtube.com/watch?v=abNY_RcO-BU

kenjin409611mo ago

Hi, author of the post here, stability indeed has been a priority. There are some points which are not exactly the case though:

> - Most of the work has just been plumbing. Int/float unboxing, smarter register allocation, free-threaded safety land in 3.15+.

> Most JIT optimizations are currently off by default or only triggers after a few thousand hits

It is indeed true we only trigger after a few thousand hits, but all optimizations that we currently have are always enabled. We don't sandbag the JIT on purpose.

bgwalter11mo ago· 1 in thread

According to the promises of the Faster CPython Team, the JIT with a >50% speedup should have happened two years ago.

notatallshaw11mo ago

Two years ago was Python 3.11, my real world workloads did see a ~15-20% improvement in performance with that release.

I don't remember the Faster CPython Team claiming JIT with a >50% speedup should have happened two years ago, can you provide a source?

> apparently made by one of the chief liars who canceled Tim Peters

Tim Peters still regularly posts on DPO so calling him "cancelled" is a choice: https://discuss.python.org/u/tim.one/activity.

Also, I really can not think who you would be referring to as part of the Faster CPython Team, of which all the former members I am aware of largely stayed out of the discussions on DPO.

throwaway03202311mo ago

I remember when pypy was only 25x slower than c python.

firesteelrain11mo ago

We have had really good success using Cython which makes many calls into the CPython interpreter and CPython Standard Libraries.

j / k navigate · click thread line to collapse