A tail calling interpreter for Python (already landed in CPython)

(blog.reverberate.org)

124 points · phsilva · 1y ago · 92 comments

37 comments · 7 top-level

thunkingdeep · 1y ago · 12 in thread

This does NOT mean Python will get Tail Call Optimization, as Guido cannot be shown The Light, and has decided.

beagle3 · 1y ago

It is not an optimization; it changes program semantics. It converts programs that would eventually run out of stack, regardless of the amount of available memory, into programs that do not (and running out of stack raises exceptions in the process, for example, which a program might rely on). Either way, semantics are changed.

It should only be called Tail Call Elimination.
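To make the semantic point concrete, here is a small sketch (the names `count_down`, `count_down_loop` and `observed` are made up for illustration). In current CPython a call in tail position still takes a new frame, so deep recursion raises `RecursionError`, which the caller can observe and catch. The same function with the tail call eliminated by hand never raises, so the two are distinguishable from the outside.

```python
import sys

def count_down(n):
    # The recursive call is in tail position, but CPython still
    # pushes a new frame for every call.
    if n == 0:
        return "done"
    return count_down(n - 1)

def count_down_loop(n):
    # The same logic with the tail call eliminated by hand.
    while n != 0:
        n -= 1
    return "done"

def observed(fn, n):
    # Report what a caller can see: a result or the exception name.
    try:
        return fn(n)
    except RecursionError:
        return "RecursionError"

deep = sys.getrecursionlimit() * 10
print(observed(count_down, deep))       # RecursionError
print(observed(count_down_loop, deep))  # done
```

A program that relies on catching that `RecursionError` would behave differently if tail calls were eliminated implicitly, which is the sense in which semantics change.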

dragonwriter · 1y ago

By that standard, any optimization that changes scaling in any dimension changes semantics, which, well, I’m not saying it's wrong, but I would say it is exactly what people looking for optimization want.

3 more replies

flakes · 1y ago

> converts programs that will run out of stack eventually regardless of the amount of available memory (and raise exceptions in the process, for example, which a program might rely on

https://xkcd.com/1172/

rpcope1 · 1y ago

That's probably one of the more frustrating things about Python. Each release it gets all sorts of questionable new syntax (including the very strange pattern matching "feature" that kind of sucks compared to something like Erlang or Scala), but we never get real useful quality of life improvements for basic functional programming like TCO or multi line lambdas

throwaway81523 · 1y ago

Python has always been unashamedly imperative, with some functional features entering by slipping through the cracks. The pattern matching thing seemed ok to me when I tried it, but I haven't used it except briefly, since I'm still mostly on Python 3.9. Interestingly, Python has been losing users to Rust. I don't entirely understand that, other than everyone saying how Rust's tooling is so much better.

3 more replies

dragonwriter · 1y ago

> we never get real useful quality of life improvements for basic functional programming like TCO or multi line lambdas

A lambda can be as big of an expression as you want, including spanning multiple lines; it can't (because it is an expression) include statements, which is only different than lambdas in most functional languages in that Python actually has statements.
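A quick sketch of that distinction (the name `classify` and the snippet passed to `compile` are only illustrative): a lambda body may span several lines as long as it remains a single expression, while a statement inside a lambda is rejected by the parser.

```python
# A lambda body can span lines as long as it stays one expression.
classify = lambda n: (
    "negative" if n < 0
    else "zero" if n == 0
    else "positive"
)

print(classify(-3), classify(0), classify(7))  # negative zero positive

# A statement such as `return` is not allowed inside a lambda.
# compile() lets us observe the SyntaxError without breaking this file.
try:
    compile("f = lambda x: return x", "<example>", "exec")
    statement_allowed = True
except SyntaxError:
    statement_allowed = False

print(statement_allowed)  # False
```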

1 more reply

jgalt212 · 1y ago

The utility value of multi-line lambdas is real, but the readability of these is terrible. And Python prizes readability. So you know where this initiative will end up.

2 more replies

pinoy420 · 1y ago

The choice of “unique” verbs is weird too. Case match. Try except?

1 more reply

ehsankia · 1y ago

Guido is no longer BDFL though, it's the steering committee that decides.

thunkingdeep · 1y ago

Ah, you’re correct. My comment was mainly meant as a tongue-in-cheek remark to point out that this definition of tail call is wholly separate from Python function objects and merely an implementation detail.

riffraff · 1y ago

the steering committee seems way less conservative than Guido, right?

Looking at python from the outside a lot of changes since GvR stepped down seem like stuff he'd not have been fond of.

2 more replies

coldtea · 1y ago

Hasn't Guido stepped down from BD anyway?

VWWHFSfQ · 1y ago · 11 in thread

Will Python ever get fast? Or even _reasonably_ fast?

The answer is no, it will not. Instead they'll just keep adding more and more syntax. And more and more ways to do the same old things. And they'll say that if you want "fast" then write a native module that we can import and use.

So then what's the point? Is Python really just a glue language like all the rest?

maxwelljoslyn · 1y ago

VWWHFSfQ, you may already know this, but: I recommend this talk by Armin Ronacher (Flask creator) on how Python's implementation internals contribute to the difficulties of making Python faster.

https://www.youtube.com/watchv=qCGofLIzX6g

One case study Ronacher gets into is the torturous path taken through the Python interpreter (runtime?) when you evaluate `__add__`. Fascinating stuff.
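This is not a reconstruction of the talk, only a sketch of the documented operator protocol that sits behind `a + b`, using a made-up `Meters` class. The interpreter first tries `__add__` on the left operand; if that returns `NotImplemented`, it tries `__radd__` on the right operand; if both give up, it raises `TypeError`. Even this simplified view shows how many steps a single `+` can involve.

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Tried first for `Meters + something`.
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        return NotImplemented  # ask the interpreter to try the other operand

    def __radd__(self, other):
        # Tried for `something + Meters` after the left operand gives up.
        if isinstance(other, (int, float)):
            return Meters(other + self.value)
        return NotImplemented

print((Meters(2) + Meters(3)).value)  # 5
print((1 + Meters(3)).value)          # 4, via Meters.__radd__

try:
    Meters(2) + "x"
    outcome = "no error"
except TypeError:
    outcome = "TypeError"
print(outcome)  # TypeError
```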

cudder · 1y ago

Your link is broken, here's a working one: https://www.youtube.com/watch?v=qCGofLIzX6g

IgorPartola · 1y ago

Python is fast enough for a whole set of problems AND it is a pretty, easy to read and write language. I do think it can probably hit pause on adding more syntax but at least everything it adds is backwards compatible. You won’t be writing a 3D FPS game engine in Python but you definitely can do a whole lot of real time data processing, batch processing, scientific computing, web and native applications, etc. before you need to start considering a faster interpreter.

If your only metric for a language is speed then nothing really beats hand crafted assembly. All this memory safety at runtime is just overhead. If you also consider language ergonomics, Python suddenly is not a bad choice at all.

sieve · 1y ago

> If your only metric for a language is speed then nothing really beats hand crafted assembly

Only if you know the micro-architecture of the processor you are running on at great depth and can schedule the instructions accordingly. Modern compilers and VMs can do crazy stuff at this level.

> Python is fast enough for a whole set of problems AND it is a pretty, easy to read and write language.

It is definitely easy to read. But speed is debatable. It is slow enough for my workload that I have started wondering about moving to PyPy.

1 more reply

VWWHFSfQ · 1y ago

I guess I'm wondering what the point of tail-call optimizations, or even async/await, is when it's all super slow and bounded by the runtime itself. There are basically no improvements whatsoever to the core CPython runtime. So really what is all this for? Some theoretical future version of Python that can actually use these features in an optimal way?

1 more reply

podunkPDX · 1y ago

> You won’t be writing a 3D FPS game engine in Python

While Eve Online isn’t an FPS, it is an MMORPG written in stackless Python, and seems to be doing OK.

2 more replies

vrighter · 1y ago

Everything it adds is by default backwards compatible, because old programs didn't use it, because it wasn't there yet, and so won't break.

Python's problem is that the non-new stuff is not always backwards compatible. It happens way too often that a new Python version comes out and half the Python programs on my system just stop working.

mattbillenstein · 1y ago

The JIT will improve - you can also use PyPy to get speedups on programs that don't use a ton of C extensions.

Also, free-threading is coming so we'll have threads soon.

I don't know if Python can ever really be fast, as by design objects are scattered all over memory, and even for things like iterating a list you're chasing pointers to PyObject all over the place - it's just not cache friendly.

olau · 1y ago

PyPy has a list implementation that specializes under the hood. So if you stuff it with integers, it will contain the integers directly instead of pointers to them. That's at least how I understood it.

1 more reply

tcoff91 · 1y ago

I think if you want python but fast then Mojo is your only hope.

EDIT: yes and there’s pypy as well as pointed out below. Basically you gotta use an alternative python implementation of some kind.

pansa2 · 1y ago

There’s always PyPy - it’s much faster than CPython and, unlike Mojo, is ready to use today.

riffraff · 1y ago · 5 in thread

How does this differ from direct threading interpreters?

It seems like it solves the same problem (saving the function call overhead) and has the same downsides (requires non-standard compiler extensions)

EDIT: it seems the answer is that compilers do not play well with direct-threaded interpreters and they are able to perform more/better optimizations when looking at normal-sized functions rather than massive blocks

http://lua-users.org/lists/lua-l/2011-02/msg00742.html

noelwelsh · 1y ago

Unfortunately, most discussion of direct threaded interpreters confuses the implementation techniques (e.g. computed gotos) with the concepts (tail calls, or duality between calls and returns and data and codata, depending on your point of view). What is presented here is conceptually a direct threaded interpreter. It's just implemented in a way that is more amenable to optimization by the compiler technology in use.

(More here: https://noelwelsh.com/posts/understanding-vm-dispatch/)

haberman · 1y ago

This is a great summary. When Mike wrote the message you linked, his conclusion was that you have to drop to assembly to get reasonable code for VM interpreters. Later we developed the "musttail" technique which was able to match his assembly language sequences using C. This makes C a viable option for VM interpreters, even if you want best performance, as long as your compiler supports musttail.

> they are able to perform more/better optimizations when looking at normal-sized functions rather than massive blocks

It's not the size of the function that is the primary problem, it is the fully connected control flow that gums everything up. The register allocator is trying to dynamically allocate registers through each opcode's implementation, but it also has to connect the end of every opcode with the beginning of every opcode, from a register allocation perspective.

The compiler doesn't understand that every opcode has basically the same set of "hot" variables, which means we benefit from keeping those hot variables in a fixed set of registers basically all of the time.

With tail calls, we can communicate a fixed register allocation to the compiler through the use of function arguments, which are always passed in registers. When we pass this hot data in function arguments, we force the compiler to respect this fixed register allocation, at least at the beginning and the end of each opcode. Given that constraint, the compiler will usually do a pretty good job of maintaining that register allocation through the entire function.
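The shape described above can be sketched in Python, with loud caveats: this is a toy stack machine with made-up opcodes (`PUSH`, `ADD`, `MUL`, `HALT`) and handler names, not CPython's interpreter, and Python itself does not turn these calls into jumps, so it shows only the structure and none of the performance benefit. Each opcode gets its own small function, the hot state (`code`, `pc`, `stack`) travels in the arguments, and every handler ends by calling the next handler in tail position. In C with musttail, that final call becomes a jump and the arguments stay in registers.

```python
# Opcodes for a toy stack machine (hypothetical, for illustration only).
PUSH, ADD, MUL, HALT = range(4)

def dispatch(code, pc, stack):
    # Look up the handler for the next opcode and call it in tail position.
    return HANDLERS[code[pc]](code, pc, stack)

def op_push(code, pc, stack):
    stack.append(code[pc + 1])            # operand follows the opcode
    return dispatch(code, pc + 2, stack)

def op_add(code, pc, stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)
    return dispatch(code, pc + 1, stack)

def op_mul(code, pc, stack):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)
    return dispatch(code, pc + 1, stack)

def op_halt(code, pc, stack):
    return stack.pop()                    # the only handler that does not dispatch

HANDLERS = {PUSH: op_push, ADD: op_add, MUL: op_mul, HALT: op_halt}

def run(code):
    return dispatch(code, 0, [])

program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]  # (2 + 3) * 4
print(run(program))  # 20
```

Because Python keeps a frame for every call, this sketch would hit the recursion limit on a long-running program; a guaranteed tail call is exactly what removes that problem in the C version.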

10000truths · 1y ago

I feel like using calling conventions to massage the compiler's register allocation strategy is a hack. If the problem is manual control over register allocation, then the ideal solution should be... well, exactly that and no more? An annotation for local variables indicating "always spill this" (for cold-path locals) or "never spill this or else trigger a build error" (for hot-path locals). Isn't that literally why the "register" keyword exists in C? Why don't today's C compilers actually use it?

1 more reply

riffraff · 1y ago

thanks for the explanation!

coldtea · 1y ago

>and has the same downsides (requires non-standard compiler extensions)

It's not a downside if:

(a) you have those non-standard compiler extensions in the platforms you target

saidinesh5 · 1y ago · 2 in thread

Recent discussion: https://news.ycombinator.com/item?id=42999672

Do check out the articles in the topmost comment, about how tail call optimization gets you faster interpreters.

It completely eliminates the overhead of function calls in the generated machine code while you still write your code modularly using functions.

haberman · 1y ago

Yes, that is the same article linked in the first sentence of this "update" article. :)

I published this technique four years ago, and it's very exciting to see that others have taken up the cause and done the work to land it in CPython.

nine_k · 1y ago

I think this technique has been known since the 1970s as "direct threaded code".

__s · 1y ago

`return goto f()` syntax in C seems interesting

I had a similar idea that Python could have `return from f()` to support tail calls without the issues raised about implicit tail calls

dammaj · 1y ago

To read about the basics of tail call optimization:

https://blog.reverberate.org/2021/04/21/musttail-efficient-i...

asicsp · 1y ago

See also this little bit of discussion about a week back: https://news.ycombinator.com/item?id=42999672
