Unlike the Zef article, which describes implementation techniques, the Wren page also shows ways in which language design can contribute to performance.
In particular, Wren gives up dynamic object shapes, which enables copy-down inheritance and substantially simplifies (and hence accelerates) method lookup. Personally I think that’s a good trade-off - how often have you really needed to add a method to a class after construction?
See the experience with Smalltalk and Self, where everything is dynamic dispatch, everything is an object, and the whole system is a live image that can be monkey-patched at any given second.
PyPy and GraalPy, and the oldie IronPython, are much better experiences than where CPython currently stands.
A JIT would help most people more than removing the GIL; I wish PyPy had become the reference implementation back in the 2.7 days.
On the other hand, having a type hold a closed set of applicable functions is somewhat questionable.
There are languages out there that allow you to define arbitrary functions and then use them as methods with dot notation on any variable matching the type of the first argument, including Nim (with its uniform function call syntax), Scala (with implicit classes and type classes), Kotlin (with extension functions) and Rust (with traits).
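For instance, here is a minimal sketch of the Rust flavor of this: a trait lets you hang a new method on a type you don't own, giving the same dot-notation feel as Kotlin extension functions or Nim's call syntax. `Doubled` is a made-up trait purely for illustration.

```rust
trait Doubled {
    fn doubled(&self) -> Self;
}

// Attach the method to the existing i32 type.
impl Doubled for i32 {
    fn doubled(&self) -> i32 {
        self * 2
    }
}

fn main() {
    // Dot notation on a plain i32, as if the type always had the method.
    assert_eq!(21i32.doubled(), 42);
    println!("{}", 21i32.doubled()); // prints 42
}
```

Note the set is still statically closed per scope: the method is only callable where the trait is in view, which is quite different from patching a live class.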
"Efficient Implementation of the Smalltalk-80 System"
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Or its maintainability, and this is one of the big reasons why. Methods and variables are dynamically generated at runtime, which makes it impossible to even grep for them. If you have a large Ruby codebase (say GitLab or Asciidoctor), it can be almost impossible to trace through code unless you are familiar with the entire codebase.
Their "answer" is that you run the code and use the debugger, but that's clearly ridiculous.
So I would say dynamically defined classes are not only bad for performance; they're just bad in general.
A general rule of thumb is that if you can assign an expression a static type, then you can compile it fairly efficiently. Complex dynamic languages obviously actively fight this in numerous ways, and so end up being difficult to optimize. Seems obvious in retrospect.
The tradeoff is that this requires mutable AST nodes, which conflicts with the immutable-AST assumption most compilers rely on (e.g., for sharing subtrees or parallelizing compilation). For a single-threaded interpreter it works cleanly, but it'd be a problem if you wanted to JIT-compile from the same AST on a background thread while the interpreter is mutating nodes.
I’m basing that on the 1.6% improvement they got from speeding up sqrt. That surprised me, because to get such an improvement, the benchmark must spend over 1.6% of its time there to start with.
Looking in the git repo, it seems that did happen in the nbody simulation (https://github.com/pizlonator/zef/blob/master/ScriptBench/nb...).
Basically the flow was:
- check if we’re calling a method of an object
- nope, ok, so cascade through 10+ symbol comparisons
- sqrt was towards the bottom of the cascade
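A toy version of that dispatch cascade (the names are made up, not the actual Zef builtin list): every non-method call walks a chain of string comparisons, so a hot builtin near the bottom, like sqrt in the nbody benchmark, pays for all the misses above it on every single call.

```rust
fn call_builtin(name: &str, arg: f64) -> Option<f64> {
    // Each line is a string comparison; position in the chain = cost.
    if name == "print" { return Some(arg); }
    if name == "abs" { return Some(arg.abs()); }
    if name == "floor" { return Some(arg.floor()); }
    // ...imagine ten-plus more of these...
    if name == "sqrt" { return Some(arg.sqrt()); } // near the bottom
    None
}

fn main() {
    // In a hot loop, every sqrt call repeats the whole cascade first.
    assert_eq!(call_builtin("sqrt", 9.0), Some(3.0));
    assert_eq!(call_builtin("nope", 1.0), None);
    println!("ok");
}
```

Interning the names into symbol ids (or dedicated opcodes for hot builtins) turns the cascade into a constant-time jump, which is presumably where the 1.6% came from.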
I also like how, according to GitHub, the repo is 99.7% HTML and 0.3% C++. A testament to the interpreter's size, I guess?
But yeah the interpreter is very small
I didn't want any optimisation complexities and just focused on being able to understand my own Rust code. I was surprised by the performance I got simply by using my favourite language, and as a bonus, since Rust takes care of all the ownership and lifetimes, I don't need a garbage collector. For sure, right now I'm being super conservative and rely on cloning to avoid lifetime hell in things like closures, but the speed and memory profile is still very decent.
For anyone interested in a simple-to-understand tree-walking interpreter in Rust, heavily based on expressive enums where code is data, here's my interpreter:
> as a bonus, since Rust takes care of all the ownership and lifetimes, I don't need a garbage collector.
I can imagine GluonScript's memory handling comes at a cost, even if the tradeoff of using a borrow checker is well worth it. Was that your experience?
Relatedly, since you commented there has been a submission about garbage collectors in Rust ("Garbage Collection Without Unsafe Code"):
As far as I could tell from my research, only closures could generate dangling references and therefore need memory cleanup, and only if I allowed closures to access their environment (variables and functions) by reference / mutable reference.
To avoid this and simplify both my code and the mental model for the users of GluonScript, as of now, closures capture their environment by cloning it immutably. There's increased memory usage from all the copying of the environment, but there are never references to something that is no longer in use, and therefore no need for a GC. At the end of the day, all values captured by closures are owned Rust values that Rust drops when they go out of scope.
So this can lead to high memory usage in hot loops but it can't lead to memory leaks.
I've gone through something similar, but for a more functional language (a Scheme). It's interesting how here the biggest wins are from optimizing the objects, while the biggest wins in my case were optimizing closures. The optimizations were very similar.
"Three Implementation Models for Scheme" gives all the answers needed to make a fast-enough Scheme, though it has something of a compilation step, so it's not interpreting the original AST.
And the fact that having out-of-line calls to methods of value objects is so expensive
Is this tied to unions? Or otherwise, when does this happen? I don't see the connection w/ invisicaps or &c
It was materially useful in this project.
- Caught multiple memory safety issues in a nice deterministic way, so designing the object model was easier than it would have been otherwise.
- C++ with accurate GC is a really great programming model. I feel like it speeds me up by 1.5x relative to normal C++, and maybe 1.2x relative to other GC’d languages (because C++’s APIs are so rich and the lambdas, templates, and class system are so mature).
But I’m biased in multiple ways
- I made Fil-C++
- I’ve been programming in C++ for like 35ish years now
> happen to know C++ really well
That’s my bias, yeah. But C++ is good for more than just perf. If you need access to low-level APIs, or libraries that happen to be exposed as a C/C++ API, or you need good support for dynamic linking and separate compilation - then C++ (or C) is a great choice
There are many runtimes that I could have included but didn’t.
Also, it’s quite impressive how much faster PUC Lua is than QuickJS and Python
(I suppose the "quick" in QuickJS means "quick for a pure interpreter without JIT compilation" or something...)
So like that’s wild
Python's execution time is mostly spent looking up stuff. I don't think Lua is quite as dynamic.
That’s where, for example, getter inference happens.