Note to language designers: Reduce your language's poison pills. For those which can't be avoided, make them easy to profile!
When you redefine something, you stop the world at safepoints and update prior baked assumptions that are registered on the old definition. These redefinitions usually happen during development and initialization, not during performance-sensitive runtime. It would add more to the memory footprint, but that information is not accessed at all except during recompilation.
The proper solution is to not have unnecessary dynamics. There is no need for the ability to redefine methods on an object. If you have functions as first-class objects, you can just let users store functions and call them at the cost of indirection if they wish to have such dynamics.
function map(f, list) {
if (list==null)
return null;
return cons(f(head(list)), map(f, tail(list)));
}
Now as there is a call to map, a JIT compiled version will put in a redefinition check and if the check passes then it can call an optimised map which eg knows it has two arguments and f is a function. If the check fails then a slow path which does a full call is followed.In your scenario there is no redefinition check and instead when one tries to redefine map, any optimised calls must be downgraded to the slow path.
But now imagine calling map with a function that redefines map after say the third recursive call. Then the redefinition downgrading magic needs to somehow modify one version of map halfway through a call so that when f returns it will call the new map, but the calls further up the stack frame don’t need to be modified as they won’t call map again.
> you stop the world at safepoints
But that's how deoptimisation checks work.
Even LuaJIT, which is to my understand the closest anyone has come to this, still was retrofitting JIT onto an existing language.
I'm not convinced we're going to see much more performance out of JS, for instance, which is about as fast as a dynamic language can go nowadays. But I wonder what the real limits of a "dynamic scripting" language would be from this perspective.
Edit: Per my other comment about indirection being a performance poison pill, here's an example of an idea where a new scripting language might be able to get a lot of performance. Suppose you keep the ability to dynamically create classes and load code and so forth, so that (just as an example, not necessarily a good idea) you can write code that dynamically connects to a database and loads in tables as classes with automatically defined properties, etc. But instead of working the way the languages do now, instead of constantly walking through all the layers of indirection that can be used to implement all this at every call site and for every call, what if you could do something like the "pledge()" call that says "OK, this is it, I'm all set up, the dynamism is all done, you may now assume that all the type analysis you've done is now complete". Now the JIT can drop all of its paranoia code. It isn't completely obvious to me how to do this correctly (by implication, if you can "connect to a database" before this pledge()-like call is done, you have to have a pretty complete runtime available just for that), nor is it obvious to me how this would affect the type system, etc. Maybe it's not possible. But it's the type of thing I mean that would be interesting to have examined by someone, who might be able to concretely show why it's not possible, or, whoknows, make it work. And that's just one idea of how to design a dynamic language from the top for performance, basically off the cuff; who knows what a smart person who sat down and thought about this for a couple of weeks before even beginning to code could come up with.
Another one is that it isn't obvious to me that scripting languages must be all based on hash tables that laboriously can be cast back to structs by the JIT; it seems to me that it would also be feasible to go the other way, and allow users to define things as structs, and if you want to offer hash-like access it would be easy to lay down hash-like access to the struct members (iteration, etc.), and either implicitly or explicitly also offer an "overflow" hash table if desired. This would give at least a bit of locality control. Something something arrays too for array control, and suddenly you're starting cook with gas.
This sounds a lot like Object.freeze[0] in JS, at least for prototypes. I can't remember any of the major engines doing this, but it could be an interesting direction to explore.
[0]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
I think "unknown type at compile time" and "JIT-able constructs" are opposites. You'd end up with a tracing JIT that compiles the same set of code in different, specialized ways based on runtime determination of the the input types as though the lang was static. Which is basically what we have today. You can have optionally typed languages, but you still have to write your runtime and compiler to the lowest common denominator, so you probably want to go more strict if you care about your (runtime or compile-time) compiler and runtime performance (as the Dart devs learned IIRC).
I would assume "JIT-ability" is proportional to the immutable input and output knowledge of the code to be compiled.
Someone did a study of their Python servers, and found that after an initial phase, almost no dynamic facilities were used at all!
it seems to me that it would also be feasible to go the other way, and allow users to define things as structs, and if you want to offer hash-like access it would be easy to lay down hash-like access to the struct members
I like that. Smalltalk kind of did this, but much less. Objects were just arrays of object references, but certain object references were adorned smallint values. People also wrote hash-like accessor implementations for objects. "Shape changes" to objects of a certain class would require a stop the world re-allocation of all such objects, however.
The Ruby community went in different direction here [2] with RTL MJIT.
[1] https://github.com/eclipse/omr [2] https://medium.com/square-corner-blog/rubys-new-jit-91a5c864...
Memory non-locality. On modern systems, thrashing through RAM just kills you on performance. I don't know that you have to hand full control over to the programmer, but if you're randomly flinging values on the heap everywhere (a.k.a. "hash tables"), you're gonna hit a wall long before you get to "C performance".
For JITs, any time the JIT has the ability to make an assumption, but then you break it. The obvious one is "this function always returns an int", which the JIT can nicely optimize, but if once you ever return a string, the JIT will have to do something to deoptimize that case again, possibly permanently. More subtly, things like "returning an object that always has these three keys that map to these three values of a certain type", which a JIT can optimize to a struct under the hood, until you change a type or add a field for a particular call. Just in general, any assumption the JIT can make for performance that you might break in a function. You should generally program your performance-sensitive JS as if it were statically typed, even if it isn't.
Indirection for the user's convenience; one of the reasons CPython is sooo slooooow is that there's a ton of indirection going on. For the code x.y(), you can change x by directly overriding "y" on the specific x you are using, or it can be set at the class level, or it can be set by any of the superclasses, or the dot operator may be a function that does arbitrary things up to and including arbitrary mutation of whatever it gets its hands on, etc., plus even without overridding getattr there's this whole "property" thing, plus it could be a __slot__, and I'm pretty sure I'm missing at least one thing here, and CPython is hopping through all these hoops for every such lookup. The fact that "x.y()" translates to dozens or hundreds of C lines of code is what Python so convenient, but at the same time, what makes it so slow, too. I understand the value of being able to override things; I question the need of dynamic scripting languages to provide so many places to override things.
I remember reading that Mike Pall said about pypy that most of the optimizations he did for luajit were possible for python, but just a lot more work.
Using mutation is a really good way to make most scheme implementations run slowly, since mutation is generally avoided, and thus most flow analysis is spent on checking recursion. A while-loop using set! is generally a lot slower than a recursive named let loop.
Flow analysis is simply easier without mutation, even though it might not look easier for the person reading the code.
I am most familiar with LuaJIT. It has wiki page with a list of things that force it to fall back to the interpreter
AOT master race
Nothing wrong with having ultra-dynamic languages, but you will inevitably trade speed for quirky code in cases like that.
This is an easy decision.
If you're designing a new language, don't make everything (even arrays!) a hashmap.
http://terralang.org/api.html#embedding-new-languages-inside...
However, if you're writing your own JIT language and you've never written one before, you probably want to learn something and write your own GC, JIT, etc. It's really not that hard. Then once you get an idea of how everything works you can use OMR and make it fast.
This is a new area for me so apologies if I'm just misunderstanding the mechanics here!
JIT-ness is a concept related to the implementation, not to the language.
Used as "a JIT," the term is given new meaning (to refer to a compiler implicitly). And used as "JIT language," its giving the term a new meating (to refer to the fact that the language designed to be JITted).
Does anyone know the current state of play?
I've wanted fast multiple dispatch in js for many years. V8's handling of inlining budgets improved, and guards seemed off mainline, so I'd looked forward to trying again to get dispatch trees inlined away. Then Spectre hit. :/
Yes, in general. But I was basically hoping to craft a javascript multiple dispatch implementation that mostly landed on V8's existing dispatch fast path.
A polymorphic inline cache for a call site, starts with a check(s) that the object is an expected type. And a bit of inlined code, may start with checks that the inlining assumptions (types of arguments, etc) are still valid. In the JITed assembly, it looks like a bunch of test and jump instructions that come before the real work. But the processor's branch prediction, ideally recognizes that the common case is a still-valid cache/inlining. So the processor's speculative execution, ideally proceeds immediately with the real work. The checks happen off to the side - they don't delay the real work.
One way to do dynamic multiple dispatch is a dispatch tree of similar checks. As with regular dispatch, most multiple dispatch call sites are monomorphic. So if the dispatch checks gets inlined, you would have an extended version of the usual checks. And V8 changed how it allocates it's inlining budget, in a way that made arranging for inlining of dispatch seem more plausible than it once was. My hope was the dispatch checks would also happen off mainline. So multiple dispatch might be as fast as single dispatch.
But then speculative execution implementation flaws were recognized as a security threat, and jits began intentionally disabling or constraining speculative execution. Which might be sufficient to invalidate the whole idea. Since those mitigations seem an ongoing work in progress, and in the past I've found it expensive to tool up for v8 jit code analysis, I thought I'd see if anyone already had a feel for where we're at/headed.
On the first page of the documentation for libjit is this:
“Only a very small proportion of the work is concerned with language specifics.”
The thing is that Perl 6 tends to break more of the established rules than it follows.For example, most normal operators in Perl 6 are just subroutines with a special name:
a + b == &infix:«+»(a,b)
put &infix:«+».candidates.elems; # 26
There is currently 26 multi subs with the same name for handling the numeric infix addition operator.
(27 if you count the proto sub)That is among the easiest of things to optimize in Perl 6.
I would also like to know if it is a JIT that profiles and compiles the optimized code on another thread like MoarVM does. Is there a plugin system that allows high level code to influence what code gets JIT compiled like MoarVM recently received?
I'm sure that for just about every other dynamic language libjit would be very suitable. It's just that I imagine it would quickly strain under the weight that is Perl 6.
(Perl 6 brings in many varied features from many languages, and combines them in a way that it feel like they have always belonged together.)
At some point I transformed into a bitter sad man who can't stand reading articles like this because I feel like such a worthless piece of crap.
This can be your muse, channel it into something!