The proper solution is to not have unnecessary dynamics. There is no need to be able to redefine methods on an object. If functions are first-class objects, users who want that kind of dynamism can store functions in fields and call them, at the cost of an extra indirection.
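A minimal sketch of that pattern, in Python purely for illustration (the `Greeter` class and its `formatter` field are hypothetical names). The method set never changes; the swappable part is just a function stored in a field.

```python
from typing import Callable


def plain(name: str) -> str:
    return f"hello, {name}"


def shouty(name: str) -> str:
    return f"HELLO, {name.upper()}!"


class Greeter:
    """Fixed method set; the swappable behavior is a stored function."""

    def __init__(self, formatter: Callable[[str], str] = plain) -> None:
        # The "dynamic" part is plain data: a first-class function in a field.
        self.formatter = formatter

    def greet(self, name: str) -> str:
        # Permanent method. Swapping behavior costs one indirect call here,
        # instead of requiring the ability to redefine greet() itself.
        return self.formatter(name)


g = Greeter()
print(g.greet("ada"))  # hello, ada
g.formatter = shouty   # swap behavior; greet() itself never changes
print(g.greet("ada"))  # HELLO, ADA!
```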
I've been thinking of making a scripting language where you can freely redefine things until you return the main function to the script's caller. At that point main gets compiled, JITed, or otherwise locked, and any redefinitions done by calling it are errors.
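A rough sketch of the lock-after-main idea, assuming a hypothetical `ScriptNamespace` that holds the script's global bindings and a `freeze()` call standing in for "main has been returned to the caller". It only shows the rule being enforced, not any compilation.

```python
from typing import Any, Callable, Dict


class FrozenNamespaceError(RuntimeError):
    pass


class ScriptNamespace:
    """Hypothetical global namespace for the proposed language.

    Bindings can be (re)defined freely until freeze() is called. After that,
    any definition is rejected, so an implementation could compile `main`
    against a snapshot that is guaranteed not to change.
    """

    def __init__(self) -> None:
        self._defs: Dict[str, Any] = {}
        self._frozen = False

    def define(self, name: str, value: Any) -> None:
        if self._frozen:
            # This sketch rejects new names too, not just redefinitions.
            raise FrozenNamespaceError(f"cannot define {name!r}: main is locked")
        self._defs[name] = value

    def lookup(self, name: str) -> Any:
        return self._defs[name]

    def freeze(self) -> Callable[[], Any]:
        self._frozen = True
        return self._defs["main"]


ns = ScriptNamespace()
ns.define("helper", lambda: 1)
ns.define("helper", lambda: 2)  # fine: still in the "anything goes" phase
ns.define("main", lambda: ns.lookup("helper")())
main = ns.freeze()
print(main())  # 2
```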
Sounds awesome! We could call the phases before that point "compile time" and "preprocessor time", and the phase after that point "run time".
/sarc
In prototype-based languages like JavaScript and Lua, redefining methods on an object at runtime is common, which makes them quite challenging to JIT. It may be most common early in execution, but there's no way to define that boundary semantically.
Since Lua 5.2 (IIRC), certain metamethods are locked in when you assign a metatable (a prototype definition) to an object. Thus, the __gc finalizer is only obeyed if it existed when the metatable was set on the object. (I don't think this was a performance improvement, more a tradeoff in GC complexity, but it's an example of side-effect semantics that could be leveraged for JIT optimization.) But this isn't the case for OOP-like methods (which in Lua are normally defined indirectly through the __index metamethod field), as those are ad hoc and would be expected to change during runtime.
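To make the difference concrete, here is a toy Python model (not real Lua; `LuaLikeObject` and its methods are hypothetical) in which dictionaries stand in for Lua tables and `set_metatable` for `setmetatable`. Whether an object gets finalized is decided when the metatable is assigned, while methods are resolved through `__index` on every call.

```python
from typing import Any, Callable, Dict, List, Optional


class LuaLikeObject:
    """Toy model of two metatable behaviors; not real Lua."""

    def __init__(self) -> None:
        self.metatable: Optional[Dict[str, Any]] = None
        self.marked_for_finalization = False

    def set_metatable(self, mt: Dict[str, Any]) -> None:
        self.metatable = mt
        # Decided once, at assignment time: a __gc field added later is ignored.
        self.marked_for_finalization = "__gc" in mt

    def method(self, name: str) -> Callable[..., Any]:
        # Resolved through __index on every call, so later changes are seen.
        assert self.metatable is not None
        return self.metatable["__index"][name]

    def collect(self) -> None:
        # Simulated collection: only objects marked earlier run a finalizer.
        if self.marked_for_finalization and self.metatable is not None:
            gc = self.metatable.get("__gc")
            if callable(gc):
                gc(self)


log: List[str] = []
methods: Dict[str, Callable[[], str]] = {"hello": lambda: "v1"}
mt: Dict[str, Any] = {"__index": methods}

obj = LuaLikeObject()
obj.set_metatable(mt)
mt["__gc"] = lambda o: log.append("finalized")  # too late for obj
methods["hello"] = lambda: "v2"                  # visible on the next call

print(obj.method("hello")())  # v2
obj.collect()
print(log)                    # []
```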
A similar issue occurs in any dynamic language (JavaScript, Lua, Perl, etc.) that uses generic dictionaries for storing and loading functions. Ideally a JIT would know that invoking a function like "a.b.foo()" would always load the same function and could elide the dictionary lookups for "b" and "foo". But keeping track of whether the dictionary a or b was modified is costly. Theoretically you could, e.g., add callback hooks to dictionary entries that invalidate some JIT'd block when invoked; but such conditional checks and operations cause a huge amount of code bloat and slow down the fast path. Adding more logic in an attempt to minimize unnecessary work causes the same problems.
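One way to picture that bookkeeping is a guarded cache at the call site. This is a sketch under assumptions: `VersionedDict` is a hypothetical dictionary that bumps a counter on every store, and `CallSite` caches the result of looking up `a.b.foo`. Every store now pays for the counter, and every cached call still pays for the guard checks, which is exactly the overhead being described.

```python
from typing import Any, Callable, Optional, Tuple


class VersionedDict:
    """Dictionary that bumps a version counter on every store (hypothetical)."""

    def __init__(self, **items: Any) -> None:
        self._data = dict(items)
        self.version = 0

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._data[key] = value
        self.version += 1  # every store now pays for invalidation tracking


class CallSite:
    """Caches the function found by a.b.foo, guarded by version checks."""

    def __init__(self) -> None:
        # (a, a.version, b, b.version, foo) from the last slow lookup
        self._cache: Optional[Tuple[Any, int, Any, int, Callable[..., Any]]] = None
        self.misses = 0

    def call(self, a: VersionedDict, *args: Any) -> Any:
        c = self._cache
        if c is not None and c[0] is a and c[1] == a.version and c[3] == c[2].version:
            return c[4](*args)  # fast path: guards only, no dictionary lookups
        self.misses += 1        # slow path: redo both lookups, refill the cache
        b = a["b"]
        foo = b["foo"]
        self._cache = (a, a.version, b, b.version, foo)
        return foo(*args)


b = VersionedDict(foo=lambda x: x + 1)
a = VersionedDict(b=b)
site = CallSite()
print(site.call(a, 1), site.call(a, 2), site.misses)  # 2 3 1
b["foo"] = lambda x: x * 10  # mutation invalidates the cached entry
print(site.call(a, 3), site.misses)                   # 30 2
```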
The lesson from languages like Forth, K, and Lua is that the most important thing to optimize isn't JITing, but the software VM itself, including the bytecode and dispatch tables. (Mike Pall of LuaJIT fame makes this point.) The deep pipelines and huge caches (e.g. for branch speculation) in modern processors mean that the abstraction of a bytecode and dispatch table can often be subsumed into the pipeline, resulting in a fixed but relatively small overhead compared to native code. This is especially true for the bulk of the application code, where you may only see a [hand waving] 1.5x or 2x overhead in a language like Lua or especially LuaJIT. And you can do even better by moving specific hot spots into C code. Languages like Python and Ruby don't have nearly as lean a VM as Lua[1], so the overhead is greater and more variable, but the idea is similar.
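For readers who haven't seen one, a bytecode interpreter with a dispatch table looks roughly like the sketch below. The instruction format and opcode names are made up for illustration (loosely in the style of register-based VMs), and in Python the interpreter's own overhead swamps the dispatch cost, so this shows the structure only, not the performance claim.

```python
from typing import Callable, List, Tuple

# Hypothetical instruction format: (opcode, A, B, C), where A/B/C name
# registers or constant indexes, loosely in the style of register-based VMs.
Instr = Tuple[int, int, int, int]
LOADK, ADD, MUL, RET = range(4)


def op_loadk(regs: List[int], consts: List[int], a: int, b: int, c: int) -> None:
    regs[a] = consts[b]


def op_add(regs: List[int], consts: List[int], a: int, b: int, c: int) -> None:
    regs[a] = regs[b] + regs[c]


def op_mul(regs: List[int], consts: List[int], a: int, b: int, c: int) -> None:
    regs[a] = regs[b] * regs[c]


# The dispatch table: opcode number -> handler.
DISPATCH: List[Callable[[List[int], List[int], int, int, int], None]] = [
    op_loadk, op_add, op_mul,
]


def run(code: List[Instr], consts: List[int], nregs: int = 8) -> int:
    regs = [0] * nregs
    pc = 0
    while True:
        op, a, b, c = code[pc]  # fetch and decode
        pc += 1
        if op == RET:           # the one opcode handled inline: it exits the loop
            return regs[a]
        DISPATCH[op](regs, consts, a, b, c)  # indexed, indirect dispatch


# (3 + 4) * 2
program: List[Instr] = [
    (LOADK, 0, 0, 0),  # r0 = consts[0]
    (LOADK, 1, 1, 0),  # r1 = consts[1]
    (ADD, 2, 0, 1),    # r2 = r0 + r1
    (LOADK, 3, 2, 0),  # r3 = consts[2]
    (MUL, 2, 2, 3),    # r2 = r2 * r3
    (RET, 2, 0, 0),
]
print(run(program, [3, 4, 2]))  # 14
```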
If WebAssembly catches on, I think we'll begin to see some regressions in JavaScript JITs because the marginal cost and complexity won't be as worthwhile when people begin moving compute-heavy code into WebAssembly. Simplicity might even bring some performance improvements for code that was never susceptible to JITing because there'll be less baggage.
[1] Largely a result of the lean and clean semantics. Lean is not the same as lacking power: Lua has fully lexical scoping with proper closures, and asymmetric stackful coroutines. Asymmetric stackful coroutines are exceedingly powerful abstractions, and they also rid Lua of the colored functions[2] problem that languages adopting explicit async/await semantics have. That means function invocation semantics are unified, making the implementation simpler and leaner and, in turn, making it more likely that VM dispatch is cleanly pipelined in hardware. Less is more. Linux taught the world a similar lesson. Windows never had fork(), as it was considered too heavyweight and complex for the common case, and instead focused on multiple different interfaces: one for creating threads and one for invoking new programs. But Linux optimized the heck out of fork, so even creating a thread is faster on Linux than on Windows, and the semantic power of fork makes it easier to implement complex resource-sharing schemes between processes than on Windows (i.e., rather than having an extremely complex data structure and flags for telling the OS what resources to pass or share between processes, you just use other common APIs, e.g. dup(), before exec()).
[2] http://journal.stuffwithstuff.com/2015/02/01/what-color-is-y...
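The fork-then-rewire-then-exec pattern from footnote [1] can be sketched with Python's `os` wrappers around the POSIX calls (POSIX-only; the function name is hypothetical). Instead of describing the child's file descriptors through a large options structure, the child simply calls `dup2()` and `close()` between `fork()` and `exec()`.

```python
import os
import sys
from typing import List


def run_capturing_stdout(argv: List[str]) -> bytes:
    """fork(), rewire file descriptors with ordinary calls, then exec().

    POSIX-only: os.fork is not available on Windows.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.dup2(write_fd, 1)  # the child's stdout now feeds the pipe
            os.close(read_fd)
            os.close(write_fd)
            os.execv(argv[0], argv)  # does not return on success
        finally:
            os._exit(127)  # reached only if setup or exec failed
    # parent
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # EOF: the child's end of the pipe is closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)


out = run_capturing_stdout([sys.executable, "-c", "print('hello from child')"])
print(out)  # b'hello from child\n'
```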
It is common, but I do not think it is at all necessary. It always looks nasty when I see it in dynamic languages, and I never feel a need to do it in static languages. Whenever functionality is swappable, you'd instead have a function pointer that you call from a permanent method.
> The lesson from languages like Forth, K, and Lua is that the most important thing to optimize isn't JITing, but the software VM itself, including the bytecode and dispatch tables.
I'd very much question this. Properly optimized JIT output should be orders of magnitude faster than interpreted bytecode for even the best interpreter, and if enough is JIT'ed, the interpreter is no longer in play. Contradicting data would be interesting, although I do not generally concern myself with dynamic languages anymore unless I have to.
Say we had code like:
  a + b
Statically compiled code might look like:
  load a from stack to register
  load b from stack to register
  add a and b
A VM would look like:
  load opcode from state to register
  compute opcode address  # nullop or two loads and index
  jump to opcode block
  load a from state to register
  load b from state to register
  add a and b
All of that is easily pipelined, especially by the very latest processors, which speculate through indirect jumps (which is why we have Spectre, etc.). The above is idealized but, I think, reflects well how modern register-based software VMs work.
But when you have a JIT for a dynamically typed language, the entry and exit points of both interpreted sequences and JIT'd sequences require many more instructions to manage bookkeeping, exploding the cost. JIT'ing only works if you can compile blocks of code large enough that the benefits exceed the bookkeeping costs. But that's a tall order for dynamically typed languages, where runtime mutations can invalidate JIT'd blocks at many points in a sequence, as with prototype-based languages.
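A toy model of the entry/exit bookkeeping, assuming a hypothetical `SpecializedAdd` that plays the role of JIT'd code for `a + b` specialized to integers. In Python both branches do the same work, so this only shows where the guard and the bailout sit; a real JIT would additionally need, e.g., an overflow check on the fast path and a way to rebuild interpreter state when it bails out.

```python
from typing import Any


def generic_add(a: Any, b: Any) -> Any:
    # Stands in for the interpreter's fully general "+" (ints, floats,
    # strings, operator overloading, ...).
    return a + b


class SpecializedAdd:
    """Toy model of a JIT'd `a + b` specialized for ints (hypothetical)."""

    def __init__(self) -> None:
        self.fast_hits = 0
        self.deopts = 0

    def __call__(self, a: Any, b: Any) -> Any:
        # Entry bookkeeping: re-check the assumptions the fast code relies on.
        if type(a) is int and type(b) is int:
            self.fast_hits += 1
            return a + b  # real JIT output: ideally one add plus an overflow check
        # Exit bookkeeping: assumptions failed, hand off to the generic path.
        self.deopts += 1
        return generic_add(a, b)


add = SpecializedAdd()
print(add(2, 3), add("a", "b"), add(1.5, 2))  # 5 ab 3.5
print(add.fast_hits, add.deopts)              # 1 2
```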
Getting "[p]roperly optimized JIT output" is the crux of the problem. It takes significant instrumentation and indirection to create and maintain "[p]roperly optimized JIT output". You can't compare the optimized machine code sequences to the analogous interpreted sequences in isolation from the surrounding machinery.
Much of the performance benefit of statically compiled code isn't in execution, per se, but in the data structures. A language like Lua is constantly indexing hash tables[1] for even simple record objects, whereas in C you're usually doing direct memory references. But transforming hash table lookups in a dynamic language into direct memory references à la statically compiled C structs is extremely hard, if not impossible. Engines like V8 manage to do it much of the time when loading prototype methods, but I don't think they can do it at all for ad hoc runtime data structures.
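Engines like V8 attack this with hidden classes (also called shapes or maps). The sketch below is a toy version of that idea, not V8's implementation; `Shape`, `Obj`, and `read_y` are hypothetical. Objects built with the same fields in the same order share a shape, so a site that has seen that shape can replace the name lookup with an identity check plus an indexed load; objects built ad hoc end up with different shapes and fall back to the slow path.

```python
from typing import Any, Dict, List, Optional, Tuple


class Shape:
    """Hidden-class-style layout: field names -> slot indexes.

    Shapes are shared; adding a field moves an object to a child shape.
    """

    def __init__(self, fields: Tuple[str, ...] = ()) -> None:
        self.fields = fields
        self.index: Dict[str, int] = {n: i for i, n in enumerate(fields)}
        self.transitions: Dict[str, "Shape"] = {}

    def with_field(self, name: str) -> "Shape":
        if name not in self.transitions:
            self.transitions[name] = Shape(self.fields + (name,))
        return self.transitions[name]


EMPTY_SHAPE = Shape()


class Obj:
    def __init__(self) -> None:
        self.shape = EMPTY_SHAPE
        self.slots: List[Any] = []

    def set(self, name: str, value: Any) -> None:
        i: Optional[int] = self.shape.index.get(name)
        if i is None:  # new field: transition to (or create) a child shape
            self.shape = self.shape.with_field(name)
            self.slots.append(value)
        else:
            self.slots[i] = value

    def get(self, name: str) -> Any:
        # Generic path: still a name lookup, just on the shared shape.
        return self.slots[self.shape.index[name]]


def make_point(x: int, y: int) -> Obj:
    p = Obj()
    p.set("x", x)
    p.set("y", y)
    return p


p, q = make_point(1, 2), make_point(3, 4)
print(p.shape is q.shape)  # True: same fields added in the same order

# What a call site might cache after seeing `p`: the shape and the offset.
seen_shape, y_offset = p.shape, p.shape.index["y"]


def read_y(o: Obj) -> Any:
    if o.shape is seen_shape:     # guard: one identity check
        return o.slots[y_offset]  # direct indexed load, like a struct field
    return o.get("y")             # different shape: fall back to the lookup


total = read_y(p) + read_y(q)
print(total)  # 6
```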
But if your code is primarily operating on, e.g., JSON trees, it wouldn't matter one way or another. If your statically compiled code isn't benefiting from direct memory addressing of data (as is the case with many types of applications) then statically compiled, JIT'd, and interpreted code can have similar runtime profiles, and in many cases you can't even be sure which will be faster in real-world systems.
[1] Lua has opcodes for this, so the cost is fixed and small relative to raw C code doing the lookup. And strings in Lua are interned, so lookup is usually as simple as a mask and a direct index into an array.
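A simplified sketch of why interning makes the lookup cheap: each string carries a precomputed hash, the table size is a power of two so the bucket is `hash & (size - 1)`, and key comparison is an identity check. The names are hypothetical, and the linear probing used here is an illustrative choice; Lua's actual tables use a different collision scheme.

```python
from typing import Any, Dict, List, Optional


class LStr:
    """String object whose hash is computed once, when it is created."""

    __slots__ = ("text", "hash")

    def __init__(self, text: str) -> None:
        self.text = text
        self.hash = hash(text)


_strings: Dict[str, LStr] = {}


def intern_str(text: str) -> LStr:
    # One object per distinct content, so equal keys are the same object.
    s = _strings.get(text)
    if s is None:
        s = LStr(text)
        _strings[text] = s
    return s


class Table:
    """Open addressing over a power-of-two array (illustrative only)."""

    def __init__(self, capacity: int = 8) -> None:
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.keys: List[Optional[LStr]] = [None] * capacity
        self.values: List[Any] = [None] * capacity
        self.count = 0

    def _slot(self, key: LStr) -> int:
        mask = len(self.keys) - 1
        i = key.hash & mask  # the "mask" step: no division needed
        # Identity comparison suffices because keys are interned.
        while self.keys[i] is not None and self.keys[i] is not key:
            i = (i + 1) & mask  # linear probing on collision
        return i

    def get(self, key: LStr) -> Any:
        i = self._slot(key)
        return self.values[i] if self.keys[i] is key else None

    def set(self, key: LStr, value: Any) -> None:
        if (self.count + 1) * 2 > len(self.keys):  # keep load factor <= 1/2
            self._grow()
        i = self._slot(key)
        if self.keys[i] is None:
            self.keys[i] = key
            self.count += 1
        self.values[i] = value

    def _grow(self) -> None:
        old = list(zip(self.keys, self.values))
        self.keys = [None] * (len(old) * 2)
        self.values = [None] * (len(old) * 2)
        self.count = 0
        for k, v in old:
            if k is not None:
                self.set(k, v)


t = Table()
t.set(intern_str("x"), 10)
t.set(intern_str("y"), 20)
for i in range(20):
    t.set(intern_str(f"k{i}"), i)  # forces a couple of resizes
print(t.get(intern_str("x")), t.get(intern_str("k7")), t.get(intern_str("nope")))  # 10 7 None
```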
Having a good baseline (either a JIT or just a good interpreter) also means that you don't need to rely as much on the full JIT. This is important because it is actually hard to spend 100% of the time in the JIT -- poison pills abound.
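A toy tier-up policy illustrating that point (all names hypothetical; real engines use far more elaborate heuristics). A function runs on the baseline until it is hot, then asks the optimizer for a faster version. If the optimizer refuses, for example because of some unsupported construct, the function stays on the baseline permanently, so the baseline's speed sets the floor.

```python
from typing import Any, Callable, Optional


class TieredFunction:
    """Toy tier-up: baseline first, optimized version once hot, unless the
    optimizer bails out (a "poison pill"), in which case never retry."""

    def __init__(
        self,
        baseline: Callable[..., Any],
        optimize: Callable[[], Optional[Callable[..., Any]]],
        threshold: int = 3,
    ) -> None:
        self.baseline = baseline
        self.optimize = optimize  # returns a faster callable, or None to bail
        self.threshold = threshold
        self.calls = 0
        self.optimized: Optional[Callable[..., Any]] = None
        self.gave_up = False

    def __call__(self, *args: Any) -> Any:
        if self.optimized is not None:
            return self.optimized(*args)
        self.calls += 1
        if not self.gave_up and self.calls >= self.threshold:
            self.optimized = self.optimize()
            self.gave_up = self.optimized is None
        return self.baseline(*args)


square = TieredFunction(lambda x: x * x, lambda: (lambda x: x * x))
poisoned = TieredFunction(lambda x: x + 1, lambda: None)  # optimizer bails
print([square(i) for i in range(5)], square.optimized is not None)  # [0, 1, 4, 9, 16] True
print([poisoned(i) for i in range(5)], poisoned.gave_up)            # [1, 2, 3, 4, 5] True
```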