0 points · arghwhat · 8y ago · 0 comments

Then you need either "redefinition" checks or an actual, full stop-the-world pause, which is a performance nightmare and also affects unrelated code.

The proper solution is to not have unnecessary dynamics. There is no need for the ability to redefine methods on an object. If you have functions as first-class objects, you can just let users store functions and call them at the cost of indirection if they wish to have such dynamics.
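A minimal sketch of that idea in Python, assuming a hypothetical `Button` class (none of these names come from the comment): the `click` method is never redefined, and the only thing that varies is a stored function that it calls through one extra indirection.

```python
from typing import Callable


class Button:
    """Hypothetical class: `click` is permanent; only the stored handler changes."""

    def __init__(self, on_click: Callable[[], str]) -> None:
        # The dynamic part is ordinary data: a function held in a field.
        self.on_click = on_click

    def click(self) -> str:
        # Fixed method; the price of the dynamism is one indirect call.
        return self.on_click()


def say_hello() -> str:
    return "hello"


def say_bye() -> str:
    return "bye"


button = Button(say_hello)
first = button.click()
button.on_click = say_bye  # swap behavior without touching the method itself
second = button.click()
```

In a language with fixed object layouts, `on_click` would be an ordinary field holding a function value, so a compiler could treat `click` as immutable and pay only for the indirect call.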

13 comments · 3 top-level

jtolmar · 8y ago · 5 in thread

How about a language that loses the ability to redefine structures and methods after a certain point?

I've been thinking of making a scripting language where you can freely redefine things until you return the main function to the script's caller. At that point main gets compiled / JITed / otherwise locked and any redefinitions done by calling it are errors.
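One way to picture the proposal is the Python sketch below. `Environment`, `define`, `call`, `lock`, and `LockedError` are hypothetical names, and the "compile / JIT" step is reduced to flipping a flag; it only illustrates the point at which redefinition becomes an error.

```python
class LockedError(RuntimeError):
    """Raised when a definition is changed after the environment is locked."""


class Environment:
    """Hypothetical script environment: definitions are mutable until lock()."""

    def __init__(self):
        self._defs = {}
        self._locked = False

    def define(self, name, fn):
        if self._locked:
            raise LockedError(f"cannot redefine {name!r} after lock")
        self._defs[name] = fn

    def call(self, name, *args):
        return self._defs[name](*args)

    def lock(self):
        # Stand-in for "main gets compiled / JITed": from here on the
        # set of definitions is fixed, so lookups could be resolved once.
        self._locked = True
        return self._defs["main"]


env = Environment()
env.define("greet", lambda: "hi")
env.define("greet", lambda: "hello")  # redefinition is fine before the lock
env.define("main", lambda: env.call("greet"))
main = env.lock()
result = main()

try:
    env.define("greet", lambda: "hey")  # redefinition after the lock
    redefinition_rejected = False
except LockedError:
    redefinition_rejected = True
```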

zbentley · 8y ago

> How about a language that loses the ability to redefine structures and methods after a certain point?

Sounds awesome! We could call the phases before that point "compile time" and "preprocessor time", and the phase after that point "run time".

/sarc

yorwba · 8y ago

Most languages lack one of these phases. C/C++ and the C preprocessor are completely different languages; you can't execute C code during preprocessing. That's annoying, and it is the reason constexpr was introduced in C++11. Other languages are too dynamic to be handled by a static compiler, but might benefit if you could transition to optimized machine code at runtime. Except for Lisp, I don't think there are many other languages where such a thing is possible.

white-flame · 8y ago

Or even just a manual bake() call, with an expensive unbake() to use during development (or maybe even deployed if it's that reconfigurable). But I do agree that a tool like this should affect the global environment and not try to be fine grained. Leave any fine-grained decisions up to the VM itself.

yorwba · 8y ago

That's how RPython works, essentially. https://rpython.readthedocs.io/en/latest/getting-started.htm...

staticassertion · 8y ago

Sounds like macros.

wahern · 8y ago · 5 in thread

> There is no need for the ability to redefine methods on an object

In prototype-based languages like JavaScript and Lua redefining methods on an object is common at runtime, which makes them quite challenging to JIT. It may be most common early in execution, but there's no way to semantically define that boundary.

Since Lua 5.2 (IIRC) certain metamethods are locked & loaded when you assign a metatable (a prototype definition) to an object. Thus, the __gc finalizer is only obeyed if it existed when the metatable was set on the object. (I don't think this was a performance improvement--more about tradeoffs in GC complexity--but it's an example of side-effect semantics that could be leveraged for JIT optimization.) But this isn't the case for OOP-like methods (which in Lua are normally defined indirectly through the __index metamethod field), as those are ad hoc and would be expected to change during runtime.

A similar issue occurs in any dynamic language (JavaScript, Lua, Perl, etc) that uses generic dictionaries for assigning and loading functions. Ideally a JIT would know that invoking a function like "a.b.foo()" would always load the same function and could elide the dictionary lookup for "b" and "foo". But keeping track of whether the dictionary a or b was modified is costly. Theoretically you could, e.g., add callback hooks to dictionary entries that, when invoked, invalidate some JIT'd block; but such conditional checks and operations cause a huge amount of code bloat and slow down the fast path. Adding more logic in an attempt to minimize unnecessary work causes the same problems.
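To make the tradeoff concrete, here is a rough Python sketch of one possible scheme, a version-guarded lookup cache. It is not how any particular engine works, and `VersionedDict` and `CallSite` are hypothetical names. Each write to the dictionary bumps a counter, and each call site re-checks that counter before trusting its cached function. The guard in `call` is exactly the kind of conditional check on the fast path that the comment describes.

```python
class VersionedDict(dict):
    """dict that bumps a counter on writes so callers can detect changes.

    Assumption: only item assignment and deletion are tracked in this
    sketch; methods such as update() or pop() are not intercepted.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


class CallSite:
    """Caches the function found under one key of one table."""

    def __init__(self, table, key):
        self.table = table
        self.key = key
        self.cached_fn = None
        self.cached_version = -1
        self.misses = 0

    def call(self, *args):
        if self.cached_version != self.table.version:  # guard on every call
            self.cached_fn = self.table[self.key]       # slow path: real lookup
            self.cached_version = self.table.version
            self.misses += 1
        return self.cached_fn(*args)


b = VersionedDict(foo=lambda: 1)
site = CallSite(b, "foo")
results = [site.call(), site.call()]  # the second call skips the lookup
b["foo"] = lambda: 2                  # the write invalidates the cache
results.append(site.call())
```

Even in this toy version, every write pays for the counter update and every call pays for the comparison, whether or not anything ever changes.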

The lesson from languages like Forth, K, and Lua is that the most important thing to optimize isn't JITing, but the software VM itself, including the bytecode and dispatch tables. (Mike Pall of LuaJIT fame makes this point.) The deep pipelines and huge caches (e.g. for branch speculation) in modern processors mean that the abstraction of a bytecode and dispatch table can often be subsumed into the pipeline, resulting in a fixed but relatively small overhead compared to native code. This is especially true for the bulk of the application code, where you may only see a [hand waving] 1.5x or 2x overhead in a language like Lua or especially LuaJIT. And you can do even better by moving specific hot spots into C code. Languages like Python and Ruby don't have nearly as lean a VM as Lua so the overhead is greater and more variable, but the idea is similar.

If WebAssembly catches on, I think we'll begin to see some regressions in JavaScript JITs because the marginal cost and complexity won't be as worthwhile when people begin moving compute-heavy code into WebAssembly. Simplicity might even bring some performance improvements for code that was never susceptible to JITing because there'll be less baggage.

[1] Largely a result of the lean and clean semantics. Which is not the same as power--Lua has fully lexical scoping with proper closures, and asymmetric stackful coroutines. Asymmetric, stackful coroutines are exceedingly powerful abstractions, but they also rid Lua of the colored functions[2] problem that languages with explicit async/await semantics have. That means function invocation semantics are unified, which makes the implementation simpler and leaner and in turn makes it more likely that VM dispatch is cleanly pipelined in hardware. Less is more. Linux taught the world a similar lesson--Windows never had fork() as it was considered too heavyweight and complex for the common case, and instead focused on multiple different interfaces, one for creating threads and one for invoking new programs. But Linux optimized the heck out of fork, so even creating a thread is faster on Linux than on Windows, and the semantic power of fork makes it easier to implement complex resource sharing schemes between processes than on Windows (i.e. rather than having an extremely complex data structure and flags for telling the OS what resources to pass or share between processes, you just use other common APIs--e.g. dup(), etc--before exec()).

[2] http://journal.stuffwithstuff.com/2015/02/01/what-color-is-y...

arghwhat (OP) · 8y ago

> In prototype-based languages like JavaScript and Lua redefining methods on an object is common at runtime, which makes them quite challenging to JIT. It may be most common early in execution, but there's no way to semantically define that boundary.

It is common, but I do not think it is at all necessary. It always looks nasty whenever I see it in dynamic languages, and I never feel a need to do so in static languages. Whenever functionality is swappable, you'd instead have a function pointer that you call from a permanent method.

> The lesson from languages like Forth, K, and Lua is that the most important thing to optimize isn't JITing, but the software VM itself, including the bytecode and dispatch tables.

I'd very much question this. Properly optimized JIT output should be orders of magnitude faster than interpreted bytecode for even the best interpreter, and if enough is JIT'ed, the interpreter is no longer in play. Contradicting data would be interesting, although I do not generally concern myself with dynamic languages anymore unless I have to.

wahern · 8y ago

> Properly optimized JIT output should be orders of magnitude faster than interpreted bytecode

Say we had code like,

  a + b

Statically compiled code might look like

  load a from stack to register
  load b from stack to register
  add a and b

A VM would look like

  load opcode from state to register
  compute opcode address # nullop or two loads and index
  jump to opcode block
  load a from state to register
  load b from state to register
  add a and b

All of that is easily pipelined, especially by the very latest processors which speculate through indirect jumps (which is why we have Spectre, etc). The above is idealized but well reflects, I think, how modern register-based software VMs work.
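The VM sequence above can be sketched as a tiny register-based interpreter with a dispatch table. This is a Python illustration of the structure only: the instruction format and opcode names are made up for the example, and since Python is itself interpreted it says nothing about the pipelining behavior being discussed.

```python
# Hypothetical three-operand instruction format: (opcode, dst, src1, src2).
OP_ADD, OP_MUL, OP_HALT = 0, 1, 2


def op_add(regs, dst, a, b):
    regs[dst] = regs[a] + regs[b]  # load a, load b, add a and b


def op_mul(regs, dst, a, b):
    regs[dst] = regs[a] * regs[b]


# Dispatch table: the opcode is used directly as an index.
DISPATCH = [op_add, op_mul]


def run(code, regs):
    pc = 0
    while True:
        opcode, dst, a, b = code[pc]  # load opcode from state
        if opcode == OP_HALT:
            return regs
        handler = DISPATCH[opcode]    # compute opcode address
        handler(regs, dst, a, b)      # jump to opcode block
        pc += 1


program = [
    (OP_ADD, 2, 0, 1),   # r2 = r0 + r1
    (OP_MUL, 2, 2, 2),   # r2 = r2 * r2
    (OP_HALT, 0, 0, 0),
]
registers = run(program, [3, 4, 0])
```

The per-instruction overhead is the fetch, the table index, and the indirect call; the claim in the comment is that hardware can hide much of that when the sequence is predictable.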

But when you have a JIT for a dynamically typed language, the entry and exit points of both interpreted sequences and JIT'd sequences require many more instructions to manage bookkeeping, exploding the cost. JIT'ing only works if you can compile blocks of code large enough that the benefits exceed the bookkeeping costs. But that's a tall order for dynamically typed languages where runtime mutations can invalidate JIT'd blocks at many points in a sequence, such as with prototype-based languages.

Getting "[p]roperly optimized JIT output" is the crux of the problem. It takes significant instrumentation and indirection to create and maintain "[p]roperly optimized JIT output". You can't compare the optimized machine code sequences to the analogous interpreted sequences, independent of the surrounding machinery.

Much of the performance benefit of statically compiled code isn't in execution, per se, but in the data structures. A language like Lua is constantly indexing hash tables[1] for even simple record objects, whereas in C you're usually doing direct memory references. But transforming hash table lookups in a dynamic language into direct memory references a la statically compiled C structs is extremely hard if not impossible. Engines like V8 manage to do it much of the time in the context of loading prototype methods, but for ad hoc runtime data structures I don't think it can optimize that at all.
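For illustration, the general idea behind turning a name lookup into a direct index can be sketched in Python with a shape descriptor shared by similar objects. This is a simplified picture under stated assumptions, not V8's actual implementation, and `Shape`, `Record`, and `FieldSite` are hypothetical names.

```python
class Shape:
    """Hypothetical layout descriptor: maps field names to slot indices."""

    def __init__(self, fields):
        self.index = {name: i for i, name in enumerate(fields)}


class Record:
    """Object whose fields live in a flat list laid out by its shape."""

    def __init__(self, shape, values):
        self.shape = shape
        self.slots = list(values)


class FieldSite:
    """One access site: remembers the last shape seen and its slot index."""

    def __init__(self, name):
        self.name = name
        self.shape = None
        self.slot = -1
        self.lookups = 0

    def load(self, record):
        if record.shape is not self.shape:             # guard: identity check
            self.slot = record.shape.index[self.name]  # slow path: hash lookup
            self.shape = record.shape
            self.lookups += 1
        return record.slots[self.slot]                 # fast path: direct index


point = Shape(["x", "y"])
site = FieldSite("y")
values = [site.load(Record(point, [i, i * 10])) for i in range(3)]
```

The scheme only pays off when many objects share a shape. Ad hoc structures that each get their own layout fall back to the hash lookup every time, which matches the limitation described above.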

But if your code is primarily operating on, e.g., JSON trees, it wouldn't matter one way or another. If your statically compiled code isn't benefiting from direct memory addressing of data (as is the case with many types of applications) then statically compiled, JIT'd, and interpreted code can have similar runtime profiles, and in many cases you can't even be sure which will be faster in real-world systems.

[1] Lua has opcodes for this so the cost is fixed and small relative to raw C code doing the lookup. And strings in Lua are interned so lookup is usually as simple as a mask and direct index into an array.

ufo · 8y ago

Many language implementations divide their JIT into a "baseline JIT" and a "full JIT". A fast interpreter can do the job of the baseline JIT, with faster startup speed and less latency.

Having a good baseline (either a JIT or just a good interpreter) also means that you don't need to rely as much on the full JIT. This is important because it is actually hard to spend 100% of the time in the JIT -- poison pills abound.

ehaliewicz2 · 8y ago

Array languages like APL/K/J can be tremendously fast for an interpreter, since each bytecode works on entire arrays at a time. Not quite as fast as say, C or C++ compiled with a modern compiler, but I think they can match other static languages. And of course, when you're branching on or processing single values rather than whole arrays/matrixes, you lose the speed benefit.
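A rough way to see why, sketched in Python with hypothetical function names: count how many times the interpreter has to decode and dispatch an operation. With element-at-a-time code that happens once per element; with an array primitive it happens once, and the loop runs inside the primitive.

```python
def run_scalar(xs, ys):
    """Element-at-a-time: one dispatch per element."""
    dispatches = 0
    out = []
    for x, y in zip(xs, ys):
        dispatches += 1  # the interpreter decodes an "add" for each pair
        out.append(x + y)
    return out, dispatches


def run_array(xs, ys):
    """Array-at-a-time: one dispatch; the loop lives inside the primitive."""
    dispatches = 1
    out = [x + y for x, y in zip(xs, ys)]  # stand-in for a tight native loop
    return out, dispatches


xs, ys = list(range(1000)), list(range(1000))
scalar_out, scalar_dispatches = run_scalar(xs, ys)
array_out, array_dispatches = run_array(xs, ys)
```

Both versions compute the same result; only the dispatch count differs, which is why the advantage disappears once the work is on single values.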

skybrian · 8y ago

I'm not sure fork is a good example. Languages that rely on threads or coroutines (like Java or Go) don't even support the traditional fork semantics. This is a global operation that plays havoc with language-level invariants.

Retra · 8y ago

Generally speaking, the most efficient solution for a problem is the one that can't solve any problems more difficult than the given one.
