a. Does the HHVM JIT do anything that LuaJIT doesn't? I assume you are familiar with LuaJIT, as it is mentioned in the paper - and from a quick scan, the only two things I didn't recognize from LuaJIT were the refcount optimizations (not required by Lua GC) and guard relaxation.
b. Is HHIR tied to PHP, or is it usable as a general-purpose JIT backend? LuaJIT's IR is (unfortunately for other languages) tied very strongly to Lua semantics.
Thanks for an interesting read!
Despite a lot of work that's gone into Parrot, PyPy, V8, etc., none of these VMs seem to have really taken off beyond the language they were intended for. Somewhat sadder is that pretty much all de facto default runtimes for popular dynamic languages (except JavaScript) are still interpreters... the cost of being portable.
It works for multiple languages that go the extra mile - e.g. Jython pays very dearly in performance because of the object model mismatch between Python and Java. And I haven't looked closely recently, but when Clojure first came out, the impedance mismatch between Clojure's persistent data structures and Java's mutable ones also had a ridiculous performance cost.
Mono is perhaps more deserving of that title, especially together with IKVM (so, everything JVM), F#, IronPython and Boo. But note that everything is still shackled to the underlying object model.
But if anything, LLVM is the only proven multi-language JIT for FOSS platforms; it's method-at-a-time, which is great for statically typed languages (whether those types are declared upfront or inferred), not so much for extremely dynamic languages.
Future times should be interesting as well. I'd be interested to know if the work on value types in the JVM would be useful for Clojure.
In terms of optimization, is it possible to create tracelets that are not contiguous regions of code? Roughly, if you identify two non-contiguous tracelets that have the same inputs, and can guarantee the inputs haven't changed in between, then you could merge them, because bigger tracelets would mean fewer guards and better performance.
And yes, you could try to share tracelets whose bodies are identical, but, unless their successor tracelets are also identical, you'd need to "dynamicise" the dispatch so you go down Path 1 when you're really tracelet 1 and Path 2 when you're really tracelet 2. We normally chain tracelets together by either falling through (if it's unconditional and the successor tracelet could be placed right next to it) or with jmp or branch instructions.
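The guard-count trade-off in this exchange can be sketched with a toy model. This is only an illustration in Python, not HHVM's actual data structures: `Tracelet`, `run_chain` and the example bodies are hypothetical names, and a "guard" here is just a runtime type check on an input variable. Chaining is modelled as a fixed `successor` link, standing in for a fall-through or `jmp`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Tracelet:
    name: str
    guards: Dict[str, type]                 # variable -> type assumed at entry
    body: Callable[[dict], None]            # type-specialized straight-line code
    successor: Optional["Tracelet"] = None  # static chain (fall-through or jmp)


def run_chain(start, env):
    """Run tracelets along their static chain.

    Returns (names of executed tracelets, number of guard checks performed).
    """
    executed, checks = [], 0
    t = start
    while t is not None:
        for var, ty in t.guards.items():
            checks += 1
            if type(env[var]) is not ty:
                raise TypeError(f"guard failed in {t.name}: {var} is not {ty.__name__}")
        t.body(env)
        executed.append(t.name)
        t = t.successor
    return executed, checks


def add_one(env):
    env["b"] = env["a"] + 1


def double(env):
    env["c"] = env["a"] * 2


def merged_body(env):
    # Valid only because nothing between the two bodies changes the type of "a".
    add_one(env)
    double(env)


t2 = Tracelet("t2", {"a": int}, double)
t1 = Tracelet("t1", {"a": int}, add_one, successor=t2)
merged = Tracelet("t1+t2", {"a": int}, merged_body)

split_env, merged_env = {"a": 20}, {"a": 20}
split_result = run_chain(t1, split_env)        # guard on "a" checked twice
merged_result = run_chain(merged, merged_env)  # guard on "a" checked once
```

Both runs leave the environment in the same state; the merged tracelet simply pays for one guard instead of two. Sharing one body between two different chains would require replacing the fixed `successor` link with a lookup made at run time, which is the "dynamicised" dispatch mentioned above.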
As usual, the abstract really isn't enough to draw big conclusions from. The concept of a latent type applies to arbitrary expressions, not just variables. All the values flowing through a PHP program implicitly share the same union type; they can be floats, strings, arrays, etc. The latent types are the narrower types that can actually flow through the program in practice.
To be clear, it is more general than just things like:
$a = 0; // $a is an int!
It includes learning that, in:
g(foo() . bar());
foo() and bar() return strings. Since none of this information is marked syntactically in PHP, and since it might actually be undecidable because of dynamic control flow in the callees, dynamic binding, etc., you really need to see the program run to do this stuff.
Good stuff! Uncached, HHVM outperforms my cached php-fpm sites.
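To make the latent-type point about `g(foo() . bar())` concrete, here is a minimal sketch of learning types by watching a program run. It is written in Python rather than PHP, and everything in it (`profile`, `observed`, the bodies of `foo`, `bar` and `g`) is a made-up illustration, not how HHVM collects type information.

```python
from collections import defaultdict

observed = defaultdict(set)  # expression label -> type names seen at run time


def profile(label):
    """Wrap a function so the type of every value it returns is recorded."""
    def wrap(fn):
        def inner(*args):
            result = fn(*args)
            observed[label].add(type(result).__name__)
            return result
        return inner
    return wrap


@profile("foo()")
def foo():
    return "latent "


@profile("bar()")
def bar():
    return "type"


@profile("g(...)")
def g(s):
    return len(s)


for _ in range(3):
    g(foo() + bar())  # Python's + stands in for PHP's . concatenation

# Nothing in the source text says foo() returns a string; the run does.
latent = {label: sorted(types) for label, types in observed.items()}
```

After the loop, `latent` maps each expression to the narrow set of types that actually flowed through it, which is the information a JIT can then specialize on (with guards, since a later call could still return something else).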
Tracelets instead of basic-block analysis sounds interesting, but PHP is still doomed by not allowing optional types. In-house code can easily be optimized with explicit types. The AUTOLOAD problem is a big one, and I am just planning to tackle it, but I came to mostly the same design decisions. We are compiling modules and files, as this is easiest to handle. My p2 JIT has no type guards or separate specialized methods yet; I would rather support optional early binding, a jitted method cache and small tagged data, which doesn't fill up the cache that much. It outperforms Java and the CLR by far; only LuaJIT is ahead.
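For readers unfamiliar with "small tagged data", here is a generic sketch of low-bit tagging, where a small integer lives inside the value word itself instead of in a heap object. This is an assumption-laden illustration in Python, not p2's actual layout: the one-bit tag, the 64-bit word size and the `heap` list standing in for allocated objects are all invented for the example.

```python
TAG_BITS = 1
INT_TAG = 1      # low bit 1: the word itself holds a small integer
WORD_BITS = 64   # assumed machine word size

heap = []        # stand-in for boxed objects; a "pointer" has low bit 0


def box(value):
    """Encode a value as a single word-sized integer."""
    limit = 1 << (WORD_BITS - 2)
    if isinstance(value, int) and not isinstance(value, bool) and -limit <= value < limit:
        return (value << TAG_BITS) | INT_TAG  # immediate: no allocation
    heap.append(value)
    return (len(heap) - 1) << TAG_BITS        # low bit 0: index into heap


def unbox(word):
    """Decode a word produced by box()."""
    if word & INT_TAG:
        return word >> TAG_BITS               # arithmetic shift keeps the sign
    return heap[word >> TAG_BITS]


words = [box(7), box(-3), box("hello")]
values = [unbox(w) for w in words]
```

Only the string touches `heap`; the two integers are carried entirely in their words, which is why this kind of encoding keeps data small and cache pressure low.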
With the static B::CC, type inference has the same problem as in PHP and the same performance advantages as hhpc, but I added special syntax for typed and sized arrays, and to disallow too much runtime magic. The current production compiler at cPanel only uses better data layout to get its performance boost at startup and in overall memory usage: read-only strings and hash keys, mostly. Perfect hashes not yet. IMHO the most important thing is smaller data and ops overhead, not the optimizer.
Great leaps have been made over the last year, and it is now very stable.