The HipHop Virtual Machine

(research.facebook.com)

78 points · kmavm · 11y ago · 29 comments

29 comments · 9 top-level

beagle3 · 11y ago · 7 in thread

So, a couple of questions:

a. Does the HHVM JIT do anything that LuaJIT doesn't? I assume you are familiar with LuaJIT, as it is mentioned in the paper - and from a quick scan, the only two things I didn't recognize from LuaJIT were the refcount optimizations (not required by Lua GC) and guard relaxation.

b. Is HHIR tied to PHP, or is it usable as a general-purpose JIT backend? LuaJIT's IR is (unfortunately for other languages) tied very strongly to Lua semantics.

Thanks for an interesting read!

nly · 11y ago

It's a shame. As far as I can tell, the only proven multi-language JIT for FOSS platforms is still the JVM? (and Mono maybe?)

Despite a lot of work that's gone into Parrot, PyPy, V8, etc., none of these VMs seem to have really taken off beyond the language they were intended for. Somewhat sadder is that pretty much all de facto default runtimes for popular dynamic languages (except JavaScript) are still interpreters... the cost of being portable.

beagle3 · 11y ago

I wouldn't say the JVM is a proven multi-language JIT, FOSS or not.

It works for multiple languages that go the extra mile - e.g. Jython pays very dearly in performance because of the object model mismatch between Python and Java. And I haven't looked closely recently, but when Clojure first came out, the impedance mismatch between Clojure's persistent data structures and Java's mutable ones also had a ridiculous performance cost.

Mono is perhaps more deserving of that title, especially together with IKVM (so, everything JVM), F#, IronPython and Boo. But note that everything is still shackled to the underlying object model.

But if anything, LLVM is the only proven multi-language JIT for FOSS platforms; it's method-at-a-time, which is great for statically typed languages (whether those types are declared upfront or inferred), not so much for extremely dynamic languages.

papercrane · 11y ago

Scala is pretty widely used. JRuby seems to do quite well as well.

Future times should be interesting as well. I'd be interested to know if the work on value types in the JVM would be useful for Clojure.

amelius · 11y ago

The JVM is multi-language only because the languages were specifically designed and/or modified to run on it.

lambdapower · 11y ago

What about Guile?

munificent · 11y ago

Does anyone use Guile for anything other than Scheme? I thought all of the other languages were pretty much just half-baked proofs of concept (like Parrot, for that matter).

kodablah · 11y ago

As a POC I wrote a rudimentary PHP-to-Lua converter to test this [1] (very limited in features). It performed very well.

1 - https://github.com/cretz/meh

zaptheimpaler · 11y ago · 4 in thread

This is awesome! I am not familiar with how compilers/VMs are generally implemented, but the tracelets and guards idea strikes me as very general - in particular, it looks like this approach could be applied to provide type inference to any dynamically typed language. Has this kind of thing been tried before?

In terms of optimization, is it possible to create tracelets that are not contiguous regions of code? Roughly, if you identify two non-contiguous tracelets that have the same inputs, and can guarantee the inputs haven't changed in between, then you could merge them together, because bigger tracelets would mean fewer guards and better performance.

kmavm (OP) · 11y ago

As far as I know, you could try tracelets with any dynamic language.

And yes, you could try to share tracelets whose bodies are identical, but, unless their successor tracelets are also identical, you'd need to "dynamicise" the dispatch so you go down Path 1 when you're really tracelet 1 and Path 2 when you're really tracelet 2. We normally chain tracelets together by either falling through (if it's unconditional and the successor tracelet could be placed right next to it) or with jmp or branch instructions.
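To make the trade-off concrete, here is a toy Python sketch (not HHVM code; `shared_body`, `path1`, `path2` and the integer tracelet ids are made up for illustration). With static chaining each tracelet carries its own copy of the body and jumps straight to a fixed successor; with a shared body, the successor has to be looked up at run time.

```python
# Toy model: Python functions stand in for chunks of machine code.

def shared_body(x):
    # The work that both tracelets have in common.
    return x + 1

def path1(x):
    # Successor of tracelet 1.
    return ("path1", x)

def path2(x):
    # Successor of tracelet 2.
    return ("path2", x)

# Static chaining: one body per tracelet, successor fixed at compile time
# (the analogue of a fall-through or a direct jmp).
def tracelet1(x):
    return path1(shared_body(x))

def tracelet2(x):
    return path2(shared_body(x))

# Shared body: a single copy of the code, but the successor must be picked
# dynamically based on which tracelet we "really" are.
SUCCESSORS = {1: path1, 2: path2}

def shared_tracelet(ident, x):
    return SUCCESSORS[ident](shared_body(x))
```

The shared version saves code space, but pays for an extra lookup and an indirect call on every exit.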

rayiner · 11y ago

EDIT: That's what I get for not reading the paper first.

apaprocki · 11y ago

The tracing JIT was removed from SpiderMonkey because it was brittle and didn't perform as well as the traditional JIT compiler(s) (Jaeger..., Ion...) that replaced it.

fijal · 11y ago

Note that, confusingly enough, "tracelets" have nothing to do with "tracing JITs".

q_no · 11y ago · 4 in thread

I've been using HHVM in production for a few weeks now and all I can say is I'm very happy with the results. It's running stable (compiled under CentOS 7) and I was able to cut the response times in half (~145 ms with php-fpm, ~70 ms with HHVM).

reeze_xia · 11y ago

What is the version of your PHP?

q_no · 11y ago

reeze_xia · 11y ago

Great! With opcache enabled right?

joeguilmette · 11y ago

Just the other day I saw a demo of (unaltered) WordPress running with HHVM and PHP7(!)

__Joker · 11y ago · 2 in thread

I assume latent type simply means implicit typing, i.e. you declare a variable with a value, and the type is inferred from the value of the variable rather than from an explicitly declared type.

kmavm (OP) · 11y ago

Hi, I'm one of the paper's co-authors.

As usual, the abstract really isn't enough to draw big conclusions from. The concept of a latent type applies to arbitrary expressions, not just variables. All the values flowing through a PHP program implicitly share the same union type; they can be floats, strings, arrays, etc. The latent types are the narrower types that can actually flow through the program in practice.

To be clear, it is more general than just things like:

  $a = 0; // $a is an int!

It includes learning that:

  g(foo() . bar());

foo() and bar() return strings. Since none of this information is marked syntactically in PHP, and since it might actually be undecidable because of dynamic control flow in the callees, dynamic binding, etc., you really need to see the program run to do this stuff.
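A toy illustration of "seeing the program run", written in Python rather than PHP (this is not how HHVM does it; `observe` and the site labels are invented for the example): wrap each expression, record the types that actually flow through it, and the latent types fall out of the log.

```python
from collections import defaultdict

# Maps an expression site (just a label here) to the set of type names seen.
observed = defaultdict(set)

def observe(site, value):
    # Record the runtime type flowing through this site, then pass it along.
    observed[site].add(type(value).__name__)
    return value

def foo():
    return "foo"

def bar():
    return "bar"

def g(s):
    return len(s)

# Nothing in the source says foo() and bar() return strings;
# we only learn it by running the code.
for _ in range(3):
    g(observe("foo()", foo()) + observe("bar()", bar()))
```

After the loop, `observed` holds `{'str'}` for both sites, which is the kind of narrow type a JIT could then specialize on (behind a guard, since a later call might still return something else).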

amelius · 11y ago

I'm wondering: how does this compare speed-wise to other real-world JITs, for example Node.js (V8)? Are there any tricks used in your VM which are not used by V8, and conversely?

Shish2k · 11y ago · 2 in thread

I've been trying this on my few remaining PHP-based sites and finding it great. Can we now get the same for Python? ;-)

jhgg · 11y ago

Have you taken a look at PyPy? It's even mentioned in the paper! http://pypy.org/

fijal · 11y ago

The same idea has been done for psyco (tracelets there are called basic-block-at-a-time compilation). PyPy uses a tracing JIT instead to achieve better performance in cases where psyco was too hard.

joeguilmette · 11y ago · 1 in thread

I'm migrating to HHVM across my whole stack. Massive performance gains; the only issues have been with some plugins that I was able to work around pretty easily.

Good stuff! Uncached, HHVM outperforms my cached php-fpm sites.

wldlyinaccurate · 11y ago

I've been doing the same and so far have seen big performance gains as well. Using HHVM also has the benefit of being able to "tack on" any Hack code in the future if (for example) you have some important code which could benefit from static typing.

rurban · 11y ago

Some comments from the maintainer of Parrot, p2 and Perl B::CC, which do similar things:

Tracelets instead of basic-block analysis sounds interesting, but PHP is still doomed by not allowing optional types. In-house code can easily be optimized by explicit types. The AUTOLOAD problem is a big one, and I am just planning to tackle it, but came to mostly the same design decisions. We are compiling modules and files, as this is easiest to handle. My p2 JIT has no type guards or separate specialized methods yet; I would rather support optional early binding, a jitted method cache and small tagged data, which doesn't fill up the cache that much. It outperforms Java and the CLR by far; only LuaJIT is ahead.

With the static B::CC, type inference has the same problem as in PHP but the same performance advantages as hhpc; I added special syntax for typed and sized arrays, and to disallow too much runtime magic. The current production compiler at cPanel only uses better data layout to get its performance boost at startup and in overall memory usage: read-only strings and hash keys, mostly. Perfect hashes not yet. IMHO the most important thing is smaller data and ops overhead, not the optimizer.

fijal · 11y ago

Question to the authors: Any reason why HippyVM is not included for comparison in the paper? It does usually outperform HHVM on those benchmarks (but not on real-world use cases, which is maybe a good reason to include more real-world use cases in such papers).

thejosh · 11y ago

We're trialing HHVM; so far the results are between 2x and 20x faster.

Great leaps have been made over the last year and it is now very stable.
