
igouy · 10y ago

>> … almost 6x faster … over 50x faster … We think this shows how we can optimise more effectively on more realistic Ruby code than synthetic benchmarks.<<

Might the difference simply be test coverage?

-- The other Ruby implementations have been testing performance on those same synthetic benchmarks, and have already taken the opportunity to improve performance for those cases.

-- The other Ruby implementations have not been testing performance in other cases, and still have considerable opportunity to improve performance for those cases.

chrisseaton · 10y ago

I think the difference is that the synthetic benchmarks are generally written in a way that is as tight as possible, avoids allocation and abstraction, and they certainly don't use metaprogramming. That stuff is easier for everyone to optimise.

Real Ruby code uses a lot of abstraction, allocates objects constantly, and uses metaprogramming. Optimising these aspects of Ruby is much more complex and doing it well requires some optimisations such as partial escape analysis and powerful allocation removal that we have and JRuby and Rubinius do not.

My favourite example is this code from PSD.rb that implements a clamp routine. It does it by creating an array, sorting and finding the middle value. You wouldn't normally find code like this in a synthetic benchmark, but you would in real code.

    def clamp(value, min, max)
      [value, min, max].sort[1]
    end

In JRuby and Rubinius that code really will allocate an array, sort it using some library routine, and then index it. In JRuby+Truffle we compile that method to effectively:

    def clamp(value, min, max)
      (value > max) ? max : ((value < min) ? min : value)
    end
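To make the equivalence concrete, here is a small sketch that checks the two forms agree. The names `clamp_sort` and `clamp_branch` are made up for this comment, and the check assumes `min <= max` (when `min > max` the two forms can disagree, so a real compiler has to be more careful than this).

```ruby
# Sort-based version, as quoted from PSD.rb above.
def clamp_sort(value, min, max)
  [value, min, max].sort[1]
end

# Branch-only version, mirroring the "effectively compiled" form above.
# Hypothetical name; only equivalent when min <= max.
def clamp_branch(value, min, max)
  value > max ? max : (value < min ? min : value)
end

# Compare the two over every combination in a small range,
# keeping only the cases where min <= max and the results differ.
range = (-3..3).to_a
mismatches = range.product(range, range).select do |value, min, max|
  min <= max && clamp_sort(value, min, max) != clamp_branch(value, min, max)
end

puts "mismatches: #{mismatches.length}"
```

The median of three values is the clamped value whenever the bounds are ordered, which is why the array and the sort can in principle disappear entirely.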

There's a massive, massive difference between those two. One allocates objects on the heap, passes them into the runtime, runs a general-purpose sort routine, etc. etc. etc., thousands of machine instructions, and the other is just a couple of assembly instructions.

When you run this code as a benchmark, we're over 300x faster than Rubinius' LLVM-based JIT.
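For anyone who wants to try this on their own implementation, a rough sketch of such a micro-benchmark follows. It uses only the standard `benchmark` library; the iteration count, the inputs and the checksum are arbitrary choices for this comment, not the harness we actually used, and the numbers you get will depend heavily on warm-up.

```ruby
require 'benchmark'

def clamp(value, min, max)
  [value, min, max].sort[1]
end

# Arbitrary iteration count, picked for illustration only.
iterations = 200_000

# Warm up first so a JIT has a chance to compile the method.
iterations.times { |i| clamp(i % 512, 0, 255) }

# Accumulate the results so the calls cannot be discarded as dead code.
checksum = 0
elapsed = Benchmark.realtime do
  iterations.times { |i| checksum += clamp(i % 512, 0, 255) }
end

puts format('%d calls in %.3fs (checksum %d)', iterations, elapsed, checksum)
```

Summing into a checksum matters here: an optimiser that removes the array allocation could just as easily remove a call whose result is never used.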

Of course, we still handle the case where someone has redefined Array#sort or something like that, and you could still find that Array instance using ObjectSpace if you wanted to. Both work because we fall back to the unoptimised code through deoptimisation.
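A sketch of why that fallback is needed: Ruby lets any code replace `Array#sort` at run time, and `clamp` has to observe the change. The replacement below is deliberately wrong (it returns an unsorted copy) and is purely hypothetical; it is only there to make the effect visible, and it is undone afterwards.

```ruby
def clamp(value, min, max)
  [value, min, max].sort[1]
end

before = clamp(5, 0, 3)

# Replace Array#sort with a version that does not sort at all.
# Hypothetical patch, for illustration only.
Array.class_eval do
  alias_method :original_sort, :sort
  def sort
    dup
  end
end

begin
  # With the patch active, clamp simply indexes [5, 0, 3].
  during = clamp(5, 0, 3)
ensure
  # Restore the built-in sort so nothing else is affected.
  Array.class_eval do
    alias_method :sort, :original_sort
    remove_method :original_sort
  end
end

after = clamp(5, 0, 3)
puts "before=#{before} during=#{during} after=#{after}"
```

An implementation that had compiled `clamp` down to a pair of comparisons would have to throw that compiled code away the moment `sort` was redefined, otherwise `during` would come out wrong.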

igouy (OP) · 10y ago

>>… that we have and JRuby and Rubinius do not.<<

Do JRuby and Rubinius even have performance tests that cover those aspects of Ruby?

(I don't track Ruby implementations, so I don't know the answer.)

chrisseaton · 10y ago

No. JRuby and Rubinius both have benchmark suites, but I believe they don't go as far as kernels from real gems, and neither of them tracks benchmarks in any kind of continuous integration system, which is why I developed Bench 9000 as part of my PhD.

But if they were to benchmark and see that things like that pack method were slow, I think it is unlikely they would be able to implement the algorithms needed to improve on this kind of code, given their current implementation techniques.

Rubinius is essentially a template compiler, emitting a chunk of LLVM for each bytecode. There isn't any sophisticated optimisation before it goes into LLVM, so there is nothing to, for example, partially evaluate a sort routine or remove allocations. The LLVM that comes out is far too complex for LLVM's own optimisations to do much for them.

JRuby relies on the JVM to do the sophisticated optimisations, and C2 (the server compiler) just doesn't have the optimisations or inlining scope needed to simplify code like the pack example. JRuby are massively improving on this with their IR, but they are going to have to reimplement some very complex optimisations themselves to make this work on methods like pack.
