I think the difference is that the synthetic benchmarks are generally written in a way that is as tight as possible, avoids allocation and abstraction, and they certainly don't use metaprogramming. That stuff is easier for everyone to optimise.
Real Ruby code uses a lot of abstraction, allocates objects constantly, and uses metaprogramming. Optimising these aspects of Ruby is much more complex, and doing it well requires optimisations such as partial escape analysis and powerful allocation removal, which we have and JRuby and Rubinius do not.
My favourite example is this code from PSD.rb that implements a clamp routine. It does it by creating an array, sorting it and taking the middle value. You wouldn't normally find code like this in a synthetic benchmark, but you would in real code.
def clamp(value, min, max)
  [value, min, max].sort[1]
end
In JRuby and Rubinius that code really will allocate an array, sort it using some library routine, and then index it. In JRuby+Truffle we compile that method to effectively:
def clamp(value, min, max)
  (value > max) ? max : ((value < min) ? min : value)
end
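If you want to convince yourself that the two forms agree, something like the sketch below works. The names clamp_sort, clamp_branch and clamp_mismatches are made up for illustration, and it only checks small integers with min <= max, which is the case a clamp is meant for.

```ruby
# Sort-based version, same body as the PSD.rb routine quoted above.
def clamp_sort(value, min, max)
  [value, min, max].sort[1]
end

# Branch-only version, same logic as the "effectively compiled" form.
def clamp_branch(value, min, max)
  (value > max) ? max : ((value < min) ? min : value)
end

# Hypothetical helper: returns every [value, min, max] triple drawn from
# `numbers` (with min <= max) where the two versions disagree.
def clamp_mismatches(numbers)
  numbers.product(numbers, numbers).select do |value, min, max|
    min <= max && clamp_sort(value, min, max) != clamp_branch(value, min, max)
  end
end

mismatches = clamp_mismatches((-3..3).to_a)
puts "mismatches: #{mismatches.length}"
```

This only shows the two are interchangeable for sensible inputs. It says nothing about speed, which depends on the implementation you run it on.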
There's a massive difference between those two. One allocates objects on the heap, passes them into the runtime, runs a general-purpose sort routine and so on, thousands of machine instructions in total, and the other is just a couple of assembly instructions.
When you run this code as a benchmark, we're over 300x faster than Rubinius' LLVM-based JIT.
Of course, we still behave correctly if someone has redefined Array#sort or something like that, and you could still find that Array instance using ObjectSpace if you wanted to. We handle those cases using deoptimisation.
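To see why that fallback matters, here is a rough sketch of the sort of redefinition an implementation has to stay correct under. The helper with_broken_array_sort is hypothetical, written just for this illustration. It temporarily replaces Array#sort with a version that returns an unsorted copy, so clamp has to stop behaving like the simple comparison form while the patch is in place.

```ruby
def clamp(value, min, max)
  [value, min, max].sort[1]
end

# Hypothetical helper: runs the block with Array#sort replaced by a version
# that returns an unsorted copy, then restores the original method.
def with_broken_array_sort
  Array.class_eval do
    alias_method :__original_sort, :sort
    define_method(:sort) { dup }
  end
  yield
ensure
  Array.class_eval do
    alias_method :sort, :__original_sort
    remove_method :__original_sort
  end
end

before = clamp(10, 0, 5)                              # normal sort
during = with_broken_array_sort { clamp(10, 0, 5) }   # element at index 1 of the unsorted array
after  = clamp(10, 0, 5)                              # original sort restored

puts [before, during, after].inspect
```

Any Ruby implementation has to give the changed answer while the patch is active, so an optimised clamp can only be used as long as Array#sort is still the built-in one.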