So you're comparing running times measured on a shared host? That is generally not considered best practice if you want meaningful numbers.
1: http://www.techempower.com/benchmarks/#section=data-r9&hw=ec...
(For reference, I have a team that does nothing but maintain a performance benchmarking harness and test lab, so I know these things pretty cold :P)
I also don't see that the benchmarks were even run multiple times. I tried to follow the ruby-benchmark-suite all the way down the rabbit hole, but I don't see anywhere that it runs benchmarks multiple times, let alone reports the variability between runs.
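For illustration, here is a minimal Python sketch of what "run it multiple times and report the variability" could look like. It is not how ruby-benchmark-suite works; the names `measure`, `summarize`, and `workload` are hypothetical, and the workload is just a stand-in for a real benchmark body.

```python
import statistics
import time


def measure(func, repeats=10):
    """Call func `repeats` times and return the wall-clock samples in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return the basic statistics needed to judge run-to-run variability."""
    mean = statistics.mean(samples)
    # The sample standard deviation needs at least two data points.
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return {
        "n": len(samples),
        "min": min(samples),
        "mean": mean,
        "stdev": stdev,
        # Coefficient of variation: spread relative to the mean.
        "cv": stdev / mean if mean else 0.0,
    }


def workload():
    # Hypothetical stand-in for a real benchmark body.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    stats = summarize(measure(workload, repeats=10))
    print(f"n={stats['n']} min={stats['min']:.6f}s "
          f"mean={stats['mean']:.6f}s stdev={stats['stdev']:.6f}s "
          f"cv={stats['cv']:.1%}")
```

On a noisy shared host you would expect the coefficient of variation to be large, and two results whose difference is smaller than that spread can't really be told apart.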
If that isn't done, then, well, it's one louder.
If you have N files, then that's only 3^N builds to test every possible combination of -O, -O2, and -O3. That shouldn't take too long. /s
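To put a number on the sarcasm, here is a small Python sketch that enumerates per-file flag assignments. The file names are made up, and `flag_assignments` and `build_count` are hypothetical helpers, not part of any build tool.

```python
from itertools import product

# The three optimization levels mentioned above.
FLAGS = ("-O", "-O2", "-O3")


def flag_assignments(files):
    """Yield one dict per build, mapping each source file to a flag."""
    for combo in product(FLAGS, repeat=len(files)):
        yield dict(zip(files, combo))


def build_count(n_files):
    """Number of distinct builds when every file independently gets one flag."""
    return len(FLAGS) ** n_files


if __name__ == "__main__":
    files = ["a.c", "b.c"]  # hypothetical source files
    for assignment in flag_assignments(files):
        print(assignment)
    print(build_count(20))  # 3**20 builds for a mere 20 files
```

Two files already mean 9 builds, and 20 files mean 3,486,784,401, so exhaustive search stops being an option almost immediately.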
The core FBSD 10 system is compiled with clang. Since Ruby uses system libraries, the question is whether the clang vs. gcc comparison for the Ruby runtime would come out differently on a clang-compiled system than on a gcc-compiled one.
Hard to know how it would matter, but it does seem conceivable that it might.
If you want to use as little code produced by a different compiler as possible, you can just statically link the libs you need, compiling them with whatever compiler you want (of course, it's not impossible that you'll run into obscure compiler-specific errors that way, but it's doable if you want it that much).
Better methods are readily available, for example Schulze's (https://en.wikipedia.org/wiki/Schulze_method). I wonder how much these rankings would change...
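For anyone who wants to try it, here is a rough Python sketch of the Schulze method as described on that Wikipedia page. It assumes every ballot ranks every candidate with no ties; the function name `schulze_ranking` and the example ballots are made up for illustration and have nothing to do with the actual benchmark data.

```python
def schulze_ranking(ballots):
    """Rank candidates by the Schulze method; each ballot lists all candidates, best first."""
    candidates = sorted(ballots[0])

    # d[a][b]: number of ballots that rank a above b.
    d = {a: {b: 0 for b in candidates} for a in candidates}
    for ballot in ballots:
        for i, a in enumerate(ballot):
            for b in ballot[i + 1:]:
                d[a][b] += 1

    # p[a][b]: strength of the strongest path from a to b,
    # seeded with the direct pairwise wins.
    p = {a: {b: (d[a][b] if d[a][b] > d[b][a] else 0) for b in candidates}
         for a in candidates}
    # Floyd-Warshall-style widest-path update.
    for k in candidates:
        for a in candidates:
            if a == k:
                continue
            for b in candidates:
                if b == k or b == a:
                    continue
                p[a][b] = max(p[a][b], min(p[a][k], p[k][b]))

    # a beats b if its strongest path to b is stronger than the reverse path.
    wins = {a: sum(p[a][b] > p[b][a] for b in candidates if b != a)
            for a in candidates}
    # Candidates with equal win counts stay in alphabetical order.
    return sorted(candidates, key=lambda c: -wins[c])


if __name__ == "__main__":
    # Hypothetical ballots: three voters say A > B > C, two say B > A > C.
    ballots = [["A", "B", "C"]] * 3 + [["B", "A", "C"]] * 2
    print(schulze_ranking(ballots))
```

Because it only looks at pairwise preferences, the method needs just an ordering from each "voter" (here, each benchmark), not comparable scores.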
T: C D E F
U: D E F C
Looking at all four, D is better than C. If you hadn't looked at E and F, the two would tie.
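The effect is easy to reproduce with a positional count. The sketch below assumes a Borda-style scoring (last place gets 0 points, each place above it one more), which is my assumption about the kind of method being criticized and is not stated in the thread; `borda_scores` and `restrict` are hypothetical helper names.

```python
def borda_scores(ballots):
    """Borda count: on a ballot of n candidates, position i (0-based) earns n - 1 - i points."""
    candidates = ballots[0]
    n = len(candidates)
    scores = {c: 0 for c in candidates}
    for ballot in ballots:
        for position, candidate in enumerate(ballot):
            scores[candidate] += n - 1 - position
    return scores


def restrict(ballots, keep):
    """Drop every candidate not in `keep`, preserving each ballot's order."""
    return [[c for c in ballot if c in keep] for ballot in ballots]


if __name__ == "__main__":
    T = ["C", "D", "E", "F"]
    U = ["D", "E", "F", "C"]

    # All four candidates: D scores 5, C scores 3.
    print(borda_scores([T, U]))

    # Only C and D: each scores 1, so they tie.
    print(borda_scores(restrict([T, U], {"C", "D"})))
```

Neither T nor U changed its mind about C versus D between the two runs; only the presence of E and F changed, which is exactly the dependence on irrelevant alternatives being pointed out.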