Benchmarking Ruby with GCC and Clang

(p8952.info)

57 points · p8952 · 11y ago · 17 comments

lorenzhs · 11y ago

> All tests were run on AWS from an m3.medium EC2 instance

So you're comparing running times measured on a shared host? This is generally not considered best practice if you want meaningful numbers.

p8952 (OP) · 11y ago

m3.medium is shared in the sense that it is a virtualized instance on top of a hypervisor. However, unlike the t2 instance range, it provides fixed resources rather than burstable ones. Perhaps not best practice, but it's not an uncommon way of doing benchmarks. The TechEmpower Framework Benchmarks[1], for instance, are run on AWS.

1: http://www.techempower.com/benchmarks/#section=data-r9&hw=ec...

DannyBee · 11y ago

While not uncommon, the performance variation of most of these virtualized instances is so high that the differences in your numbers are probably within the margin of error.

(For reference, I have a team that does nothing but maintain a performance benchmarking harness and test lab, so I know these things pretty cold :P)

I also don't see that the benchmarks were even run multiple times. I tried to follow the ruby-benchmark-suite all the way down the rabbit hole, but I don't see anywhere that it runs benchmarks multiple times, let alone reports the variability, etc.
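To make the point about repeated runs concrete, here is a minimal Python sketch of timing a workload several times and reporting the spread along with the mean. It is not how ruby-benchmark-suite works; `time_runs`, `summarize`, and `workload` are hypothetical names, and `workload` is only a stand-in for a real benchmark.

```python
import statistics
import time


def time_runs(fn, runs=10):
    """Call fn `runs` times and return each call's wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return mean, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations)  # sample stdev, needs at least 2 runs
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean}


def workload():
    # Hypothetical stand-in for a single benchmark from the suite.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    stats = summarize(time_runs(workload, runs=10))
    print(f"mean={stats['mean']:.6f}s stdev={stats['stdev']:.6f}s cv={stats['cv']:.1%}")
```

If the gap between two compilers is smaller than the run-to-run spread reported this way, the ordering between them is probably not meaningful.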

krisdol · 11y ago

Unsurprised at the superior performance of GCC, but I am surprised that Ruby ships with -O3. Why would they choose that optimization level?

mhd · 11y ago

Not an automatic build system that tries different configurations against standard benchmarks for each release?

If that isn't done, then, well, it's one louder.

desdiv · 11y ago

Even better, use different optimization levels per file and benchmark them.

If you have N files, then that's only 3^N builds to test out each possible combination of O, O2, and O3. That shouldn't take too long. /s
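For a sense of scale, the 3^N figure can be checked with a few lines of Python. This is only an illustration of the arithmetic; the file names are hypothetical and nothing here invokes a compiler.

```python
from itertools import product

# The three levels named in the comment above.
LEVELS = ("-O", "-O2", "-O3")


def build_count(n_files):
    """Number of distinct builds when each file independently gets one level."""
    return len(LEVELS) ** n_files


def configurations(files):
    """Yield every file -> optimization level assignment."""
    for combo in product(LEVELS, repeat=len(files)):
        yield dict(zip(files, combo))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(len(list(configurations(["a.c", "b.c"]))))  # 9
    print(build_count(20))  # 3486784401
```

At 20 files the count is already about 3.5 billion builds, which is the joke.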


yxhuvud · 11y ago

It would have been nice to see if the changes actually were noticeable or not. Yes, one version may be faster than another, but simply ordering a set of tests like that doesn't show if it is worth the trouble doing anything about it.

p8952 (OP) · 11y ago

I have linked the data next to the graph; it's called "Raw Data". I'm unsure how best to represent it to show real differences, though. Putting the raw scores for each test on their own graph would be the most accurate way, but not very easy to read.

jrapdx3 · 11y ago

One thing I'm wondering about is whether running the tests on FreeBSD 10.1 would make a difference.

The core FBSD 10 system is compiled with clang. Since Ruby uses system libraries, the question is if a clang vs. gcc compiled Ruby runtime would produce different results in a clang-compiled environment vs gcc-compiled system.

Hard to know how it would matter, but it does seem conceivable that it might.

krick · 11y ago

Benchmarking on another OS would be benchmarking much more than compilers. A different OS is a different OS: it manages memory differently and schedules processes differently. No way it would be a more "clean" benchmark.

If you want to use as little code produced by a different compiler as possible, you can just statically link the libs you need, compiling them with whatever compiler you want (of course it's not impossible that you'll run into obscure compiler-specific errors with that, but whatever, it's doable if you want it that much).

wbhart · 11y ago

What compiler optimisation levels were used for Clang?

p8952 (OP) · 11y ago

Sorry, I've updated that now. O2 was used for all Clang variants.

arthursilva · 11y ago

Similar tests (and results) for Postgres http://blog.pgaddict.com/posts/compiler-optimization-vs-post...

Sjlver · 11y ago

This ranking uses a method called Borda Count (https://en.wikipedia.org/wiki/Borda_count). It can lead to quite arbitrary results, for a number of reasons. One example is that being 9th is three times better than being 11th, whereas being first is only marginally better (relatively speaking) than being third.

Better methods are readily available, for example Schulze's (https://en.wikipedia.org/wiki/Schulze_method). I wonder how much these rankings would change...
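The "three times better" observation can be reproduced with a small Python sketch of Borda scoring. The assumptions here are mine, not the article's: 12 ranked entries, and `n - rank` points per placing so that last place scores 0. The article's exact scoring may differ, and `borda_points` and `borda_totals` are hypothetical helper names.

```python
def borda_points(rank, n):
    """Points for finishing at 1-based `rank` among `n` entries (last place gets 0)."""
    return n - rank


def borda_totals(rankings):
    """Sum Borda points per candidate over several best-to-worst rankings."""
    totals = {}
    for ranking in rankings:
        n = len(ranking)
        for position, name in enumerate(ranking, start=1):
            totals[name] = totals.get(name, 0) + borda_points(position, n)
    return totals


if __name__ == "__main__":
    n = 12  # assumed number of ranked compiler configurations
    print(borda_points(9, n), borda_points(11, n))  # 3 1
    print(borda_points(1, n), borda_points(3, n))   # 11 9
```

Under these assumptions 9th place earns 3 points against 1 point for 11th, while 1st earns 11 against 9 for 3rd, so the same two-place gap counts as a 3x ratio near the bottom and roughly 1.2x at the top.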

masklinn · 11y ago

Would be interesting to see if Os improves performance over O2, especially for 4.9

Someone · 11y ago

Doing the comparison as a ranking is bad, as the result can change if you add or remove compilers. For example, with two tests T and U and four compilers C, D, E and F:

  T: C D E F
  U: D E F C

Looking at all four, D is better than C. If you hadn't looked at E and F, the two would tie.
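The example above can be checked mechanically. This Python sketch assumes a Borda-style score of `n - rank` points per test (an assumption about the scoring, not something taken from the article); `borda_totals` and `restrict` are hypothetical helper names.

```python
def borda_totals(rankings):
    """Sum points per compiler; 1-based rank r among n entries scores n - r."""
    totals = {}
    for ranking in rankings:
        n = len(ranking)
        for position, name in enumerate(ranking, start=1):
            totals[name] = totals.get(name, 0) + (n - position)
    return totals


def restrict(rankings, keep):
    """Drop every compiler not in `keep`, preserving the relative order."""
    return [[name for name in ranking if name in keep] for ranking in rankings]


# Tests T and U from the comment above, best to worst.
RANKINGS = [list("CDEF"), list("DEFC")]

if __name__ == "__main__":
    print(borda_totals(RANKINGS))                        # {'C': 3, 'D': 5, 'E': 3, 'F': 1}
    print(borda_totals(restrict(RANKINGS, {"C", "D"})))  # {'C': 1, 'D': 1}
```

With all four compilers, D scores 5 against 3 for C. With E and F removed, C and D tie at 1 point each, even though neither test's ordering of C and D has changed.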
