EDIT: another benefit of a criterion-like approach is that you wouldn't require nightly
We chose a test harness because one of our goals was to make it as easy as possible to run on existing Rust projects. A lot of projects define tests, but benchmarks are often not present. But maybe a bench harness would be a better and/or cleaner approach, will look into it!
I’ve always assumed this to be true, but I see a lot of benchmarking tools/libraries measuring wall-clock time or iterations per second, and I’ve never seen a benchmark tool which counts CPU instructions. Am I being blind, or is there some other reason that I’m not seeing them? :S
Instruction counting is more of a specialized tool, but I like to use it whenever I can because it has low variance and makes comparing changes a lot easier. Compare how bumpy these graphs are for instruction count (first link) and wall-clock time (second link):
https://perf.rust-lang.org/?start=&end=&kind=raw&stat=wall-t...
You also can't really count instructions in the cloud.
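For the curious: on Linux, instruction counting goes through the `perf_event_open` syscall, which is what `perf stat -e instructions` uses under the hood. Below is a minimal stdlib-only sketch, not a production counter: the syscall number assumes x86-64, the struct layout follows `linux/perf_event.h` (`PERF_ATTR_SIZE_VER0`), and it degrades gracefully where perf is unavailable (containers, restrictive `perf_event_paranoid`, cloud VMs — which is the point above).

```python
# Hedged sketch: count user-space CPU instructions of the current thread
# on Linux via the perf_event_open syscall, using only the standard library.
import ctypes
import os
import struct

PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_INSTRUCTIONS = 1
NR_PERF_EVENT_OPEN = 298   # syscall number on x86-64 only (assumption)
ATTR_SIZE = 64             # PERF_ATTR_SIZE_VER0
EXCLUDE_KERNEL = 1 << 5    # bit positions in the perf_event_attr flag bitfield
EXCLUDE_HV = 1 << 6

def open_instruction_counter():
    """Return an fd counting this thread's user-space instructions, or None."""
    flags = EXCLUDE_KERNEL | EXCLUDE_HV  # disabled bit unset: counts from open
    attr = struct.pack(
        "IIQQQQQ",  # type, size, config, sample_period, sample_type,
                    # read_format, flag bitfield
        PERF_TYPE_HARDWARE, ATTR_SIZE, PERF_COUNT_HW_INSTRUCTIONS,
        0, 0, 0, flags,
    ).ljust(ATTR_SIZE, b"\0")
    buf = ctypes.create_string_buffer(attr, ATTR_SIZE)
    libc = ctypes.CDLL(None, use_errno=True)
    # perf_event_open(attr, pid=0 (this thread), cpu=-1, group_fd=-1, flags=0)
    fd = libc.syscall(NR_PERF_EVENT_OPEN, buf, 0, -1, -1, 0)
    return fd if fd >= 0 else None

fd = open_instruction_counter()
count = None
if fd is None:
    print("perf_event_open unavailable (permissions, kernel, or non-x86-64)")
else:
    sum(range(100_000))  # some work whose instructions we count
    count = struct.unpack("q", os.read(fd, 8))[0]  # default read: one u64
    os.close(fd)
    print(f"user-space instructions: {count}")
```

Note the counter reads a whole-thread total (including interpreter overhead here), which is why dedicated tools like cachegrind or `perf stat` wrap the program rather than instrumenting from inside it.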
One robust solution is to instead do pairwise comparisons, many times, in round-robin fashion. The results aren't quite as nice to plot, since you don't get a single consistent speed value, but they are much more reliable, and you still get useful information, like ">95% chance that this test is at least 20% faster at this commit than at the previous one".
A project I contribute to uses this strategy: https://github.com/Polymer/tachometer. I'd love it if more benchmark tools took this approach.
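As a sketch of how a statement like ">95% chance of at least 20% faster" can be derived from interleaved samples: bootstrap the two sample sets and count how often the resampled means clear the threshold. The function name, sample values, and thresholds here are illustrative assumptions, not tachometer's actual implementation:

```python
# Hedged sketch: bootstrap estimate of P(new is at least `speedup`x faster),
# given round-robin (interleaved) timing samples of old and new versions.
import random
import statistics

def prob_at_least_faster(old, new, speedup=1.20, n_boot=2000, seed=0):
    """Estimate P(mean(old) / mean(new) >= speedup) by bootstrap resampling."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    hits = 0
    for _ in range(n_boot):
        o = statistics.fmean(rng.choices(old, k=len(old)))
        n = statistics.fmean(rng.choices(new, k=len(new)))
        hits += o / n >= speedup
    return hits / n_boot

# Interleaved timing samples in ms, purely illustrative:
old = [100.2, 101.1, 99.4, 102.0, 98.7, 100.5]
new = [80.1, 79.2, 81.0, 80.4, 82.1, 78.8]
print(f"P(new is >=20% faster) ~ {prob_at_least_faster(old, new):.2f}")
```

Because the comparison is pairwise and relative, a machine-wide slowdown (thermal throttling, noisy neighbors) affects both sides roughly equally, which is where the robustness comes from.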
Any ideas on how to measure the energy consumption of programs on GNU/Linux? I know of `powertop`, but it measures total power consumption (its per-program table is inaccurate).
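One option on Intel/AMD CPUs is the RAPL energy counters exposed through the powercap sysfs interface (`/sys/class/powercap/intel-rapl:*`). They measure whole-package energy, not per-program, so the usual trick is to run the program on an otherwise idle machine and take the before/after difference. A hedged sketch, where the zone path is an assumption (it varies per machine, and newer kernels may require root to read `energy_uj`); the counter wraps at `max_energy_range_uj`, which the delta helper accounts for:

```python
# Hedged sketch: package energy measurement via the Linux powercap (RAPL)
# sysfs interface. Zone path is an assumption; adapt to your machine.
import time

RAPL = "/sys/class/powercap/intel-rapl:0"  # package-0 zone (assumed path)

def read_energy_uj(zone=RAPL):
    # Cumulative energy in microjoules since some unspecified point in time.
    with open(zone + "/energy_uj") as f:
        return int(f.read())

def energy_delta_uj(before, after, max_range_uj):
    # The energy_uj counter wraps around at max_energy_range_uj.
    if after >= before:
        return after - before
    return max_range_uj - before + after

def measure_energy(fn, zone=RAPL):
    """Approximate package energy (J) and average power (W) while fn runs."""
    with open(zone + "/max_energy_range_uj") as f:
        max_range = int(f.read())
    e0, t0 = read_energy_uj(zone), time.monotonic()
    fn()
    e1, t1 = read_energy_uj(zone), time.monotonic()
    joules = energy_delta_uj(e0, e1, max_range) / 1e6
    return joules, joules / (t1 - t0)
```

Subtracting a separately measured idle baseline over the same duration gets you closer to the program's own contribution, though attribution is still approximate.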