You mean PL design, don't you? Performance work is very empirical. If my new type of inline cache is better, then I need to prove that, and the claim is falsifiable (using benchmarks, which I admit aren't ideal).
> Of the 133 papers published in the surveyed conference proceedings, 88 had at least one section dedicated to experimental methodology and evaluation
Still, the benchmarks are usually toys, and it's pretty easy to find contradictory benchmarking reports. Very few projects do online benchmarking comparisons of different design choices on real workloads.