Most of the simple benches I do are for ~1 second. The order-dependent things definitely were reproducible (something along the lines of rerunning resulting in some rare virtual method case finally being invoked enough times/with enough cases to heavily penalize the vastly more frequent case). And the case of very different results C2 deciding to compile the code differently (looking at the assembly was problematic as adding printassembly whatever skewed the case it took), and stayed stable for tens of seconds after the first ~second iirc (though, granted, it was preview jdk.incubator.vector code).