If that's true, then a test where a single process spins in and out of one syscall will have very different performance characteristics from a test where you have more processes than processor cores, because context switches between processes flush the TLB.
Somebody who knows actual things about x86 and so forth, please tell me if I'm spouting 90s-era comp sci architecture textbook stuff that no longer applies.
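
For anyone who wants to poke at this, here is a rough sketch of the two test shapes I mean. It is not a rigorous benchmark. The function names are made up for illustration, and it assumes `os.getppid()` actually enters the kernel on every call on your platform. It measures wall-clock time, so the oversubscribed number also includes time spent waiting for a CPU and a lot of interpreter overhead, not just TLB effects.

```python
import os
import time
from multiprocessing import Pool


def time_syscalls(n_calls):
    """Return average wall-clock seconds per os.getppid() call."""
    start = time.perf_counter()
    for _ in range(n_calls):
        os.getppid()  # assumed to trap into the kernel on each iteration
    return (time.perf_counter() - start) / n_calls


def run_oversubscribed(n_calls, procs_per_core=2):
    """Run the same loop in more worker processes than there are cores."""
    n_procs = (os.cpu_count() or 1) * procs_per_core
    with Pool(n_procs) as pool:
        # One per-call average per worker process.
        return pool.map(time_syscalls, [n_calls] * n_procs)


if __name__ == "__main__":
    n = 200_000
    alone = time_syscalls(n)
    crowded = run_oversubscribed(n)
    mean_crowded = sum(crowded) / len(crowded)
    print(f"single process: {alone * 1e9:.0f} ns/call")
    print(f"oversubscribed: {mean_crowded * 1e9:.0f} ns/call "
          f"(mean over {len(crowded)} processes)")
```

If the two numbers come out close, that by itself doesn't settle the TLB question either way. A C version with a raw syscall and hardware performance counters would be the more convincing test.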