I will simply say that Google is one of the few places that, to a first approximation, has this. We have our (pretty good) suite of tests. We can run only affected tests. We can them with instrumentation (*SAN), we can run them under various memory and CPU limits, we can run them 1000 times in parallel to deflake.
Anyway Google has all of that, and yet still finds this improvement.