The database wars of the late 1990s were full of this kind of stuff. Oracle, Sybase, IBM etc invested heavily in tuning specifically for benchmarks like TPC-C just so they could post ads in the Wall St Journal saying theirs was faster.
I do sympathize with OP, though, their objection to measuring cold-start queries is incomplete without also describing how often cold start needs to happen. If you restart once every five years then it doesnt matter as much if it takes 20 minutes to be warm. Every hour, that would be a real problem.