To guarantee that, testing has to go through _all_ possible game states. For almost any non-trivial game, that's infeasible.
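As a rough back-of-the-envelope sketch of why that's infeasible, the snippet below counts only the distinct input sequences a tester would have to try, ignoring everything else that makes up game state. The numbers (8 buttons, polled 60 times per second) are assumptions picked for illustration, not taken from any particular game.

```python
import math

# Hypothetical figures: 8 on/off buttons sampled once per frame.
BUTTONS = 8


def input_sequences(frames: int) -> int:
    """Exact number of distinct button-state sequences over `frames` frames."""
    per_frame = 2 ** BUTTONS  # every button is either up or down
    return per_frame ** frames


def decimal_digits(frames: int) -> int:
    """Number of decimal digits in input_sequences(frames), via logarithms."""
    return int(frames * BUTTONS * math.log10(2)) + 1


if __name__ == "__main__":
    fps = 60  # assumed polling rate
    print("digits after 1 second:", decimal_digits(fps))
    print("digits after 1 minute:", decimal_digits(fps * 60))
```

Even one second of play gives a count with well over a hundred decimal digits, so exhaustive testing is out of reach long before real game state enters the picture.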
"and then restricting the game to only be playable on that iPhone or newer"
There is no guarantee that newer hardware is faster for every possible program execution, and even if it were, the timing differences themselves could affect gameplay.
There is also no guarantee that newer hardware produces exactly the same results. For example, better anti-aliasing or fonts drawn at double resolution could affect hit detection.
This isn't even guaranteed on the 'same' hardware. For example, there might be C64s that don't have the bugs that this demo exploits.
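To make the hit-detection point concrete, here is a toy sketch, assuming a game that does pixel-based hit testing on whatever the renderer produced. The functions `render_row` and `hits` are made up for illustration and don't correspond to any real engine or API.

```python
def render_row(width: int, edge: float, antialias: bool) -> list[float]:
    """Return per-pixel alpha for a sprite covering [0, edge) on a 1-D row."""
    row = []
    for x in range(width):
        if antialias:
            # Alpha is the fraction of pixel [x, x + 1) covered by the sprite.
            alpha = min(max(edge - x, 0.0), 1.0)
        else:
            # Pixel is fully on only if its center lies inside the sprite.
            alpha = 1.0 if x + 0.5 < edge else 0.0
        row.append(alpha)
    return row


def hits(row: list[float], x: int) -> bool:
    """Pixel-based hit test: any non-transparent pixel counts as a hit."""
    return row[x] > 0.0


plain = render_row(4, 2.4, antialias=False)
smooth = render_row(4, 2.4, antialias=True)

if __name__ == "__main__":
    print("hit at pixel 2 without anti-aliasing:", hits(plain, 2))
    print("hit at pixel 2 with anti-aliasing:   ", hits(smooth, 2))
```

The sprite geometry is identical in both cases, but the partially covered edge pixel only exists in the anti-aliased rendering, so the same shot is a miss on one renderer and a hit on the other.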