The program is randomly generated, and I'm guessing the seed is derived deterministically from the current block head (or something similar), which would make it hard to attack.
That might lead to scenarios where a miner tries to optimise block generation itself, I guess?
I was more curious about the possibility of generating optimised branchless variants of the program and running them in parallel on multiple ASICs, so that every branch is covered, then submitting all the results and hoping you're fast enough. Would that be less efficient than relying on a CPU's branch prediction?
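
To put a rough number on the "cover every branch" idea, here is a back-of-the-envelope sketch. It assumes each conditional branch is data-dependent and independent of the others, so the figure is an upper bound on how many branchless variants you would need. The function names are made up for illustration and have nothing to do with any real miner.

```python
from itertools import product

def variant_count(num_branches):
    """Upper bound on straight-line variants: one per taken/not-taken combination."""
    return 2 ** num_branches

def branch_assignments(num_branches):
    """Enumerate every taken/not-taken assignment (only practical for small counts)."""
    return product((False, True), repeat=num_branches)

if __name__ == "__main__":
    # Hypothetical branch counts, not taken from any actual generated program.
    for n in (4, 16, 32):
        print(f"{n} branches -> up to {variant_count(n)} variants")
```

If the branches sit inside loops, what matters is the number of branches executed at run time rather than the number written in the program, so the count grows even faster. That is the cost the parallel-ASIC approach would be trading against a CPU simply predicting (and occasionally mispredicting) each branch.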