I don't think jitting necessarily precludes counting runtime instructions. You could always jit in an internal variable that gets incremented for each high level instruction being translated. And of course you can optimize the increments within each basic block. There's even a cool minimum spanning tree algorithm due to Larus and Ball originally for path profiling that might be adaptable to reduce increments across the blocks.
I know this is a bit of an aside. The point still stands about the user not wanting their bpf program to terminate at runtime investment.