> I am pretty sure that all of the benchmarks showed 64-bit ARM losing to 32-bit, though, by a few percent.
I think that's true on an RPi with its super gate-constrained, in-order core, but AArch64 was really designed to make OoOE cores with complex prediction a lot easier to build.