https://debugger.medium.com/why-is-apples-m1-chip-so-fast-32...
...a big part of the reason the M1 is so fast is its large reorder buffer, which it can keep full because it decodes many instructions in parallel. That wide decoding is practical because ARM instructions are all the same size (4 bytes on AArch64), so parallel instruction decoding is far easier. x86 instructions are variable length (1 to 15 bytes), so the processor has to do some work just to find out where the next instruction starts. I can see how that work would be difficult to do in parallel, especially compared to an architecture with a fixed instruction size.
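A toy sketch of the boundary-finding problem in Python. It assumes a made-up variable-length encoding (low 2 bits of an instruction's first byte give its length minus one). This is not real x86 or ARM decoding, just an illustration of the dependency. With fixed-width instructions every start offset is known up front. With variable-length ones, where instruction k+1 begins isn't known until instruction k's length has been worked out. `speculative_starts` shows one general workaround: compute a candidate length at every byte offset (independent work, so it could be done in parallel) and then follow the chain.

```python
FIXED_WIDTH = 4  # AArch64 instructions are always 4 bytes


def fixed_width_starts(code: bytes) -> list[int]:
    # Each start depends only on its index, so all of them could be
    # handed to separate decoders at once.
    return [i * FIXED_WIDTH for i in range(len(code) // FIXED_WIDTH)]


def toy_length(first_byte: int) -> int:
    # HYPOTHETICAL encoding for illustration: low 2 bits = length - 1,
    # so instructions are 1 to 4 bytes long.
    return (first_byte & 0b11) + 1


def variable_width_starts(code: bytes) -> list[int]:
    # Serial dependency chain: the next start is only known after
    # decoding the current instruction's length.
    starts = []
    pos = 0
    while pos < len(code):
        starts.append(pos)
        pos += toy_length(code[pos])
    return starts


def speculative_starts(code: bytes) -> list[int]:
    # Step 1 (independent per byte, parallelizable): pretend every byte
    # might begin an instruction and compute its length.
    lengths = [toy_length(b) for b in code]
    # Step 2 (still serial, but just table lookups): follow the chain
    # from offset 0; most of the step-1 work gets thrown away.
    starts = []
    pos = 0
    while pos < len(code):
        starts.append(pos)
        pos += lengths[pos]
    return starts


if __name__ == "__main__":
    print(fixed_width_starts(bytes(12)))
    code = bytes([0x03, 0, 0, 0, 0x00, 0x01, 0, 0x02, 0, 0])
    print(variable_width_starts(code))
    print(speculative_starts(code))
```

The speculative version trades wasted work (lengths computed at offsets that turn out not to be instruction starts) for parallelism. With a fixed instruction size that extra work isn't needed at all.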