A lot of it is that it's hard to design a modern application CPU. Unlike the others, Apple was able to put it all together and had the advantage of tuning for their own software (e.g. the NSObject optimization).
Graviton2 uses an ARM-designed core and actually cuts the cache sizes a bit from ARM's recommendation. It's certainly competitive, if not an absolute 'wow!' like M1 is. (Apple could have shipped at that level 2-3 years ago.)