There are definitely gains from the on-package memory, but that's only the tip of the iceberg. The tight integration of hardware and software is also an advantage -- look at how Apple Silicon is optimized for faster retain/release of NSObjects, with less memory overhead. Every piece of software on the system does this constantly, and it reportedly runs several times faster than on x64 because Apple can customize the hardware to fit their model of computing. Earlier discussion of this:
https://news.ycombinator.com/item?id=25203924
The only other designer of ARM chips at this scale is Qualcomm, and they stick to very general-purpose designs (chips that conform to known designs and can be decent for all of their customers). The exception is some minor one-off tweaks for Microsoft's ARM laptop. Intel and AMD are in the same boat -- they can do new and innovative things in hardware, but it doesn't really mean anything unless software is optimized for it. And if software is never optimized for it, was it worth the engineering investment?
Of course, Apple is also offloading a lot of work from the CPU cores into specialized on-die units for machine learning, video codecs, etc., along with very decent GPUs. No, it's not all strictly CPU stuff, but it does all matter in the end.
Not all of what Apple is doing is everyone's cup of tea, and no, it's not "the best performance in the world", but it's hard to find better performance in the same power envelope.