AR doesn't have those same requirements, because you're not going to get nauseous looking at a buggy label flickering in the distance. You don't have to spend compute cycles tricking your brain into believing that what you're seeing is real, because it actually is real (minus the overlaid 3D models, of course).
Now, sure, you will have to spend compute cycles on depth sensing and mapping your immediate area. But that's something the OS should do once, not every app simultaneously. If you think about what an AR operating system would be responsible for, mapping your surroundings and providing that data via an API is probably one of the first features it would have. It's no different from Windows or macOS communicating with your monitor so that your applications can draw on it. Similarly, every app likely won't be responsible for drawing its UI onto the user's vision - it will probably submit some graphic or model to the OS, which will then anchor that model "onto" reality and handle the user moving their head around it. It's much like how a Windows app is not responsible for resizing or moving its own window; that's the window manager's job. In AR we would probably have a reality manager or something.
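To make that split concrete, here's a rough sketch of what I mean. Everything in it is made up for illustration (`RealityManager`, `submit`, `view_positions`, the yaw-only head pose); it is not any real AR API. The app hands over a model and a world position once, and the OS side re-derives where that model sits relative to the user's head whenever the pose changes.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vec3:
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class HeadPose:
    """Hypothetical head pose: a position plus yaw (radians) about the vertical y axis.

    Assumed convention: with yaw = 0 the user looks down -z, and positive yaw turns left.
    """
    position: Vec3
    yaw: float


@dataclass
class RealityManager:
    """Hypothetical OS-side service that owns all anchors, like a window manager owns windows."""
    _anchors: dict = field(default_factory=dict)
    _next_id: int = 0

    def submit(self, model_name: str, world_position: Vec3) -> int:
        """Called by an app once: pin a model to a fixed point in the mapped world."""
        anchor_id = self._next_id
        self._next_id += 1
        self._anchors[anchor_id] = (model_name, world_position)
        return anchor_id

    def view_positions(self, pose: HeadPose) -> dict:
        """Called by the OS every frame: where is each anchor relative to the head?"""
        cos_y, sin_y = math.cos(pose.yaw), math.sin(pose.yaw)
        result = {}
        for anchor_id, (_, p) in self._anchors.items():
            # Translate into head-centred coordinates...
            dx = p.x - pose.position.x
            dy = p.y - pose.position.y
            dz = p.z - pose.position.z
            # ...then undo the head's yaw (rotate by -yaw about the y axis).
            result[anchor_id] = Vec3(
                cos_y * dx - sin_y * dz,
                dy,
                sin_y * dx + cos_y * dz,
            )
        return result


# The app's entire involvement: submit a label two metres in front of the start position.
manager = RealityManager()
label = manager.submit("price_label", Vec3(0.0, 0.0, -2.0))

# The OS handles the rest as the user walks forward one metre, then turns 90 degrees left.
walked = manager.view_positions(HeadPose(Vec3(0.0, 0.0, -1.0), 0.0))[label]
turned = manager.view_positions(HeadPose(Vec3(0.0, 0.0, 0.0), math.pi / 2))[label]
```

The app never sees the head pose at all, which is the point: the expensive tracking and mapping happens once in the OS, and each app just describes what it wants placed where.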
What I will agree with you on is that we probably need desktop-level rendering power to solve AR, not mobile-level. However, with the release of the M1, it does seem like we're pretty much there already, and I would not bet against Apple's chip team managing to make a smaller M1 that fits into AR glasses.