No, that's what you've been missing from the beginning: the breakthrough of transformers
was scalability. Now we have other models that
are equally scalable and, unsurprisingly, roughly equally performant.
But the ship has sailed: nobody is gonna switch to something other than transformers unless it's significantly better. So the alternative approaches stay behind, because every marginal improvement comes to transformers first (that's what practically everyone is working on) while the alternatives are left playing catch-up.
This is a remarkable example of path dependence.
Interpreting this as “transformers are fundamentally superior” is the mistake I'm trying to help you correct.
The breakthrough of transformers was scalability. The next breakthrough of equivalent importance will be something entirely different, or there won't be one.