The trouble I have thinking past the smartphone form factor is what people will actually be doing with some AR thing. If it's projecting something on some glass in front of my eyes, what am I doing? Scrolling social media on it? The current state of voice input is poor for most things compared to tapping a screen or keyboard. Social media is probably where most users spend their time these days, which isn't likely to change, and that also means they are posting things. Can you imagine trying to dictate posts with clever puns and witty wordings? Will everything be like TikTok videos then?
I don't know - I have a hard time imagining the transition from the current smartphone form factor and typical teenager use cases to anything with AR/VR/*R.