It's not a matter of "catching on". There are smaller models and they have been looking into putting useful models on device from day one.
The problem is simply that the models that can currently run on a mobile device are not effective enough. Running these models is about the most memory- and compute-intensive type of application there is. In particular, models that can reason or follow instructions reliably across general-purpose tasks are too big to run quickly on mobile devices.
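To put rough numbers on "too big to run quickly", here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the parameter counts, the bit widths, and the 50 GB/s memory bandwidth figure are hypothetical, and the speed estimate uses the common simplification that generating one token requires reading every weight from memory once. It ignores activations, the KV cache, and compute limits, so treat the results as ballpark figures, not measurements.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed just to hold the weights, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def max_tokens_per_second(weight_gb: float, bandwidth_gb_s: float) -> float:
    """Rough upper bound on generation speed, assuming every weight is
    read from memory once per generated token (a simplification)."""
    return bandwidth_gb_s / weight_gb


# Hypothetical model sizes and an assumed phone-class memory bandwidth.
ASSUMED_BANDWIDTH_GB_S = 50.0

for params in (3e9, 70e9):
    for bits in (16, 4):
        gb = weight_memory_gb(params, bits)
        tps = max_tokens_per_second(gb, ASSUMED_BANDWIDTH_GB_S)
        print(f"{params / 1e9:.0f}B params @ {bits}-bit: "
              f"{gb:.1f} GB of weights, at most ~{tps:.1f} tokens/s")
```

Under these assumptions, a 3B-parameter model quantized to 4 bits needs about 1.5 GB for its weights, which a phone can hold, while a 70B-parameter model needs about 35 GB even at 4 bits, which is well beyond the RAM of current phones.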
They are putting models on phones, but those models do not have general-purpose assistant capabilities.
This may change in the next few years though.