This is what I do not understand as well and advertising the knowledge and more advanced model is also the only thing that comes to my mind.
Since a month I am using gemma4 locally successfully on a MBP M2 for many search queries (wikipedia style questions) and it is really good, fast enough (30-40t/s) and feels nice as it keeps these queries private. But I don't understand why Google does this and so I think "we" need to find a better solution where the entire pipeline is open and the compute somehow crowdfunded. Because there will be a time when these local models will get more closed like Android is closing down. One restriction they might enforce in the future could be that they cripple the models down for "sensitive" topics like cybersecurity or health topics. Or the government could even feel the need to force them to do so.