The differentiating factor will be access to proprietary training data. Anyone can scrape the public web and train an LLM on it. The frontier companies are spending a fortune to buy exclusive licenses to private data sources, and are even hiring human experts specifically to create new training data on priority topics.