Any authentic pre-LLM data is assumed to have been used in training already, and synthetic or generated data produces worse-performing models, so increasing the amount of training data seems to be a dead end as well?
What is the next vector of training? Maybe data curation? Remove the low-quality entries and accept a smaller but more accurate dataset?
I think the AI companies are starting to sweat a little, considering the promises they have made, their inability to deliver on them or turn a profit with the technology in its current state, and the slowing pace of improvement.
Interesting times! Either we are all out of jobs or a massive market crash is imminent. Awesome...