Pareto
is about diminishing returns.
> but each succeeding iteration seems to be more disappointing
This is because the scaling hypothesis (more data and more compute = gains) is plateauing, because all text data is used and compute is reaching diminishing returns for some reason I’m not smart enough to say why, but it is.
So now we're seeing incremental core model advancements, variations and tuning in pre- and post training stages and a ton of applications (agents).
This is good imo. But obviously it’s not good for delusional valuations based exponential growth.