No, it's well known, at least in the current multiverse branch where we still occasionally use things like math and scientific analysis instead of people’s vibe checks and pelican SVGs.
Here’s the paper from OpenAI where Dario himself was a co-author: https://arxiv.org/pdf/2001.08361
> We have observed consistent scalings of language model log-likelihood loss with non-embedding parameter count N, dataset size D, and optimized training computation Cmin, as encapsulated in Equations (1.5) and (1.6). Conversely, we find very weak dependence on many architectural and optimization hyperparameters. Since scalings with N, D, Cmin are power-laws, there are diminishing returns with increasing scale.
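
To put rough numbers on the "diminishing returns" part, here's a back-of-the-envelope sketch. It assumes a pure single-variable power law, L(N) proportional to N^(-alpha), with alpha around 0.076, which is roughly the exponent the paper reports for parameter count. Treat that value as an assumption here; the function names are mine, not the paper's.

```python
# Assumed exponent for loss vs. non-embedding parameter count N.
# Roughly the value reported in the paper; treat it as illustrative.
ALPHA_N = 0.076


def loss_ratio(scale_factor: float, alpha: float = ALPHA_N) -> float:
    """Factor the loss is multiplied by when the resource grows by
    scale_factor, assuming L(x) is proportional to x ** (-alpha)."""
    return scale_factor ** (-alpha)


def scale_needed(target_ratio: float, alpha: float = ALPHA_N) -> float:
    """Multiplicative resource increase needed to multiply the loss
    by target_ratio under the same power-law assumption."""
    return target_ratio ** (-1.0 / alpha)


if __name__ == "__main__":
    print(f"10x parameters  -> loss x {loss_ratio(10):.3f}")
    print(f"100x parameters -> loss x {loss_ratio(100):.3f}")
    print(f"halving loss needs ~{scale_needed(0.5):,.0f}x parameters")
```

Under those assumptions each 10x in parameters trims the loss by about 16%, and halving it takes roughly four orders of magnitude more parameters. This single-variable form ignores the joint dependence on dataset size and compute, so it's a back-of-the-envelope picture only.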