From my understanding this is now outdated. The deep double descent research showed that although past a certain point performance drops as you increase model size, if you keep increasing it there is another threshold where it paradoxically starts improving again. From that point onwards increasing the parameter count only further improves performance.