I am wondering when the seminal moment occurred that signs of reasoning and generalization transpired when scaling the parameters and training data in transformer-based models. Presumably it happened at OpenAI, but that's just an assumption. It would be fascinating to learn more on the moment of discovery and if there are any technical write-ups or blog posts detailing the -- fair to say -- historical process and findings. Does anybody have insights or pointers to share on this topic?