I used to be in the camp of "GPT-2 / GPT-3 is a glorified Markov chain". But over the last few months, I flipped 180° - I think we may have accidentally cracked a core part of the "generalized intelligence" problem. It's not so much about language as about associations - it seems to me that, once the latent space gets high-dimensional enough, a lot of problems reduce to adjacency search.
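To make "adjacency search" concrete, here's a minimal sketch with made-up data: embed everything into one latent space, and "answering" becomes finding the stored vectors nearest to a query. The embeddings below are random placeholders; in a real system they'd come out of a model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "memories": in practice these would be model embeddings.
memories = rng.normal(size=(1000, 768))  # 1000 items in a 768-dim latent space
memories /= np.linalg.norm(memories, axis=1, keepdims=True)

def nearest(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k memories most adjacent (by cosine similarity) to the query."""
    q = query / np.linalg.norm(query)
    sims = memories @ q  # dot product == cosine similarity, since all vectors are unit-norm
    return np.argsort(sims)[::-1][:k]

query = rng.normal(size=768)  # stand-in for an embedded question
print(nearest(query))         # the 5 closest associations
```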
I'm starting to get a (sure, uneducated) feeling that this high-dimensional association encoding and search is fundamental to thinking, in a similar way to how conditionals and loops are fundamental to (Turing-complete) computing.
Now, the next obvious step is of course to add conditionals and loops (and lots of external memory) to a proto-thinking LLM, because what could possibly go wrong. In fact, those plugins are one of many attempts to do just that - the sketch below shows the shape of the idea.
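As a toy illustration (not any real plugin API - `call_llm` and the tool names are stand-ins I made up), the whole trick is a loop that feeds tool outputs back into the prompt, plus a conditional that lets the model decide whether to act or answer:

```python
from typing import Callable

def call_llm(prompt: str) -> str:
    """Stand-in for any real completion endpoint."""
    raise NotImplementedError

memory: list[str] = []  # external memory the model can't hold in its context

tools: dict[str, Callable[[str], str]] = {
    "remember": lambda arg: (memory.append(arg), "stored")[1],
    "recall":   lambda arg: "\n".join(m for m in memory if arg in m),
}

def run(task: str, max_steps: int = 10) -> str:
    prompt = task
    for _ in range(max_steps):          # the loop
        reply = call_llm(prompt)
        if reply.startswith("TOOL:"):   # the conditional
            name, _, arg = reply.removeprefix("TOOL:").partition(" ")
            prompt += f"\n[{name} -> {tools[name](arg)}]"  # feed result back in
        else:
            return reply                # the model chose to answer directly
    return "gave up"
```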