The model outputs what it predicts a statistically typical output would be for the given context.
Truly, “LLMs” and these GPT tools are very much large-scale “soundex” models.
Fantastic and great.
But not AI, let alone AGI.
Is human intelligence anything more than a statistical model? Our entire biology is a massive gradient descent optimization system, and our brains are no different. The establishment of connectivity, potentials, resistances, and so on is statistical in behavior all the way down. These models are built around our way of learning, to the best of our ability. It’s not perfect, but it’s a reasonable approximation.
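To make the gradient descent framing concrete, here is a toy sketch of my own (not from the thread): a single parameter nudged repeatedly in the direction that reduces error, which is the same basic loop, vastly scaled up, that shapes these models.

```python
# Toy gradient descent: fit w so that w * x approximates y = 3 * x.
# Purely illustrative -- a one-parameter stand-in for the enormous
# optimization problems that train LLMs (and, by the analogy above, brains).

def gradient_descent(xs, ys, lr=0.01, steps=500):
    w = 0.0
    for _ in range(steps):
        # Mean squared error: L = mean((w*x - y)^2)
        # dL/dw = mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # step against the gradient
    return w

xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]
w = gradient_descent(xs, ys)  # converges toward 3.0
```

The point is only the shape of the process: error measured, parameters nudged, repeat.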
Further, it’s not soundex. I see the stochastic-parrot argument too much and it’s annoying. Soundex is symbolic only; LLMs are also semantic. In fact, the semantic nature is where their interesting properties emerge from. The “just a fancy Markov model” or “just large-scale soundex” framing misses the entire point of what they do. Yes, they involve tokenization, symbols, and even conditional probability. But so does our intelligence. Neural-net-based attention to semantic structure, however, is neither soundex nor a Markov model. It’s a genuine innovation, and the properties that emerge are new.
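A minimal sketch of what “attention to semantic structure” means mechanically (my own simplification, using the standard scaled dot-product formulation, not anything specific to GPT): each position’s output is a weighted blend of every position’s value vector, with weights derived from how similar the query is to each key, i.e., weights computed from content rather than from fixed symbolic rules.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of plain Python vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)  # a probability distribution over positions
        # Output is the weight-blended mix of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs
```

This is the piece that has no counterpart in soundex or a Markov chain: the mixing weights are computed from the content itself.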
But new doesn’t mean complete. To be complete, you need to build an ensemble model integrating all the classical techniques of goal-based agency, optimization, solvers, inductive/deductive reasoning systems, IR, etc., in a feedback loop. The LLM provides the ability to reason abductively in an abstract semantic space, interpret inputs, and draw conclusions that classical AI is very bad at. The places where LLMs fall down… well, classical AI really shines there. Why does an LLM need to do logic as well as a logic solver? We already have profoundly powerful logic systems. Why does it need to count? We already have things that count. What we did not have is what LLMs provide, and more specifically what multimodal LLMs provide.
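One way to read that ensemble argument in code (a hypothetical sketch; `llm_route` is a stub standing in for a real model call): the LLM’s job is only to interpret the request and route it, while a deterministic component does the exact arithmetic the LLM is bad at.

```python
# Hypothetical ensemble loop: the LLM interprets, classical tools compute.
# llm_route is a stub for a real model call that classifies the request;
# the arithmetic "solver" here is a small safe expression evaluator.

import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def solve_arithmetic(expr):
    """Deterministic solver tool: safely evaluate +, -, *, / expressions."""
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def llm_route(request):
    """Stub for the LLM's interpretive role: decide which tool fits."""
    return "arithmetic" if any(c in request for c in "+-*/") else "chat"

def answer(request):
    tool = llm_route(request)
    if tool == "arithmetic":
        return solve_arithmetic(request)   # classical component does the counting
    return "LLM free-form response"        # semantic interpretation stays with the LLM
```

The routing stub is trivial on purpose; the design point is the division of labor, not the classifier.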
This is why we see improvement in GPT when chain of thought/tree of thought is used with reasoning for each step. That can't correct every failure mode, but it increases the likelihood you'll receive a more correct answer.
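That “increases the likelihood” framing can be sketched as self-consistency-style sampling (a hypothetical harness; `sample_reasoning_chain` stands in for repeated model calls and is simulated here): draw several independent reasoning chains and take the majority answer. No single chain is guaranteed correct, but the modal answer is much more likely to be.

```python
import random
from collections import Counter

def sample_reasoning_chain(rng, p_correct=0.7):
    """Stub for one chain-of-thought sample: right answer with prob p_correct."""
    return "42" if rng.random() < p_correct else str(rng.randint(0, 41))

def self_consistent_answer(n_samples=25, seed=0):
    # Majority vote over independent reasoning chains: individual chains
    # can fail, but their errors scatter while correct answers agree,
    # so the most common answer is far more likely to be the right one.
    rng = random.Random(seed)
    votes = Counter(sample_reasoning_chain(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]
```

The mechanism matches the claim in the paragraph above: stepwise reasoning plus aggregation doesn’t eliminate failure modes, it just shifts the odds.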