I think people who say this are imagining LLMs work something like a statistical model. Maybe it's doing a linear regression or works like a Markov chain. It's not.
A single artificial neuron sort of works like that. But that's sort of like saying a single transistor is just an electronically controlled switch, so the only thing computers can do is switching. It's true in some sense that computers are just doing a lot of switching, but it turns out all this switching is Turing-complete. That means computers can theoretically compute anything that's possible to compute given enough time and memory, which includes anything a human could figure out.
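To make the switching analogy concrete, here is a minimal sketch that builds integer addition out of nothing but a NAND primitive. The function names (`nand`, `xor`, `full_adder`, `add_bits`) and the 8-bit default width are my own choices for illustration. This shows simple switches composing into arithmetic; it is not a demonstration of Turing-completeness.

```python
def nand(a: int, b: int) -> int:
    """The only primitive: output is 0 only when both inputs are 1."""
    return 1 - (a & b)

def xor(a: int, b: int) -> int:
    """XOR built from four NAND gates."""
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three bits using only NAND-derived gates; returns (sum_bit, carry_out)."""
    s1 = xor(a, b)
    sum_bit = xor(s1, carry_in)
    # (a AND b) OR (s1 AND carry_in), expressed with NAND only
    carry_out = nand(nand(a, b), nand(s1, carry_in))
    return sum_bit, carry_out

def add_bits(x: int, y: int, width: int = 8) -> int:
    """Ripple-carry addition of two non-negative ints, truncated to `width` bits."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result

if __name__ == "__main__":
    print(add_bits(100, 55))
```

Nothing in `nand` knows about numbers, yet addition falls out of wiring enough of them together.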
A similar principle applies to LLMs. Using probabilities is part of what they do, but that doesn't preclude them from using logic and rules of inference.
For example, we often see people assume that because an LLM can explain how to do something, such as arithmetic, it therefore knows how to do it. That's because if a human can explain how to do something, we know that they can do it. Yet for an LLM, outputting a token sequence that explains something and outputting a token sequence that solves a problem in that same domain are fundamentally different tasks.
We can get round this with very clever prompt engineering to 'force' chain of reasoning behaviour, as this discussion shows, but the reason we have to do that is precisely because the cognitive architecture of these LLMs is fundamentally different from humans.
Yet these systems are clearly highly capable, and it is possible to dramatically improve their abilities with clever engineering. I think what this means is that LLMs may be incredibly powerful components of systems that may become far more advanced and sophisticated AIs. However, to do that engineering and build dramatically more capable systems, we need a clear understanding of how and why LLMs work, what their advantages and limitations are, and how to reason about and work with those features.
Please explain to me the process currently happening in your visual cortex as you read this text.
The fact that neuroscience exists as a field (with so many remaining questions) shows that humans also do not understand how we can do all the things we do.
And although I think one could demonstrate fairly easily that ChatGPT is capable of some level of deductive reasoning, my last post wasn't even arguing about any actual capabilities of current LLMs. I was just saying you can't conclude that LLMs can't reason (even in theory) because they're "just generating text based on probabilities".
That said, it's not clear to me what the limits are on LLMs as they scale up. GPT-4 can usually add very large numbers together (I've tested it with 20-digit numbers) without any chain-of-thought, something older models struggled with. I think addition works well because you almost don't need internal working memory to do it. You can _usually_ compute a digit of the answer just by looking at 2-3 digits of each of the summands. Occasionally this isn't true: if you have a long sequence of columns that each sum to 9, then a carry from many digits away can affect the current digit. But that's rare.
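A rough way to check the "you only need a few digits" claim is to compute each answer digit from a small window of columns and see how often that disagrees with the true sum. This is only a sketch of the arithmetic, not a claim about what any model does internally. The helper names (`local_digit`, `true_digit`, `mismatch_rate`) and the 3-column window are assumptions for illustration.

```python
import random

def true_digit(a: int, b: int, pos: int) -> int:
    """Digit at column `pos` (0 = least significant) of the exact sum."""
    return ((a + b) // 10 ** pos) % 10

def local_digit(a: int, b: int, pos: int, window: int = 3) -> int:
    """Guess the digit at `pos` using only `window` columns:
    column `pos` and the `window - 1` columns to its right.
    Any carry arriving from further right is ignored."""
    lo = max(0, pos - window + 1)
    width = pos - lo + 1
    a_part = (a // 10 ** lo) % 10 ** width
    b_part = (b // 10 ** lo) % 10 ** width
    return ((a_part + b_part) // 10 ** (width - 1)) % 10

def mismatch_rate(trials: int, digits: int = 20, window: int = 3, seed: int = 0) -> float:
    """Fraction of answer digits the windowed guess gets wrong on random summands."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        a = rng.randrange(10 ** digits)
        b = rng.randrange(10 ** digits)
        for pos in range(digits):
            if local_digit(a, b, pos, window) != true_digit(a, b, pos):
                wrong += 1
    return wrong / (trials * digits)

if __name__ == "__main__":
    # A run of 9s lets a distant carry through, so the window guess fails here.
    print(local_digit(19995, 5, 4), true_digit(19995, 5, 4))
    print(mismatch_rate(2000))
```

The windowed guess only fails when a carry has to pass through every column in the window, which needs each of those columns to sum to exactly 9. Widening the window makes that rarer still.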
Multiplication of large numbers, by contrast, does require working memory and an iterative algorithm. It makes a lot of sense that chain-of-thought helps with this. The text the LLM writes functions as working memory, and it iteratively generates the response, token by token.
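As an analogy for text acting as working memory, here is a sketch of schoolbook long multiplication that writes every intermediate result to a scratchpad before moving on. The function name `long_multiply` and the scratchpad line format are made up for illustration, and it assumes non-negative integers. It illustrates the algorithm's need for stored intermediate state, not how an LLM actually implements it.

```python
def long_multiply(a: int, b: int) -> tuple[list[str], int]:
    """Multiply two non-negative ints one digit of `b` at a time,
    recording each partial product and running total as a line of text."""
    scratchpad: list[str] = []
    total = 0
    # Walk the digits of b from least to most significant.
    for shift, digit_char in enumerate(reversed(str(b))):
        partial = a * int(digit_char) * 10 ** shift
        total += partial
        scratchpad.append(
            f"{a} x {digit_char} x 10^{shift} = {partial}; running total = {total}"
        )
    return scratchpad, total

if __name__ == "__main__":
    steps, result = long_multiply(123, 45)
    for line in steps:
        print(line)
    print("answer:", result)
```

Each step depends on the running total from the previous one, which is the part that has to live somewhere: in a register, on paper, or in already generated text.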
Still, just scaling up the models has also helped a lot with multiplication (even without using chain-of-thought). Presumably larger models can have a larger part of the network devoted to arithmetic. It still doesn't compare to a calculator, and integrating LLMs with other tools or AI models sounds promising. But so far, the results of just scaling LLMs and training data have been surprisingly impressive.
I think this example shows LLMs to be more like people, not less. It's not at all unusual to see humans struggle to do something until you remind them that they know an algorithm for doing so, and nudge them to apply it step by step. Sometimes you even have to prod them through each step.
LLMs definitely have missing pieces, such as a working memory, an ability to continue to learn, and an inner monologue, but I don't think their sometimes poor ability to recall and follow a set of rules is what sets them apart.
It's not really that. It's that recalling a set of rules and following a set of rules are fundamentally different tasks for an LLM. This is why we need, and have implemented, different training and reinforcement strategies to close that gap. The chain of reasoning ability has had to be specifically trained into the LLMs; it didn't arise spontaneously. Clearly this limitation can be, and is being, worked around. The point is that it's a real and very significant problem that we can't ignore, and one that must be worked around in order to make these systems more capable.
The fact is LLMs as they are today have a radically different form of knowledge compared to us, and their reasoning ability is very different. This can lead people to look at an LLM's performance on one task and infer things about its other abilities, ones we think of as closely related, which simply don't apply.
I see a lot of naive statements to the effect that these systems already reason like humans do and know things in the same way that humans do, when investigation into the actual characteristics of these systems shows that we can characterise very important ways in which they are completely unlike us. Yet they do know things and can reason. That's really important because if we're going to close that gap, we need to understand it very well.
Whoa, that's quite a leap there. Not sure where we (as society) are with our understanding of intuition, but I doubt a million monkeys would recognize that the falling of an apple is caused by the same agent as the orbit of planets.
(A Markov algorithm is distinct from a Markov chain, but as far as I can tell you could emulate a Markov algorithm with a Markov chain given a sufficient number of input states, transitions clamped to 0% or 100%, and the ability to iterate over its own output; with a large enough state machine, iteration, and a mechanism to provide memory, it's almost hard not to end up with a Turing machine.)
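For anyone unfamiliar with Markov algorithms, here is a minimal interpreter sketch: an ordered list of string rewrite rules, where each step applies the first matching rule to its leftmost occurrence. The name `run_markov`, the `(pattern, replacement, is_terminal)` rule layout, and the step limit are my own choices. The example rule set is the common textbook binary-to-unary conversion. This shows the deterministic rules plus iteration plus memory (the string itself) idea; it does not prove the Markov chain emulation suggested above.

```python
def run_markov(rules, text: str, max_steps: int = 10_000) -> str:
    """Run a Markov algorithm.

    `rules` is an ordered list of (pattern, replacement, is_terminal) tuples.
    Each step applies the first rule whose pattern occurs in `text`,
    rewriting only the leftmost occurrence. Execution stops after a
    terminal rule fires or when no rule matches."""
    for _ in range(max_steps):
        for pattern, replacement, is_terminal in rules:
            if pattern in text:
                text = text.replace(pattern, replacement, 1)
                if is_terminal:
                    return text
                break  # restart from the first rule
        else:
            return text  # no rule applied: halt
    raise RuntimeError("step limit reached without halting")

# Binary-to-unary conversion: the output has one '|' per unit of the input's value.
BINARY_TO_UNARY = [
    ("|0", "0||", False),
    ("1", "0|", False),
    ("0", "", False),
]

if __name__ == "__main__":
    print(run_markov(BINARY_TO_UNARY, "101"))
```

Every transition here is fully deterministic, which corresponds to the "clamped to 0% or 100%" case, and the string being rewritten plays the role of the memory.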