> We refer to algorithms like quicksort as 'reasoning' about the input. So it's fine to use the same sense of the word to apply to stochastic parrots.

That's an interesting take, because I wouldn't call quicksort itself "reasoning". It's a step-by-step algorithm. Only once a human learns it, accepts it as correct, and then runs it in their thought-space in order to transform some thought-space structure by sorting would I call it an exercise of reasoning. Note here that for humans, running quicksort is generally a slow, bug-prone, step-by-step Turing machine emulation in the conscious layer. Maaaaaybe after doing this enough, your subconscious layer will get a feel for it and start executing it for you faster.
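To make that "slow, step-by-step emulation" concrete, here's a quicksort sketch (Python, my own illustration) that narrates each partition the way a human stepping through the algorithm would verbalize it:

```python
def quicksort(xs, trace=None, depth=0):
    """Quicksort that 'narrates' each partition step, like a human
    consciously stepping through the algorithm."""
    if trace is None:
        trace = []
    if len(xs) <= 1:
        return xs
    pivot = xs[0]
    left = [x for x in xs[1:] if x < pivot]
    right = [x for x in xs[1:] if x >= pivot]
    # The "verbalized" step a human (or an LLM) would say out loud:
    trace.append(f"{'  ' * depth}pivot={pivot}, left={left}, right={right}")
    return quicksort(left, trace, depth + 1) + [pivot] + quicksort(right, trace, depth + 1)
```

The `trace` list is the point: it's the running narration, not just the answer.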
The reason I'm saying this is that:
> I suppose something I'm interested in is whether an LLM that can't sort numbers could be instructed how as a prompt and then do so.
I think if you could describe a quicksort-equivalent algorithm to an LLM - one that does things the LLM can't tackle directly - and it proceeded to execute that algorithm, I'd give it the same badge of "exercising reasoning" as I'd give to a human.
I think GPT-4 is very much capable of this for simple enough algorithms, but the way it looks is, you need to get it to spell out individual steps (yes, this is the "chain of thought" "trick"). In my eyes, GPT-4 is playing the part of our inner voice - the language-using process bridging subconscious and conscious levels. So if you want it to do the equivalent of conscious reasoning, you need to let it "talk it out loud" and have it "hear" itself, the same way a human stepping through an algorithm in their head will verbalize, or otherwise keep in conscious awareness, the algorithm description and the last few steps they've executed.
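Here's a sketch of what "talking it out loud" looks like as a prompt. The wording is my own invention, not a tested or recommended prompt - the point is just that each step's output lands back in the context, where the model can "hear" it:

```python
def cot_sort_prompt(numbers):
    """Hypothetical chain-of-thought prompt: instead of asking for the
    sorted list directly, ask the model to verbalize one selection-sort
    step at a time, so each step is visible in its own context."""
    return (
        f"Sort these numbers: {numbers}\n"
        "Do it one step at a time. In each step, state the smallest number\n"
        "remaining, append it to the result so far, and restate what is left.\n"
        "Only give the final sorted list after the last step."
    )
```

Asking for the answer directly skips the "inner voice" entirely, which is exactly where simple models fall over.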
With this set up, LLMs will still make mistakes. But so do humans! We call this "losing focus", "brain farts", "forgetting to carry one" or "forgetting to carry over the minus sign", etc. Humans can also cheat, off-loading parts of the process to their subconscious, if it fits some pattern they've learned. And so can LLMs - apparently, GPT-4 has quite a good feel for Python, so it can do larger "single steps" if those steps are expressed in code.
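The "forgetting to carry one" failure mode, and the off-loading cheat, can both be illustrated with the same toy (my own sketch): addition narrated digit by digit is where the carry gets dropped, while the off-loaded version is a single trusted step, `a + b`.

```python
def add_with_carry(a, b):
    """Digit-by-digit addition with an explicit carry -- the step humans
    (and step-by-step LLMs) famously flub. The off-loaded, pattern-matched
    equivalent is just `a + b` in one go."""
    da = [int(d) for d in str(a)][::-1]  # least-significant digit first
    db = [int(d) for d in str(b)][::-1]
    result, carry = [], 0
    for i in range(max(len(da), len(db))):
        s = (da[i] if i < len(da) else 0) + (db[i] if i < len(db) else 0) + carry
        result.append(s % 10)
        carry = s // 10  # the part that gets "forgotten"
    if carry:
        result.append(carry)
    return int("".join(str(d) for d in result[::-1]))
```

Expressing the whole computation as one line of code is precisely the "larger single step" move.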
The main difference in the above comparison is, indeed, plasticity. Do the exercise enough times, and humans will get better at it, by learning new patterns that the subconscious level can execute in one step. LLMs currently can't do that - but that's more of an interface limitation, since we can only interact with a static, stateless version of the model. OpenAI could let GPT-4 self-drive its fine-tuning based on frequently seen problems, though at this point in time it would likely cost a lot and wouldn't be particularly effective. But hey, maybe one of the weaker, cheaper, fine-tunable models is already good enough that someone could test this "plasticity by self-guided fine-tuning" approach.
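A minimal sketch of what the selection side of that loop might look like - everything here is made up for illustration, no real fine-tuning API is implied:

```python
from collections import Counter

def select_finetune_examples(transcripts, threshold=3):
    """Pick transcripts whose problem pattern recurs often enough to be
    worth baking into the weights -- the analogue of a human's subconscious
    learning a new one-step pattern through repetition. `transcripts` is a
    hypothetical list of dicts with a 'pattern' key."""
    counts = Counter(t["pattern"] for t in transcripts)
    return [t for t in transcripts if counts[t["pattern"]] >= threshold]
```

The interesting (and expensive) part - actually fine-tuning on the selected transcripts and verifying the model got better - is exactly what the static, stateless interface doesn't let us try.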
FWIW, I agree with GP/author on:
> the embeddings generated by an LLM go beyond the normal word2vec picture that most people have of embeddings, and I believe are closer to whatever "understanding" means if it could be formally defined.
In fact, my pet hypothesis is that the absurd number of dimensions in LLM latent spaces allows encoding any kind of semantic similarity we could think of, between tokens or groups of tokens, as spatial proximity along some subset of dimensions - and secondly, that this is exactly how "understanding" and "abstract reasoning" work for humans.
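A toy version of "proximity along some subset of dimensions" (the vectors and axis labels are entirely made up - real embedding dimensions aren't individually interpretable like this):

```python
import numpy as np

# Pretend axes:       animal-ness  size   royalty
cat   = np.array([0.9,         0.1,   0.0])
tiger = np.array([0.9,         0.8,   0.0])
queen = np.array([0.1,         0.5,   0.9])

def proximity(a, b, dims):
    """Closeness restricted to a subset of dimensions (higher = closer)."""
    return -np.linalg.norm(a[dims] - b[dims])
```

Along the "animal-ness" axis, cat is near tiger and far from queen; along the "size" axis, cat is actually closer to queen. One space, many simultaneous notions of similarity.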