First, you are right that I should not assume anyone's knowledge (or lack thereof). It just popped into my mind as something that could explain what has been puzzling me for months: what are people talking about when they say that LLMs are not actually reasoning, or that Stable Diffusion is not actually creating? I wish I had not included that assumption and had been inquisitive instead. Let me try again.
Maybe I diverted your focus the wrong way when I used LLMs as an example - what if I had used the more general term "neural network"? I said LLMs because this thread is about LLMs, but let me clarify what I meant:
The thing that interests me in this thread is the claim that LLMs are "not capable of actually reasoning". Whether you agree with it depends on your mental model of actual reasoning, right?
My model of reasoning: the fundamental thing is that there is a network of nodes. A signal travels through the network, guided by the weights of the connections, and fires some pattern of nodes. That pattern represents something. Maybe it is a word in the case of LLMs (or a syllable, or whatever the token actually is - let's ignore those details for now), or a thought in the case of my brain (I was not saying people reason in language). The resulting "token" can be many things, I imagine - like some mental representation of objects and their positions in spatial reasoning. Those are the specifics, but "essentially", the underlying mechanism is the same.
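To make the mental model above concrete, here is a toy sketch of what I mean by "a signal travels through weighted connections and fires a pattern that represents a token". This is purely illustrative - the tokens, weights, and the single-layer readout are all made up, and real networks are vastly deeper - but the mechanism is the same shape:

```python
import math

def softmax(xs):
    # Turn raw activations into a probability distribution.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fire(activations, weights, tokens):
    # The signal travels through the weighted connections:
    # each output node sums its weighted inputs.
    outputs = [sum(a * w for a, w in zip(activations, row)) for row in weights]
    probs = softmax(outputs)
    # The pattern that fires most strongly is read out as a "token" -
    # a word for an LLM, maybe a thought or a mental image for a brain.
    return tokens[probs.index(max(probs))]

# Hypothetical example values:
tokens = ["cat", "dog", "bird"]
weights = [[0.2, 0.9],   # connections into the "cat" node
           [0.8, 0.1],   # connections into the "dog" node
           [0.3, 0.3]]   # connections into the "bird" node
print(fire([1.0, 0.0], weights, tokens))  # prints "dog"
```

The point of the sketch is just that nothing in this mechanism cares whether the output token is a word or some other representation - that is a property of the training, not of the machinery.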
In my mental model, there is nothing fundamental that distinguishes what LLMs do from "actual reasoning". Given enough compute and good enough training data, you could create an LLM that reasons as well as humans do - that is my default hypothesis.
If I understand your position, you would not agree with that, correct? I am not claiming you are wrong - I know way too little for that. I would just be really curious - what is your mental model of actual reasoning? What does it have that LLMs do not have?
I know you mentioned that "these models have no capacity to plan ahead" - I am not sure I understand what you mean by that. Is this not just a matter of training?
BTW, I have talked about this topic before, and some people apparently see consciousness as a necessary part of actual reasoning. I do not - do you?