undefined | Better HN

0 pointsnsagent2y ago0 comments

As an PhD student in NLP who's graduating soon, my perspective is that language models do not demonstrate "reasoning" in the way most people colloquially use the term.

These models have no capacity to plan ahead, which is a requirement for many "reasoning" problems. If it's not in the context, the model is unlikely to use it for predicting the next token. That's why techniques like chain-of-thought are popular; they cause the model to parrot a list of facts before making a decision. This increases the likelihood that the context might contain parts of the answer.

Unfortunately, this means the "reasoning" exhibited by language models is limited: if the training data does not contain a set of generalizable text applicable to a particular domain, a language model is unlikely to make a correct inference when confronted with a novel version of a similar situation.

That said, I do think adding reasoning capabilities is an active area of research, but we don't have a clear time horizon on when that might happen. Current prompting approaches are stopgaps until research identifies a promising approach for developing reasoning, e.g. combining latent space representations with planning algorithms over knowledge bases, constraining the logits based on an external knowledge verifier, etc (these are just random ideas, not saying they are what people are working on, rather are examples of possible approaches to the problem).

In my opinion, language models have been good enough since the GPT-2 era, but have been held back by a lack of reasoning and efficient memory. Making the language models larger and trained on more data helps make them more useful by incorporating more facts with increased computational capacity, but the approach is fundamentally a dead end for higher level reasoning capability.

0 comments

kromem2y ago

Congrats on the upcoming PhD!

I'm curious where you are drawing your definition or scope for 'reasoning' from?

For example, in Shuren The Neurology of Reasoning (2002) the definition selected was "the ability to draw conclusions from given information."

While I agree that LLMs can only process token to token and that juggling context is critical to effective operation such that CoT or ToT approaches are necessary to maximize the ability to synthesize conclusions, I'm not quite sure what the definition of reasoning you have in mind is such that these capabilities fall outside of it.

The typical lay audience suggestion that LLMs cannot generate new information or perspectives outside of the training data isn't the case, as I'm sure you're aware, and synthesizing new or original conclusions from input is very much within their capabilities.

Yes, this has to happen within a context window and occurs on a token by token basis, but that seems like a somewhat arbitrary distinction. Humans are unquestionably better at memory access and running multiple subprocesses on information than an LLM.

But if anything, this simply suggests that continuing to move in the direction of multiple pass processing of NLP tasks with selective contexts and a variety of fine tuned specializations of intermediate processing is where practical short term gains might lie.

As for the issue of new domains outside of training data, I'm somewhat surprised by your perspective. Hasn't one of the big research trends over the past twelve months been that in context learning has proven more capable than was previously expected? I'd agree that a zero shot evaluation of a problem type that isn't represented in a LLMs training data is setting it up for failure, but the capacity to extend in context examples outside of training data has proven relatively more successful, no?

jimmaswell2y ago

> These models have no capacity to plan ahead, which is a requirement for many "reasoning" problems. If it's not in the context, the model is unlikely to use it for predicting the next token. That's why techniques like chain-of-thought are popular; they cause the model to parrot a list of facts before making a decision. This increases the likelihood that the context might contain parts of the answer.

Is it not possible that this is essentially how our brains do it too? Attempt to plan by branching out to related ideas until they contain an answer. Any of these statements that AI can't be on track to reason like a human because of X seem to come with an implication that we have such a good model of the human brain that we know it doesn't X. But I'm not an expert on neuroscience so in many of these cases maybe that implication is true.

freejazz2y ago

>Is it not possible that this is essentially how our brains do it too?

Is that how you think? Just curious

honzabe2y ago

I think the word "essentially" is important here. I don't think we can observe how we think. How it appears in consciousness is not necessarily real - it might be just a model constructed ex-post.

I do not know that much about AI but I know at least something about cognitive psychology and it seems to me that a lot of claims about LLMs "not actually reasoning" and similar are probably made by CS graduates who have unreflected assumptions about how human thinking works.

I don't claim to know how human thinking works but if there is one thing I would conclude from studying psychology and knowing at least some basics about neuroscience, it would be that "it's not how it appears to us".

Nobody knows how human reasoning actually works but if I had to guess (based on my amateurish mental model of the functioning of the human brain), I would say that it is probably a lot closer to LLMs and a lot less rational than is commonly assumed in discussions like this one.

nsagentOP2y ago

Maybe don't assume that PhD-level NLP researchers are out of touch on cognitive neuroscience topics related to language understanding. The latest research seems to indicate that language production and understanding exist separately from other forms of cognitive capacity. This includes people with global aphasia (no language ability) being able to do math, understand social situations, appreciate music, etc.

If you want to follow this more closely, I'd recommend the work of Evelina Fedorneko a cognitive neuroscientist at MIT who specializes in language understanding.

Check out these talks for more details: https://youtu.be/TsoQFZxrv-I?t=580 https://youtu.be/qublpBRtN_w

What this means in the context of LLMs is that next word prediction alone does not provide the breadth of cognitive capacity humans exhibit. Again, I'd posit GPT-2 is plenty capable as an LM, if combined with an approach to perform higher-level reasoning to guide language generation. Unfortunately, what that system is and how to design it currently eludes us.

1 more reply

jug2y ago

I don’t think we are conscious about how the language center correlates with our memories and then predicts the strings of words coming out.

visarga2y ago

> if the training data does not contain a set of generalizable text applicable to a particular domain, a language model is unlikely to make a correct inference when confronted with a novel version of a similar situation.

True. But look at the Phi-1.5 model - it punches 5x above its weight limit. The trick is in the dataset:

> Our training data for phi-1.5 is a combination of phi-1’s training data (7B tokens) and newly created synthetic, “textbook-like” data (roughly 20B tokens) for the purpose of teaching common sense reasoning and general knowledge of the world (science, daily activities, theory of mind, etc.). We carefully selected 20K topics to seed the generation of this new synthetic data. In our generation prompts, we use samples from web datasets for diversity. We point out that the only non-synthetic part in our training data for phi-1.5 consists of the 6B tokens of filtered code dataset used in phi-1’s training (see [GZA+ 23]).

> We remark that the experience gained in the process of creating the training data for both phi-1 and phi-1.5 leads us to the conclusion that the creation of a robust and comprehensive dataset demands more than raw computational power: It requires intricate iterations, strategic topic selection, and a deep understanding of knowledge gaps to ensure quality and diversity of the data. We speculate that the creation of synthetic datasets will become, in the near future, an important technical skill and a central topic of research in AI.

https://arxiv.org/pdf/2309.05463.pdf

Synthetic data has its advantages - less bias, more diverse, scalable, higher average quality. But more importantly, it can cover all the permutations and combinations of skills, concepts, situations. That's why a small model just 1.5B like Phi was able to work like a 7B model. Usually at that scale they are not coherent.

Der_Einzige2y ago

Are you going to school in Langley, Virginia?

NovemberWhiskey2y ago

NSA is more commonly associated with Fort Meade, MD, for what that's worth.

Kim_Bruning2y ago

> These models have no capacity to plan ahead

How would you describe the behavior of "GPT Advanced Data Analysis"?

j / k navigate · click thread line to collapse