0 points · simonh · 3y ago

>I don't think their sometimes poor ability to recall and follow a set of rules is what sets them apart

It's not really that, it's that recalling a set of rules and following a set of rules are fundamentally different tasks for an LLM. This is why we need, and have implemented, different training and reinforcement strategies to close that gap. Chain-of-reasoning ability has had to be specifically trained into LLMs; it didn't arise spontaneously. Clearly this limitation can be, and is being, worked around. But it's a real and very significant problem that we can't ignore, and one that must be worked around in order to make these systems more capable.

The fact is that LLMs as they are today have a radically different form of knowledge from ours, and their reasoning ability is very different too. This can lead people to look at an LLM's performance on one task and infer things about its other abilities, ones we think of as closely related, which simply don't apply.

I see a lot of naive statements to the effect that these systems already reason like humans do and know things in the same way that humans do, when investigation into their actual characteristics shows very important ways in which they are completely unlike us. Yet they do know things and can reason. That's really important, because if we're going to close that gap we need to understand it very well.

3 comments · 1 top-level

vidarh · 3y ago

> It's not really that, it's that recalling a set of rules and following a set of rules are fundamentally different tasks for an LLM.

My point is that this appears to be the case for people too. It is often necessary to explicitly remind people to recall a set of rules to get them to follow the specific rules rather than act in a way that may or may not match the rules.

Having observed this many times, I simply don't believe that most humans will see e.g. an addition and go "oh, right, these are the set of rules I should follow for addition, let me apply them step by step". If we've had the rules reinforced through repetitive training enough times, we will end up applying them. But a lot of the time people will know the steps but still not necessarily apply them unless prompted, just like LLMs. Quite often people will still give an answer. Sometimes even the correct one.

But without applying the methods we've been taught. To the point where when dealing e.g. with new learners - children in particular - who haven't had enough reinforcement in just applying a method, it's not at all unusual to find yourself having conversations like this: "Ok, so to do X, what are the steps you've been taught? Ok, so you remember that they are A, B and C. Great. Do A. You've done A? Now do B..." and so on.

To me, getting a child to apply a method they know to solve a problem is remarkably close to getting an LLM to actually recall and follow these methods.

But even for professionals, checklists exist for a reason: We often forget steps, or do them wrong, and forget to even try to explicitly recall a list of steps and do them one by one when we don't have a list of steps in front of us.

simonh (OP) · 3y ago

I don't believe this works the way you think. Within the same chat session with GPT3 you can ask it to explain addition, then ask it to do addition, and the explanation will be perfectly accurate but the sums it does will be complete rubbish. It's not enough to remind it.

The article og_kalu posted above goes into detail as to what they had to do to teach an LLM how to reason algorithmically in a specific problem domain, and it was incredibly hard; much, much more convoluted and involved than just reminding it of the rules. Only an LLM that has gone through this intensive, multi-step, highly domain-specific training regime has a hope of getting good results, and then only in that specific problem domain. With a human, you teach a reasoning ability and get them to apply it in different domains; with LLMs that doesn't work.

Take this comment in the article: "However, despite significant progress, these models still struggle with out-of distribution (OOD) generalization on reasoning tasks". Where humans naturally generalise reasoning techniques from one problem area to another, LLMs flat out don't. If you teach it some reasoning techniques when teaching it to do sums, you have to start again from scratch when teaching it how to apply even the same reasoning techniques to any other problem domain, every single time. You can't remind them they learned this or that when learning to do sums and to use it again in this context, as you would with a human, at the moment that flat out doesn't work.

The reason it doesn't work is precisely due to the limitations imposed by token stream prediction. The different tasks involving reasoning are different token stream domains, and techniques the LLM uses to optimise for one token stream domain currently only seem to apply to that token stream domain. If you don't take that into account you will make fundamental errors in reasoning about the capabilities of the system.

So what we need to do is come up with architectures and training techniques to somehow enable them to generalise these reasoning capabilities.

vidarh · 3y ago

> I don't believe this works the way you think. Within the same chat session with GPT3 you can ask it to explain addition, then ask it to do addition, and the explanation will be perfectly accurate but the sums it does will be complete rubbish. It's not enough to remind it.

Again, I've had this exact experience with people many times as well, so again I don't think this in itself is any kind of indication of whether or not LLMs are all that different from humans in this regard. The point is not that there aren't things missing from LLMs, but that I don't find the claim that this behaviour shows how different they are to be at all convincing.

My experience is that people do not appear to naturally generalise reasoning techniques very well unless, possibly, they are trained to do so (possibly, because I'm not convinced that even most of those of us with significantly above average intelligence generalise reasoning nearly as well as we'd like to think).

Most people seem to learn not by being taught a new technique and then "automatically applying it", but being taught a new technique and then being made to repetitively practice that technique by being prompted step by step until they've learnt to apply it separate from the process of following the steps, and tend to perform really poorly and make lots of mistakes when doing it by instruction.

> You can't remind them they learned this or that when learning to do sums and to use it again in this context, as you would with a human, at the moment that flat out doesn't work.

I don't know what you're trying to say here. Mentioning a technique to ChatGPT and telling it to go through it step by step is not flawless, but it often does work. E.g. I just tested by asking GPT4 for a multiplication method and then asked it to use it on two numbers I provided and show its working, and it did just fine. At the same time, doing this with humans often requires a disturbingly high level of step by step prompting (having a child, I've been through a torturous amount of this). I won't suggest ChatGPT is as good at following instructions as people, yet, but most people are also really awfully horrible at following instructions.
