If you gave this puzzle to humans, I bet a not-insignificant proportion would respond as if it were the traditional puzzle as soon as they heard the words "cabbage", "lion", and "goat". It's not exactly surprising that a model trained on human output would make the same assumption. But that doesn't mean it can't reason about the puzzle properly once you point out that the assumption was incorrect.
With Bing, you don't even need to tell it what it assumed wrong - I just said it's not quite the same as the classic puzzle, and it responded by correctly identifying the difference and asking me if that's what I meant, but it forgot that the lion still eats the goat. When I pointed that out, it solved the puzzle correctly.
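Incidentally, this kind of puzzle is trivial to solve mechanically, which makes it a decent test of whether the model is pattern-matching or actually tracking the constraints. Here's a minimal brute-force sketch in Python - note that the EATS relation is my assumption, since the variant's exact rules aren't spelled out above (I'm assuming the lion-eats-goat constraint survives and the goat-eats-cabbage one is what was dropped):

```python
from collections import deque

# Assumed rules for the variant: the lion still eats the goat, and the
# classic goat-eats-cabbage constraint was dropped. Edit EATS to match
# the actual puzzle.
EATS = {("lion", "goat")}
ITEMS = frozenset({"cabbage", "goat", "lion"})

def safe(bank):
    """A bank is safe if no predator is left there alone with its prey."""
    return not any(pred in bank and prey in bank for pred, prey in EATS)

def solve():
    # State: (items still on the near bank, farmer's location).
    start = (ITEMS, "near")
    seen = {start}
    queue = deque([(start, [])])
    while queue:
        (left, farmer), path = queue.popleft()
        if not left and farmer == "far":
            return path  # everything has crossed
        here = left if farmer == "near" else ITEMS - left
        # The farmer crosses alone (None) or with one item from his bank.
        for cargo in [None, *sorted(here)]:
            new_left = set(left)
            if cargo is not None:
                if farmer == "near":
                    new_left.remove(cargo)
                else:
                    new_left.add(cargo)
            new_left = frozenset(new_left)
            new_farmer = "far" if farmer == "near" else "near"
            # The bank the farmer just left must be safe without him.
            behind = new_left if new_farmer == "far" else ITEMS - new_left
            state = (new_left, new_farmer)
            if safe(behind) and state not in seen:
                seen.add(state)
                queue.append((state, path + [cargo or "(nothing)"]))

# Under the assumed rules this prints a 5-step plan:
# ['goat', '(nothing)', 'cabbage', '(nothing)', 'lion']
print(solve())
```

The point isn't that an LLM should run BFS internally, just that the state space is tiny - any failure here is about tracking the constraints, not about search.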
Generally speaking, I think your point that "when solving the puzzle you might visualize" is correct, but that's orthogonal to an LLM's ability to reason in general. Rather, it has a hard time reasoning about things it doesn't understand well enough (i.e. things for which the internal model it built up during training is way off). This seems to be generally the case for anything having to do with spatial orientation - even fairly simple multi-step tasks involving concepts like "left" vs "right" or "on this side" vs "on that side" can go hilariously wrong.
But if you give it a different task, you can see reasoning in action. For example, have it play a guess-the-animal game with you while telling it to "think out loud".