undefined | Better HN

0 pointsachow1y ago0 comments

What I'm not able to comprehend is why people are not seeing the answer as brilliant!

Any ordinary mortal (like me) would have jumped to the conclusion that answer is "Father" and would have walked away patting on my back, without realising that I was biased by statistics.

Whereas o1, at the very outset smelled out that it is a riddle - why would anyone out of blue ask such question. So, it started its chain of thought with "Interpreting the riddle" (smart!).

In my book that is the difference between me and people who are very smart and are generally able to navigate the world better (cracking interviews or navigating internal politics in a corporate).

0 comments

grey-area1y ago

The 'riddle': A woman and her son are in a car accident. The woman is sadly killed. The boy is rushed to hospital. When the doctor sees the boy he says "I can't operate on this child, he is my son". How is this possible?

GPT Answer: The doctor is the boy's mother

Real Answer: Boy = Son, Woman = Mother (and her son), Doctor = Father (he says...he is my son)

This is not in fact a riddle (though presented as one) and the answer given is not in any sense brilliant. This is a failure of the model on a very basic question, not a win.

It's non deterministic so might sometimes answer correctly and sometimes incorrectly. It will also accept corrections on any point, even when it is right, unlike a thinking being when they are sure on facts.

LLMs are very interesting and a huge milestone, but generative AI is the best label for them - they generate statistically likely text, which is convincing but often inaccurate and it has no real sense of correct or incorrect, needs more work and it's unclear if this approach will ever get to general AI. Interesting work though and I hope they keep trying.

kasdfasH1y ago

The original riddle is of course:

"A father and his son are in a car accident [...] When the boy is in hospital, the surgeon says: This is my child, I cannot operate on him".

In the original riddle the answer is that the surgeon is female and the boy's mother. The riddle was supposed to point out gender stereotypes.

So, as usual, ChatGPT fails to answer the modified riddle and gives the plagiarized stock answer and explanation to the original one. No intelligence here.

TeMPOraL1y ago

> So, as usual, ChatGPT fails to answer the modified riddle and gives the plagiarized stock answer and explanation to the original one. No intelligence here.

Or, fails in the same way any human would, when giving a snap answer to a riddle told to them on the fly - typically, a person would recognize a familiar riddle half of the first sentence in, and stop listening carefully, not expecting the other party to give them a modified version.

It's something we drill into kids in school, and often into adults too: read carefully. Because we're all prone to pattern-matching the general shape to something we've seen before and zoning out.

grey-area1y ago

I'm curious what you think is happening here as your answer seems to imply it is thinking (and indeed rushing to an answer somehow). Do you think the generative AI has agency or a thought process? It doesn't seem to have anything approaching that to me, nor does it answer quickly.

It seems to be more like a weighing machine based on past tokens encountered together, so this is exactly the kind of answer we'd expect on a trivial question (I had no confusion over this question, my only confusion was why it was so basic).

It is surprisingly good at deceiving people and looking like it is thinking, when it only performs one of the many processes we use to think - pattern matching.

1 more reply

pedrosorio1y ago

> Or, fails in the same way any human would, when giving a snap answer to a riddle told to them on the fly

The point of o1 is that it's good at reasoning because it's not purely operating in the "giving a snap answer on the fly" mode, unlike the previous models released by OpenAI.

accountnum1y ago

It literally is a riddle, just as the original one was, because it tries to use your expectations of the world against you. The entire point of the original, which a lot of people fell for, was to expose expectations of gender roles leading to a supposed contradiction that didn't exist.

You are now asking a modified question to a model that has seen the unmodified one millions of times. The model has an expectation of the answer, and the modified riddle uses that expectation to trick the model into seeing the question as something it isn't.

That's it. You can transform the problem into a slightly different variant and the model will trivially solve it.

jfengel1y ago

Phrased as it is, it deliberately gives away the answer by using the pronoun "he" for the doctor. The original deliberately obfuscates it by avoiding pronouns.

So it doesn't take an understanding of gender roles, just grammar.

accountnum1y ago

My point isn't that the model falls for gender stereotypes, but that it falls for thinking that it needs to solve the unmodified riddle.

Humans fail at the original because they expect doctors to be male and miss crucial information because of that assumption. The model fails at the modification because it assumes that it is the unmodified riddle and misses crucial information because of that assumption.

In both cases, the trick is to subvert assumptions. To provoke the human or LLM into taking a reasoning shortcut that leads them astray.

You can construct arbitrary situations like this one, and the LLM will get it unless you deliberately try to confuse it by basing it on a well known variation with a different answer.

I mean, genuinely, do you believe that LLMs don't understand grammar? Have you ever interacted with one? Why not test that theory outside of adversarial examples that humans fall for as well?

1 more reply

roomey1y ago

Why couldn't the doctor be the boys mother?

There is no indication of the sex of the doctor, and families that consist of two mothers do actually exist and probably doesn't even count as that unusual.

4 more replies

lanstin1y ago

"There are four lights"- GPT will not pass that test as is. I have done a bunch of homework with Claude's help and so far this preview model has much nicer formatting but much the same limits of understanding the maths.

pkage1y ago

I mean, it's entirely possible the boy has two mothers. This seems like a perfectly reasonable answer from the model, no?

2 more replies

yywwbbn1y ago

> why would anyone out of blue ask such question

I would certainly expect any person to have the same reaction.

> So, it started its chain of thought with "Interpreting the riddle" (smart!).

How is that smarter than intuitively arriving at the correct answer without having to explicitly list the intermediate step? Being able to reasonably accurately judge the complexity of a problem with minimal effort seems “smarter” to me.

ImHereToVote1y ago

The doctor is obviously a parent of the boy. The language tricks simply emulate the ambiance of reasoning. Similarly to a political system emulating the ambiance of democracy.

geysersam1y ago

Come on. Of course chatgpt has read that riddle and the answer 1000 times already.

accountnum1y ago

It hasn't read that riddle because it is a modified version. The model would in fact solve this trivially if it _didn't_ see the original in its training. That's the entire trick.

geysersam1y ago

Sure but the parent was praising the model for recognizing that it was a riddle in the first place:

> Whereas o1, at the very outset smelled out that it is a riddle

That doesn't seem very impressive since it's (an adaptation of) a famous riddle

The fact that it also gets it wrong after reasoning about it for a long time doesn't make it better of course

accountnum1y ago

Recognizing that it is a riddle isn't impressive, true. But the duration of its reasoning is irrelevant, since the riddle works on misdirection. As I keep saying here, give someone uninitiated the 7 wives with 7 bags going (or not) to St Ives riddle and you'll see them reasoning for quite some time before they give you a wrong answer.

If you are tricked about the nature of the problem at the outset, then all reasoning does is drive you further in the wrong direction, making you solve the wrong problem.

ryanjshaw1y ago

Why does it exist 1000 times in the training if there isn't some trick to it, i.e. some subset of humans had to have answered it incorrectly for the meme to replicate that extensively in our collective knowledge.

And remember the LLM has already read a billion other things, and now needs to figure out - is this one of them tricky situations, or the straightforward ones? It also has to realize all the humans on forums and facebook answering the problem incorrectly are bad data.

Might seem simple to you, but it's not.

j / k navigate · click thread line to collapse

0 comments

grey-area1y ago

GPT Answer: The doctor is the boy's mother

Real Answer: Boy = Son, Woman = Mother (and her son), Doctor = Father (he says...he is my son)

This is not in fact a riddle (though presented as one) and the answer given is not in any sense brilliant. This is a failure of the model on a very basic question, not a win.

kasdfasH1y ago

The original riddle is of course:

"A father and his son are in a car accident [...] When the boy is in hospital, the surgeon says: This is my child, I cannot operate on him".

In the original riddle the answer is that the surgeon is female and the boy's mother. The riddle was supposed to point out gender stereotypes.

So, as usual, ChatGPT fails to answer the modified riddle and gives the plagiarized stock answer and explanation to the original one. No intelligence here.

TeMPOraL1y ago

> So, as usual, ChatGPT fails to answer the modified riddle and gives the plagiarized stock answer and explanation to the original one. No intelligence here.

It's something we drill into kids in school, and often into adults too: read carefully. Because we're all prone to pattern-matching the general shape to something we've seen before and zoning out.

grey-area1y ago

It is surprisingly good at deceiving people and looking like it is thinking, when it only performs one of the many processes we use to think - pattern matching.

1 more reply

pedrosorio1y ago

> Or, fails in the same way any human would, when giving a snap answer to a riddle told to them on the fly

The point of o1 is that it's good at reasoning because it's not purely operating in the "giving a snap answer on the fly" mode, unlike the previous models released by OpenAI.

accountnum1y ago

That's it. You can transform the problem into a slightly different variant and the model will trivially solve it.

jfengel1y ago

Phrased as it is, it deliberately gives away the answer by using the pronoun "he" for the doctor. The original deliberately obfuscates it by avoiding pronouns.

So it doesn't take an understanding of gender roles, just grammar.

accountnum1y ago

My point isn't that the model falls for gender stereotypes, but that it falls for thinking that it needs to solve the unmodified riddle.

In both cases, the trick is to subvert assumptions. To provoke the human or LLM into taking a reasoning shortcut that leads them astray.

You can construct arbitrary situations like this one, and the LLM will get it unless you deliberately try to confuse it by basing it on a well known variation with a different answer.

I mean, genuinely, do you believe that LLMs don't understand grammar? Have you ever interacted with one? Why not test that theory outside of adversarial examples that humans fall for as well?

1 more reply

roomey1y ago

Why couldn't the doctor be the boys mother?

There is no indication of the sex of the doctor, and families that consist of two mothers do actually exist and probably doesn't even count as that unusual.

4 more replies

lanstin1y ago

pkage1y ago

I mean, it's entirely possible the boy has two mothers. This seems like a perfectly reasonable answer from the model, no?

2 more replies

yywwbbn1y ago

> why would anyone out of blue ask such question

I would certainly expect any person to have the same reaction.

> So, it started its chain of thought with "Interpreting the riddle" (smart!).

ImHereToVote1y ago

The doctor is obviously a parent of the boy. The language tricks simply emulate the ambiance of reasoning. Similarly to a political system emulating the ambiance of democracy.

geysersam1y ago

Come on. Of course chatgpt has read that riddle and the answer 1000 times already.

accountnum1y ago

It hasn't read that riddle because it is a modified version. The model would in fact solve this trivially if it _didn't_ see the original in its training. That's the entire trick.

geysersam1y ago

Sure but the parent was praising the model for recognizing that it was a riddle in the first place:

> Whereas o1, at the very outset smelled out that it is a riddle

That doesn't seem very impressive since it's (an adaptation of) a famous riddle

The fact that it also gets it wrong after reasoning about it for a long time doesn't make it better of course

accountnum1y ago

If you are tricked about the nature of the problem at the outset, then all reasoning does is drive you further in the wrong direction, making you solve the wrong problem.

ryanjshaw1y ago

Might seem simple to you, but it's not.

j / k navigate · click thread line to collapse