Better HN

nayroclade · 3y ago

It's well documented that these models can solve variations of questions that are not found anywhere in their training set, and even entirely novel problems invented by prompters. Not with 100% success, but they can do so at a rate far better than chance, so the idea that they're pulling responses from their training data is simply not correct.



bootsmann · 3y ago

OK, maybe I should rephrase, but think of it like this: would your IQ test score be accurate if you had a week beforehand to train on solving IQ tests?

You don't get exactly the same test in the end (the same goes for the SAT), but the constraints we put on these tests (they have to be comparable) produce patterns in the questions that you can train for. This is the same logic why people can train to improve their SAT scores: if they were a measure of true innate intelligence, training would have no impact on their score.

amoss · 3y ago

If we flip the question around it seems more reasonable: would your IQ test score be accurate if you did not train beforehand on practice IQ tests?

I would presume not: most tests are timed, and if you are spending time on first-time-only tasks like understanding the problem, then the result is inaccurate. If you train out those first-time tasks so that you are repeatably using the test's time budget to solve problems, then you should reach some kind of steady state and produce repeatable, more accurate test scores.

My take is that the repeatable scores measuring your steady state in the task would be more accurate than the untrained scores with an unknown amount of initialization time within each problem. I would make a similar claim to naasking below that this could account for some of the Flynn effect.

codethief · 3y ago

But hasn't that always been a problem inherent to IQ tests (and SAT tests, of course)?

beepbooptheory · 3y ago

It is definitely one of the problems you add to a pile marked "IQ tests are meaningless and cater to the worst conceptions of human intelligence."

siva7 · 3y ago

I mean, let's be fair. That's also how many humans learn and do IQ tests.

prng2021 · 3y ago

Yes but the critical difference is that AI couldn’t pass these tests if they weren’t trained on a very similar set of questions and answers.

Not every person takes an SAT prep class to improve their test score. There are lots of people who are truly above average in terms of intelligence and can score very high on the first try.


adamsmith143 · 3y ago

> This is the same logic why people can train to improve their SAT scores

So isn't this literally moving the goalposts? "So what if an AI can beat the SAT? So can humans."

331c8c71 · 3y ago

Well, my experience was that ChatGPT produced utter nonsense including literally "1/19+1/19=2/19=1" when asked a simple probability question.
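For reference, the sum in that quote evaluates to 2/19 (about 0.105), not 1. A quick check using Python's standard-library `fractions` module, which does exact rational arithmetic:

```python
from fractions import Fraction

# Exact rational arithmetic: no floating-point rounding involved.
total = Fraction(1, 19) + Fraction(1, 19)

print(total)          # 2/19
print(float(total))   # about 0.105
print(total == 1)     # False
```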

nayroclade (OP) · 3y ago

Transcript of a conversation I just had with ChatGPT:

Prompt:

I'm talking with a user named "331c8c71" on Hacker News. According to statistics I found by Googling, the number of Hacker News readers living in the USA is 36%. Assuming this is correct, what is the probability that 331c8c71 does not reside in the USA? Please explain your reasoning.

Response:

If we assume that the statistic you found is accurate, then we know that 36% of Hacker News readers live in the USA. Therefore, the probability that any given Hacker News reader does not live in the USA is 100% - 36% = 64%.

However, we do not know if 331c8c71 is a typical Hacker News reader. They could be part of a demographic that is more or less likely to live in the USA. Without more information, we cannot say for certain what the probability is that 331c8c71 does not reside in the USA.

So, the answer is 64% assuming that 331c8c71 is a typical Hacker News reader, but it may not be an accurate estimate depending on additional information about the user.

---

Like I said, these models are far from 100% accurate, and there is lots they get wrong, but they clearly are capable of some kind of reasoning that goes beyond simple text substitution of training data.
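The arithmetic step in the transcript is the complement rule, P(not US) = 1 - P(US). A minimal sketch, assuming (as the response's own caveat notes) that the user is a randomly chosen reader and that the 36% figure from the prompt is accurate; the helper name `prob_not_in_us` is hypothetical:

```python
from fractions import Fraction

def prob_not_in_us(share_in_us: Fraction) -> Fraction:
    """Complement rule: P(not A) = 1 - P(A).

    Hypothetical helper. Only meaningful if the person is a uniformly
    random member of the population the share was measured on.
    """
    if not 0 <= share_in_us <= 1:
        raise ValueError("share must be a probability in [0, 1]")
    return 1 - share_in_us

p = prob_not_in_us(Fraction(36, 100))  # the 36% figure quoted in the prompt
print(p, f"= {float(p):.0%}")          # 16/25 = 64%
```

Using `Fraction` keeps the result exact, so the 64% is not subject to floating-point rounding.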

kenjackson · 3y ago

Just a few years ago, even parsing the question would have been an achievement. I feel like people are just glossing over how incredible this answer is.


lupire · 3y ago

"reasoning" is a strong word. "Pattern extraction and application" is a better description of what is happening.

In particular, LLMs fail miserably at tasks like "apply this simple pattern many times in succession" (a.k.a. a "for-loop"), because they can't count in an abstract way, only in concrete contexts.
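As a hypothetical illustration of the kind of task described here, consider a string-rewriting rule (a -> ab, b -> a) applied n times in succession. For a program this is a plain for-loop; the rule, the function name `rewrite_n_times`, and the choice of example are ours, not from the comment. Solving it "by hand" requires tracking both the iteration count and a growing string without losing one's place:

```python
def rewrite_n_times(s: str, n: int) -> str:
    """Apply the rule a -> ab, b -> a to every character, n times in a row."""
    rules = {"a": "ab", "b": "a"}
    for _ in range(n):  # the "for-loop": the same simple step, repeated n times
        s = "".join(rules[ch] for ch in s)
    return s

for i in range(6):
    print(i, rewrite_n_times("a", i))
# 0 a
# 1 ab
# 2 aba
# 3 abaab
# 4 abaababa
# 5 abaababaabaab
```

The string lengths follow the Fibonacci numbers (1, 2, 3, 5, 8, 13, ...), so after 10 steps the result has 144 characters: trivial to check mechanically, but long enough that every step has to be applied exactly.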

shaunsingh0207 · 3y ago

IQ tests, the SAT, and exact mathematics are very different fields, though. ChatGPT and similar models are very good at "emulating" logic through sheer force and size, but exact math doesn't work with "emulation".
