HDThoreaun · 6mo ago · 0 points

ARC-AGI is just an IQ test. I don't see the problem with training a model to be good at IQ tests, because that's a skill that translates well.

11 comments · 2 top-level

CamperBob2 · 6mo ago · 9 in thread

Exactly. In principle, at least, the only way to overfit to Arc-AGI is to actually be that smart.

Edit: if you disagree, try actually TAKING the Arc-AGI 2 test, then post.

npinsker · 6mo ago

Completely false. This is like saying being good at chess is equivalent to being smart.

To see how impactful proper practice can be, look no further than the hodgepodge of independent teams that somehow keep up with SotA while running cheaper models (and no doubt thousands of their own puzzles, many of which surely overlap with the private set).

The benchmark isn’t particularly strong against gaming, especially with private data.

mrandish · 6mo ago

ARC-AGI was designed specifically for evaluating deeper reasoning in LLMs, including being resistant to LLMs 'training to the test'. If you read Francois' papers, he's well aware of the challenge and has done valuable work toward this goal.

CamperBob2 · 6mo ago

> Completely false. This is like saying being good at chess is equivalent to being smart.

No, it isn't. Go take the test yourself and you'll understand how wrong that is. Arc-AGI is intentionally unlike any other benchmark.

ACCount37 · 6mo ago

With this kind of thing, the tails ALWAYS come apart, in the end. They come apart later for more robust tests, but "later" isn't "never", far from it.

Having a high IQ helps a lot in chess. But there's a considerable "non-IQ" component in chess too.

Let's assume "all metrics are perfect" for now. Then, when you score people by "chess performance", you wouldn't see the people with the highest intelligence at the top. You'd get people with pretty high intelligence, but extremely, hilariously strong chess-specific skills. The tails come apart.

Same goes for things like ARC-AGI and ARC-AGI-2. It's an interesting metric (isomorphic to the progressive matrix test? usable for measuring human IQ perhaps?), but no metric is perfect - and ARC-AGI is biased heavily towards spatial reasoning specifically.
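
The "tails come apart" argument above can be illustrated with a small simulation. This is only a sketch under made-up assumptions: chess performance is modelled as the plain sum of a general-ability score and an independent chess-specific score, both standard normal. The function name, the equal weights, and the population size are hypothetical and not taken from any study.

```python
import random


def simulate(n=10_000, seed=0):
    """Return a list of (iq, chess_skill, performance) tuples.

    Assumption for illustration: performance = iq + chess_skill, where the
    two components are independent standard-normal scores.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        iq = rng.gauss(0, 1)           # general ability (toy stand-in for IQ)
        chess_skill = rng.gauss(0, 1)  # chess-specific, "non-IQ" component
        people.append((iq, chess_skill, iq + chess_skill))
    return people


people = simulate()

# Best chess performer versus the person with the highest general ability.
top_by_performance = max(people, key=lambda p: p[2])
top_by_iq = max(people, key=lambda p: p[0])

print("top performer  (iq, skill, perf):", top_by_performance)
print("highest-iq one (iq, skill, perf):", top_by_iq)
```

Under these assumptions the two printed rows will usually describe different people: the top performer tends to be well above average on both components rather than the single most intelligent person in the population.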

jimbokun · 6mo ago

Is it different every time? Otherwise the training could just memorize the answers.

CamperBob2 · 6mo ago

The models never have access to the answers for the private set -- again, at least in principle. Whether that's actually true, I have no idea.

The idea behind Arc-AGI is that you can train all you want on the answers, because knowing the solution to one problem isn't helpful on the others.

In fact, the way the test works is that the model is given several examples of worked solutions for each problem class, and is then required to infer the underlying rule(s) needed to solve a different instance of the same type of problem.

That's why comparing Arc-AGI to chess or other benchmaxxing exercises is completely off base.

(IMO, an even better test for AGI would be "Make up some original Arc-AGI problems.")
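
As a rough illustration of the setup described above (a few worked examples, then a new instance of the same problem), here is a toy sketch. The `train`/`test` layout with `input` and `output` grids follows the public ARC task JSON format; everything else is an assumption made for illustration. In particular, the three candidate rules and the brute-force `infer_rule` helper are hypothetical, and real ARC tasks are not limited to a fixed menu of transformations.

```python
def flip_h(grid):
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]


def flip_v(grid):
    """Mirror the grid top to bottom."""
    return grid[::-1]


def transpose(grid):
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]


# Hypothetical, deliberately tiny hypothesis space.
CANDIDATE_RULES = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}


def infer_rule(train_pairs):
    """Return the name of the first rule consistent with every worked example."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(pair["input"]) == pair["output"] for pair in train_pairs):
            return name
    return None


# A made-up task: two worked examples plus one test input without an answer.
task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 3, 0], [0, 4, 0]], "output": [[0, 3, 3], [0, 4, 0]]},
    ],
    "test": [{"input": [[5, 0, 0], [0, 6, 0]]}],
}

rule_name = infer_rule(task["train"])
prediction = CANDIDATE_RULES[rule_name](task["test"][0]["input"])
print(rule_name, prediction)
```

The point of the sketch is that memorizing the two training outputs does not help: the test grid is new, so only the inferred rule carries over.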

FergusArgyll · 6mo ago

It's very much a vision test. The only reason the models don't pass it easily is the vision component. It doesn't have much to do with reasoning at all.

esafak · 6mo ago

I would not be so sure. You can always prep for the test.

HDThoreaun (OP) · 6mo ago

How do you prep for ARC-AGI? If the answer is just "get really good at pattern recognition", I do not see that as a negative at all.

fwip · 6mo ago

It is very similar to an IQ test, with all the attendant problems that entails. Looking at the Arc-AGI problems, it seems like visual/spatial reasoning is just about the only thing they are testing.
