>> The first time a top lab spent millions trying to beat ARC was actually in 2021, and the effort failed.
Which top lab was that? What did they try?
>> ARC was the only benchmark that highlighted o3 as having qualitatively different abilities compared to all models that came before.
Unfortunately, observations support a simpler hypothesis: o3 was trained on enough data about ARC-1 that it could solve it well. There is currently insufficient data on ARC-2, so o3 can't solve that one. No super magical and mysterious abilities, qualitatively different from all models that came before, required whatsoever.
Indeed, that is a common pattern in machine learning research: newer models perform better on benchmarks than earlier models not because their capabilities genuinely increased, but because they were trained on more data with more compute. They're just bigger, slower, more expensive, and just as dumb as their predecessors.
That's 90% of deep learning research in a nutshell.