
0 points · causal · 12h ago · 0 comments

The issue is that ARC AGI 3 specifically forbids harnesses that humans get to use.


So what? Are you suggesting that an agent exhibiting genuine AGI will be tripped up by having to ingest JSON rather than RGB pixels? LLMs are trained largely on textual data, so JSON is going to be much closer to whatever "native" input is for them.

But by all means, give the agents access to an API that returns pixel data. However, I fully expect that would reduce performance rather than increase it.

pawelk41 · 13h ago

Because it is. Opus 4.6 jumps from 0.0% to 97.1% when given visual input.

fc417fc802 · 3h ago

That's impressive. I'm also a bit surprised: I wouldn't have expected it to be trained much at all on that sort of visual input task. I think I'd be similarly surprised to learn that a frontier model was particularly good at, for example, playing retro video games or actuating a robot.

However, if it can't figure out how to render the JSON into a visual on its own, does it really qualify as AGI? I'd still say the benchmark is doing its job here. Granted, it's not a perfectly even playing field in that case, but I think the goal is to test for progress towards AGI rather than to host a fair tournament.
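For a sense of how small the "render the JSON to a visual" step could be, here is a minimal sketch. It assumes, hypothetically, that a frame arrives as a JSON 2D array of integer color indices; the `PALETTE` values and the `frame_to_rgb` / `rgb_to_ppm` names are illustrative inventions, not the actual ARC-AGI-3 frame schema or API.

```python
import json

# Hypothetical palette mapping color indices to RGB triples.
# ASSUMPTION: the real ARC-AGI-3 palette and frame schema may differ.
PALETTE = {
    0: (0, 0, 0),
    1: (0, 116, 217),
    2: (255, 65, 54),
    3: (46, 204, 64),
    4: (255, 220, 0),
    5: (170, 170, 170),
    6: (240, 18, 190),
    7: (255, 133, 27),
    8: (127, 219, 255),
    9: (135, 12, 37),
}


def frame_to_rgb(frame_json: str, scale: int = 1) -> list[list[tuple[int, int, int]]]:
    """Decode a JSON grid of color indices into rows of RGB pixels.

    Each grid cell is expanded into a scale x scale block of identical pixels.
    """
    grid = json.loads(frame_json)
    pixels = []
    for row in grid:
        # Repeat each cell's color `scale` times horizontally.
        rgb_row = [PALETTE[cell] for cell in row for _ in range(scale)]
        # Repeat the whole row `scale` times vertically.
        for _ in range(scale):
            pixels.append(list(rgb_row))
    return pixels


def rgb_to_ppm(pixels: list[list[tuple[int, int, int]]]) -> bytes:
    """Encode RGB rows as a binary PPM (P6) image that common viewers can open."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    body = bytes(channel for row in pixels for px in row for channel in px)
    return header + body


if __name__ == "__main__":
    image = rgb_to_ppm(frame_to_rgb("[[0, 1], [2, 3]]", scale=2))
    print(len(image))
```

PPM is used only because it needs no third-party imaging library; an agent with a code tool could equally hand the pixels to any image encoder.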

