> I am interpreting this result as human level reasoning now costs (approximately) 41k/hr to 2.5M/hr with current compute.
On a very simple, toy task, which ARC-AGI basically is. ARC-AGI tests are not hard per se; LLMs just find them hard.
We don't know how this scales to more complex, real-world tasks.
Right. ARC is meant to test the ability of a model to generalize. It's neat to see it succeed, but it's not yet a guarantee that it can generalize when given other tasks.
The other benchmarks are a good indication though.
> ARC is meant to test the ability of a model to generalize. It's neat to see it succeed, but it's not yet a guarantee that it can generalize when given other tasks.
Well, no: if that were the case, ARC wouldn't actually be testing a model's ability to generalize, and we would need a better test. Considering it's by François Chollet, yep, we need a better test.
That kind of proves the point that no matter how smart it gets, it may still have deficits that are crucial, and trivial for humans. Is it generalizing across any task, or only a specific set of tasks?
Likely any task. Every smart person is capable of becoming a good driver, so long as you give them enough training and incentive. Zero smart people are born able to drive.