undefined | Better HN

0 pointspeepeepoopoo971y ago0 comments

O3 is multiple orders of magnitude more expensive to realize a marginal performance gain. You could hire 50 full time PhDs for the cost of using O3. You're witnessing the blowoff top of the scaling hype bubble.

0 comments

whynotminot1y ago

What they’ve proven here is that it can be done.

Now they just have to make it cheap.

Tell me, what has this industry been good at since its birth? Driving down the cost of compute and making things more efficient.

Are you seriously going to assume that won’t happen here?

YeGoblynQueenne1y ago

>> Now they just have to make it cheap.

Like they've been making it all this time? Cheaper and cheaper? Less data, less compute, fewer parameters, but the same, or improved performance? Not what we can observe.

>> Tell me, what has this industry been good at since its birth? Driving down the cost of compute and making things more efficient.

No, actually the cheaper compute gets the more of it they need to use or their progress stalls.

whynotminot1y ago

> Like they've been making it all this time?

Yes exactly like they’ve been doing this whole time, with the cost of running each model massively dropping sometimes even rapidly after release.

1 more reply

Jensson1y ago

> What they’ve proven here is that it can be done.

No they haven't, these results do not generalize, as mentioned in the article:

"Furthermore, early data points suggest that the upcoming ARC-AGI-2 benchmark will still pose a significant challenge to o3, potentially reducing its score to under 30% even at high compute"

Meaning, they haven't solved AGI, and the task itself do not represent programming well, these model do not perform that well on engineering benchmarks.

whynotminot1y ago

Sure, AGI hasn’t been solved today.

But what they’ve done is show that progress isn’t slowing down. In fact, it looks like things are accelerating.

So sure, we’ll be splitting hairs for a while about when we reach AGI. But the point is that just yesterday people were still talking about a plateau.

1 more reply

peepeepoopoo97OP1y ago

Yes, that's exactly what I'm implying, otherwise they would have done it a long time ago, given that the fundamental transformer architecture hasn't changed since 2017. This bubble is like watching first year CS students trying to brute force homework problems.

whynotminot1y ago

> Yes, that's exactly what I'm implying, otherwise they would have done it a long time ago

They’ve been doing it literally this entire time. O3-mini according to the charts they’ve released is less expensive than o1 but performs better.

Costs have been falling to run these models precipitously.

MVissers1y ago

I would agree if the cost of AI compute over performance hasn't been dropping by more than 90-99% per year since GPT3 launched.

This type of compute will be cheaper than Claude 3.5 within 2 years.

It's kinda nuts. Give these models tools to navigate and build on the internet and they'll be building companies and selling services.

fspeech1y ago

That's a very static view of the affairs. Once you have a master AI, at a minimum you can use it to train cheaper slightly less capable AIs. At the other end the master AI can train to become even smarter.

Bolwin1y ago

The high efficiency version got 75% at just $20/task. When you count the time to fill in the squares, that doesn't sound far off from what a skilled human would charge

j / k navigate · click thread line to collapse

0 comments

whynotminot1y ago

What they’ve proven here is that it can be done.

Now they just have to make it cheap.

Tell me, what has this industry been good at since its birth? Driving down the cost of compute and making things more efficient.

Are you seriously going to assume that won’t happen here?

YeGoblynQueenne1y ago

>> Now they just have to make it cheap.

Like they've been making it all this time? Cheaper and cheaper? Less data, less compute, fewer parameters, but the same, or improved performance? Not what we can observe.

>> Tell me, what has this industry been good at since its birth? Driving down the cost of compute and making things more efficient.

No, actually the cheaper compute gets the more of it they need to use or their progress stalls.

whynotminot1y ago

> Like they've been making it all this time?

Yes exactly like they’ve been doing this whole time, with the cost of running each model massively dropping sometimes even rapidly after release.

1 more reply

Jensson1y ago

> What they’ve proven here is that it can be done.

No they haven't, these results do not generalize, as mentioned in the article:

"Furthermore, early data points suggest that the upcoming ARC-AGI-2 benchmark will still pose a significant challenge to o3, potentially reducing its score to under 30% even at high compute"

Meaning, they haven't solved AGI, and the task itself do not represent programming well, these model do not perform that well on engineering benchmarks.

whynotminot1y ago

Sure, AGI hasn’t been solved today.

But what they’ve done is show that progress isn’t slowing down. In fact, it looks like things are accelerating.

So sure, we’ll be splitting hairs for a while about when we reach AGI. But the point is that just yesterday people were still talking about a plateau.

1 more reply

peepeepoopoo97OP1y ago

whynotminot1y ago

> Yes, that's exactly what I'm implying, otherwise they would have done it a long time ago

They’ve been doing it literally this entire time. O3-mini according to the charts they’ve released is less expensive than o1 but performs better.

Costs have been falling to run these models precipitously.

MVissers1y ago

I would agree if the cost of AI compute over performance hasn't been dropping by more than 90-99% per year since GPT3 launched.

This type of compute will be cheaper than Claude 3.5 within 2 years.

It's kinda nuts. Give these models tools to navigate and build on the internet and they'll be building companies and selling services.

fspeech1y ago

Bolwin1y ago

The high efficiency version got 75% at just $20/task. When you count the time to fill in the squares, that doesn't sound far off from what a skilled human would charge

j / k navigate · click thread line to collapse