
0 points · Imnimo · 3y ago · 0 comments

And, if I'm reading their calculation right, that's 85% on the medium-difficulty bucket, not even the entire HumanEval benchmark?

(quoting from the GPT-4 paper):

> All but the 15 hardest HumanEval problems were split into 6 difficulty buckets based on the performance of smaller models. The results on the 3rd easiest bucket are shown in Figure 2


1 comment · 1 top-level

PoignardAzur · 3y ago

That does seem to support the idea that we're two or three major breakthroughs away from superintelligent AGI, assuming these scaling curves keep holding as they have.
