
netvarun · 3y ago · 0 points

Very late to the party, but one small observation. (First up, my mind is blown by how much more powerful GPT-4 is!) GPT-4 seems to have outdone ChatGPT on all the tests except the AMC 10, where it regressed and did slightly worse than ChatGPT. But however it scored two times more on the AMC 12, which is actually a harder exam! I'm quite curious what could have caused its scores to be a little weird. https://twitter.com/sudu_cb/status/1635888708963512320 For those not familiar, the AMC 10 and 12 are the entry-level math contests that feed into the main USA Math Olympiad.


gowld · 3y ago

> But however it scored two times more on the AMC 12

No it didn't; that's not how the scoring scale works. It scored higher, but not "2 times".

https://news.ycombinator.com/item?id=35156404
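
To make the scoring-scale point concrete, here is a minimal sketch in Python. It assumes the AMC 10/12 rules as I understand them (25 questions, 6 points per correct answer, 1.5 per question left blank, 0 per wrong answer, 150 maximum); none of that is stated in this thread, and `amc_score` is a hypothetical helper, not anything from OpenAI's test suite.

```python
def amc_score(correct: int, blank: int, total: int = 25) -> float:
    """Raw AMC 10/12 score under the ASSUMED rules:
    6 points per correct answer, 1.5 per blank, 0 per wrong answer."""
    wrong = total - correct - blank
    if correct < 0 or blank < 0 or wrong < 0:
        raise ValueError("correct + blank must be between 0 and total")
    return 6.0 * correct + 1.5 * blank


# Leaving every question blank already beats a score of 30.
print(amc_score(correct=0, blank=25))   # 37.5

# Two very different answer sheets that both produce 30 points.
print(amc_score(correct=5, blank=0))    # 30.0
print(amc_score(correct=0, blank=20))   # 30.0

# One answer sheet that produces 60 points.
print(amc_score(correct=10, blank=0))   # 60.0
```

Under those assumed rules a raw score does not map linearly onto the number of questions solved, so 60 vs. 30 need not mean twice as many correct answers. Whether the model was allowed to leave answers blank is not known from this thread.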

netvarun (OP) · 3y ago

Yes, I'm aware of it. I meant it more in absolute terms as a reference (60 is 2 times more than 30, no? ;) ) to make the point that the AMC 12 scores are way better than the AMC 10 scores. Nevertheless, the bigger point is that there seems to be some anomaly in the test scores: maybe some data contamination, or some bug in their automated test suite. Quite a few folks on Twitter also mentioned this, including a former OpenAI engineer[0] who worked on automated theorem proving. I'm pretty sure this will be looked into further in the coming weeks.

[0] https://twitter.com/spolu/status/1635903343397576705
