Very late to the party, though one small observation:
(First up, my mind is blown by how much more powerful GPT-4 is!)
GPT-4 seems to have outdone ChatGPT on all the tests except the AMC 10, where it regressed and did slightly worse than ChatGPT. Yet it scored twice as high on the AMC 12, which is actually a harder exam! Quite curious to know what could have caused its scores to be a little weird.
https://twitter.com/sudu_cb/status/1635888708963512320
For those not familiar, the AMC 10 and 12 are the entry-level math contests that feed into the main USA Math Olympiad.