Leetcode (hard) from 0/45 (GPT-3.5) to 3/45 (GPT-4).
The lack of progress here, says a lot more about is NOT happening as an AI paradigm change. Still a glorified pattern matching and pattern creation engine, even if a very impressive one.
The difference I've noticed is the first shot is generally cleaner but the ceiling of what it can correct is limited. If it is given more independent or simple things to correct and it hears about it then you're usually golden but if that thing it has to correct interacts with other constraints then when it shifts approach to fix the issue it is told about it often forgets other things and can break them. Typically this happens on the more complex (as in how interrelated) problems, for complex (as in just a lot of stuff needs to be done) it does fine.