Today I was a few hours into chasing down a very tricky timing-dependent bug with GPT 5.5 and we were starting to go in circles. I noticed Opus 4.8 had shown up in GitHub Copilot so I switched over and pointed it at my notes so far. Another hour of steady progress and it tracked the bug down to some missing synchronisation in an upstream library which was occasionally corrupting a linked list. N=1 but worth every one of those rather expensive 15x requests today. 15x... yeah.
That's my initial experience, yes. It's hard to compare these things cleanly of course. I went through several new contexts on GPT and it just couldn't get traction -- it was hard to keep it focused on "yes there's clearly a race, but what actual persistent state got broken?" It just wanted to change the thread priorities so that the problem didn't occur, and kept doubling down on that as the solution. Opus made some missteps too but it responded well to my corrections - 2 or 3 significant ones along the way - and it was prepared to keep digging on my exact goal until it found the real issue.