4.20, with its 4 agents, puts it back at the top for reasoning as well. Once it's added to the API, the benchmarks should show that.