I don't think there is a single reason. Models are improving, so are the harnesses, prompts and we who use them a lot also get more proficient and learn where they can be used effectively vs not, so lots of improvements all over the ecosystem, brought together.
Latest big change is probably how feasible local models are becoming, like Qwen 3.6 and Gemma 4, they're no longer easily getting stuck in loops and repetition, although on lower quantizations they still pretty much suck for agentic usage.