In my org we are experimenting with agentic flows, and we've noticed that model choice matters especially for autonomy.
GPT-5.2 performed much better for long-running tasks. It stayed focused, followed instructions, and completed work more reliably.
Opus 4.5 tended to stop earlier and take shortcuts to hand control back sooner.