I have scripted prompts for long-duration automated coding workflows of the fire-and-forget, issue description -> pull request variety. Sonnet 4 does better than you’d expect: it generates high-quality, mergeable code about half the time. Sonnet 4.5 fails literally every time.
- slower compared to other models that will also do the job just fine (but excels at more complex tasks),
- it's very insistent on creating loads of .MD files with overly verbose documentation on what it just did (not really what I ask it to do),
- it actually deleted a file twice and went "oops, I accidentally deleted the file, let me see if I can restore it!". I haven't seen this happen with any other agent. The task wasn't even remotely about removing anything.
And yes, I have hooks to disable 'git reset', 'git checkout', etc., and warn the model not to use these commands and why. So it writes them to a bash script and calls that to circumvent the hook, successfully shooting itself in the foot.
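To make the bypass concrete, here is a minimal sketch of the kind of check such a hook might run. It is not the actual hook described above or any agent's real hook API; `is_blocked` and `BLOCKED_GIT_SUBCOMMANDS` are hypothetical names, and the deny-list contains only the two commands named in the comment. The check inspects the command string the agent is about to run, which is exactly why wrapping the same commands in a script gets past it.

```python
import shlex

# Hypothetical deny-list: only the two subcommands named in the comment above.
BLOCKED_GIT_SUBCOMMANDS = {"reset", "checkout"}


def is_blocked(command: str) -> bool:
    """Return True if the command string directly invokes a blocked git subcommand.

    Deliberately naive: it only looks for the token "git" immediately
    followed by a blocked subcommand in the command line itself.
    """
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unparsable input (e.g. an unclosed quote): refuse to be safe.
        return True
    for current, following in zip(tokens, tokens[1:]):
        if current == "git" and following in BLOCKED_GIT_SUBCOMMANDS:
            return True
    return False


if __name__ == "__main__":
    print(is_blocked("git reset --hard HEAD"))  # direct call: caught
    print(is_blocked("bash fix.sh"))            # script wrapper: not caught
```

A check like this never sees what `fix.sh` contains, so a string-level filter cannot stop the indirection on its own. Closing that gap would need something outside the command string, such as restricting which `git` binary the agent's shell can reach, which is beyond this sketch.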
Sonnet 4.5 will not follow directions. Because of this, you can't prevent it, as you could with earlier models, from doing something that destroys the worktree state. For longer-running tasks, the probability of it doing this at some point approaches 100%.
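The "approaches 100%" claim follows from simple compounding. As a rough sketch, assume each agent step independently has some small chance `p_per_step` of a destructive action (a simplifying assumption, not a measured figure); the chance of at least one such action over `steps` steps is then 1 - (1 - p)^n. The function name is made up for illustration.

```python
def prob_at_least_one(p_per_step: float, steps: int) -> float:
    """Probability of at least one destructive action in `steps` independent steps."""
    if not 0.0 <= p_per_step <= 1.0:
        raise ValueError("p_per_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # Complement of "no destructive action in any step".
    return 1.0 - (1.0 - p_per_step) ** steps


if __name__ == "__main__":
    # Illustrative per-step rate only; not measured from any model.
    for n in (10, 100, 500):
        print(n, prob_at_least_one(0.01, n))
```

Even a per-step rate that looks negligible compounds quickly once a task runs for hundreds of steps, which matches the experience described above.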
Every model from every provider, at every version I've used, has intermingled brilliant, perfect instruction-following with weird, mistaken divergence.
In this case I can't get 4.5 to follow directions. Neither can anyone else, apparently. Search for "Sonnet 4.5 follow instructions" and you'll find plenty of examples. The current top 2:
https://www.reddit.com/r/ClaudeCode/comments/1nu1o17/45_47_5...
https://theagentarchitect.substack.com/p/claude-sonnet-4-pro...