The big coding model moments in recent recollection, IMO, were something like:
- Sonnet 3.5 update in October 2024: ability to generate actually-working code using context from a codebase became genuinely feasible.
- Claude 4 release in May 2025: big tool calling improvements meant that agentic editors like Claude Code could operate on a noticeably longer leash without falling apart.
- Gemini 3 Pro, Claude 4.5, GPT 5.2 in Nov/Dec 2025: with some caveats these were a pretty major jump in the difficulty and scale of tasks that coding assistants are able to handle, working on much more complex projects over longer time scales without supervision, and testing their own work effectively.
I think a lot of the noise about letting Claude run for very extended periods involves relatively greenfield projects where the AI is going to be using tools and patterns and choices that are heavily represented in training data (unless you tell it not to), which I think are more likely to result in a codebase that lends itself to ongoing AI work. People also just exaggerate and talk about the one time doing that actually worked vs the 37 times Claude required more handholding.
The bigger problem I see with the "leave it running for the weekend" type work is that, even if it doesn't get caught up on something trivial like tabs vs spaces (glad we're keeping that one alive in the AI era, lol), it will accumulate bad decisions about project structure/architecture/design that become really annoying to untie, and that amount to a flavor of technical debt that makes it harder for agents themselves to continue to make forward progress. Lots of insidious little things: creating giant files that eventually create context problems, duplicating important methods willy nilly and modifying them independently so their implementations drift apart, writing tests that are..."designed to pass" in a way that creates a false sense of confidence when they're passing, and "forest for the trees" kind of issues where the AI gets the logic right inside a crucial method so it looks good at a glance, but it misses some kind of bigger picture flaw in the way the rest of the code actually uses that method.
1: https://marginlab.ai/trackers/claude-code-historical-perform...