I agree that AI is inevitable. But there’s such a level of groupthink about it at the moment that everything is manifested as an agentic text box. I’m looking forward to discovering what comes after everyone moves on from that.
That is what I find so wild about the current conversation and debate. I have Claude Code toiling away building my personal organization software right now, which uses LLMs to take unstructured input and create my personal plans/projects/tasks/etc.
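For what it's worth, the unstructured-input-to-tasks piece needs surprisingly little scaffolding. Here's a sketch of just the parsing side, assuming the LLM has been prompted to return a JSON array of task objects; the field names and the stubbed `raw_response` are hypothetical, not from any particular app:

```python
import json
from dataclasses import dataclass

@dataclass
class Task:
    title: str
    project: str
    priority: str

# Hypothetical LLM output: the model is prompted to emit a JSON array of
# {title, project, priority} objects for each item found in the input.
raw_response = """
[
  {"title": "Renew passport", "project": "travel", "priority": "high"},
  {"title": "Draft Q3 plan", "project": "work", "priority": "medium"}
]
"""

def parse_tasks(text: str) -> list[Task]:
    """Validate the model's JSON and coerce it into typed Task records."""
    items = json.loads(text)
    # Pull only the expected keys so extra fields from the model are ignored.
    return [Task(**{k: item[k] for k in ("title", "project", "priority")})
            for item in items]

tasks = parse_tasks(raw_response)
print([t.title for t in tasks])
```

The LLM does the messy natural-language part; the surrounding code stays boring and deterministic.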
When someone uses an agent to increase their productivity by 10x in a real, production codebase that people actually get paid to work on, that will start to validate the hype. I don't think we've seen any evidence of it; in fact, we've seen the opposite.
The types of tasks I have been putting Claude Code to work on are iterative changes on a medium complexity code base. I have an extensive Claude.md. I write detailed PRDs. I use planning mode to plan the implementation with Claude. After a bunch of iteration I end up with nicely detailed checklists that take quite a lot of time to develop but look like a decent plan for implementation. I turn Claude (Opus) loose and religiously babysit it as it goes through the implementation.
Less than 50% of the time I end up with something that compiles, despite spending hundreds of thousands of tokens while Claude desperately throws stuff against the wall trying to make it work.
Getting through this process takes as much time as it would have taken to just write the code myself, AND then I still do a meticulous line-by-line review where I typically find quite a lot to fix. I really can't form a strong opinion about the efficiency of the whole thing. It's possible this is faster. It's possible that it's not. It's definitely very high variance.
I am getting better at pattern-matching which tasks AI will do competently. But it's not a long list, and it's not much of the work I actually do in a day. Really the biggest benefit is that I end up with better documentation, because I generated all of it trying to make the whole thing actually work in the first place.
Either I am doing something wrong, the work that AI excels at looks very different than mine, or people are just lying.
Working with production code is basically jumping straight to the ball-of-mud phase, maybe somewhat less tangled, but usually a much, much larger codebase. It's very hard to describe to an LLM what to even do, since you have such a complex web of interactions to consider in most mature production code.
It is really the same kind of thing, but the model is "smarter" than a junior engineer, usually. You can say something like "hmm, I think an event bus makes sense here" and the LLM will do it in 5 seconds. The problem is that there are certain behavioral biases that require active reminding (though I think some MCP integration work might resolve most of them; this is just based on the current Claude Code and Opus/Sonnet 4 models).
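To make that concrete: the kind of "event bus" refactor an LLM can knock out in seconds is usually just a small publish/subscribe registry along these lines (a minimal sketch, not any particular production design):

```python
from collections import defaultdict
from typing import Any, Callable

class EventBus:
    """Minimal publish/subscribe bus: handlers register per event name."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[..., None]]] = defaultdict(list)

    def subscribe(self, event: str, handler: Callable[..., None]) -> None:
        # Multiple handlers may listen on the same event name.
        self._handlers[event].append(handler)

    def publish(self, event: str, **payload: Any) -> None:
        # Fan the payload out to every handler registered for this event.
        for handler in self._handlers[event]:
            handler(**payload)

bus = EventBus()
received = []
bus.subscribe("task.created", lambda title: received.append(title))
bus.publish("task.created", title="Renew passport")
print(received)  # → ['Renew passport']
```

Trivial to write, sure, but the hard part in a mature codebase is deciding where the bus belongs and what the event boundaries are, which is exactly the part you still have to spell out for the model.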
We saw the same thing with blockchain. We started seeing the most ridiculous attempts to integrate blockchain, from companies where it didn't even make any sense. But it was all because doing so excited investors and boosted stock prices and valuations, not because consumers wanted it.