> I do thoroughly audit all the code that AI writes, and often go through multiple iterations
Does this actually save you time versus writing most of the code yourself? In general, it's a lot harder to read and grok code than to write it [0, 1, 2, 3]. For me, one of the biggest skills in using AI to write code efficiently is a) chunking the task into increments that are both small enough for me to easily grok the AI-generated code and aligned enough with the AI's training data for its output to be ~100% correct, b) correctly predicting ahead of time whether reviewing and correcting the output for each increment will take longer than just doing it myself, and c) making sure the overhead of a) and b) doesn't itself exceed the cost of just doing it myself.
[0] https://mattrickard.com/its-hard-to-read-code-than-write-it
[1] https://www.joelonsoftware.com/2000/04/06/things-you-should-...
[2] https://trishagee.com/presentations/reading_code/
[3] https://idiallo.com/blog/writing-code-is-easy-reading-is-har...
To be honest, I don't really have a problem with chunking my tasks, because I don't think about it that way. I care a lot more about chunks an AI could reasonably validate. Instead of thinking "what's the biggest chunk I could reasonably ask the AI to solve?", I think "what's the biggest piece I could ask an AI to do that I can write a script to easily validate once it's done?" Letting the AI validate its own work means you never have to worry about chunking again. (OK, that's a slight hyperbole: validation is most of my concern, and a secondary concern is that I try not to let a chunk run past ~1000 lines.)
For instance, take the example of an AI rewriting an API call to support a new db library you are migrating to. In this case, it’s easy to write a test case for the AI. Just run a bunch of cURLs on the existing endpoint that exercise the existing behavior (surely you already have these because you’re working in a code base that’s well tested, right? right?!?), and then make a script that verifies that the result of those cURLs has not changed. Now, instruct the AI to ensure it runs that script and doesn’t stop until the results are character for character identical. That will almost always get you something working.
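For concreteness, here's a minimal sketch of what that validation script might look like. This one is Node/TypeScript using fetch rather than raw cURL, and the endpoint paths and golden-file layout are made up for illustration:

    // verify-endpoints.ts: snapshot the old API's responses, then diff
    // the migrated API against those snapshots byte for byte.
    import { readFile, writeFile } from "node:fs/promises";

    const BASE = process.env.BASE_URL ?? "http://localhost:3000";
    const ENDPOINTS = ["/api/users/1", "/api/posts?limit=10"]; // hypothetical

    async function main() {
      const record = process.argv.includes("--record");
      let failures = 0;
      for (const path of ENDPOINTS) {
        const body = await (await fetch(BASE + path)).text();
        const golden = `golden/${encodeURIComponent(path)}.txt`;
        if (record) {
          await writeFile(golden, body); // run once against the OLD code
        } else if (body !== (await readFile(golden, "utf8"))) {
          console.error(`MISMATCH: ${path}`);
          failures++;
        }
      }
      process.exit(failures); // non-zero exit tells the AI it isn't done
    }

    main();

Record once against the old implementation, then tell the agent it isn't finished until the script exits 0.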
Obviously the tactics change based on what you are working on. In frontend code, for example, I use a lot of Playwright. You get the idea.
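For frontend, the equivalent of that cURL script is a coarse end-to-end test the AI can run in a loop. A minimal Playwright sketch (the URL and selectors here are hypothetical):

    // smoke.spec.ts: the "don't stop until this passes" check.
    import { test, expect } from "@playwright/test";

    test("redesign keeps the critical path working", async ({ page }) => {
      await page.goto("http://localhost:3000/"); // hypothetical dev server
      await expect(page.getByRole("navigation")).toBeVisible();
      await page.getByRole("link", { name: "Log in" }).click(); // hypothetical link
      await expect(page).toHaveURL(/login/);
    });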
As for code legibility, I tend to solve that by telling the AI to focus particularly on clean interfaces, and being OK with the internals of those interfaces being vibecoded and a little messy, so long as the external interface is crisp and well tested. This is a much longer discussion, and for the non-vibe-code-pilled (sorry) it probably sounds insane, and it's easy to lose one's audience on such a polarizing topic, so I'll keep it brief. The key thing to understand about AI is that it makes the cost of writing unit tests and e2e tests drop significantly, and I find this (along with staying disciplined about crisp interfaces) to be an excellent tool in the fight against the increased code complexity that AI tools bring. In short, I deal with legibility by having a few really, really clean interfaces/APIs that are extremely readable, and then testing them like crazy.
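To make "crisp interface, messy internals" concrete, here's the shape of what I mean (a hypothetical TypeScript sketch; none of these names come from my actual codebase):

    // The interface is the contract: small, typed, documented, human-reviewed.
    export interface ModerationQueue {
      /** Returns up to `limit` reports awaiting review, oldest first. */
      nextPending(limit: number): Promise<Report[]>;
      /** Resolves a report; rejects if `id` is unknown. */
      resolve(id: string, action: "dismiss" | "remove"): Promise<void>;
    }

    export interface Report {
      id: string;
      targetUrl: string;
      reason: string;
      createdAt: Date;
    }

    // Whatever sits behind this interface can be as vibecoded as the AI
    // likes; the test suite hammers the contract, not the internals.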
EDIT
There is a dead comment that I can't respond to that claims that I am not a reliable narrator because I have no A/B test. Behold, though: I am the AI-hater's nightmare, because I do have a good A/B test! I have a website that sees a decent amount of traffic (https://chipscompo.com/). Over the last few years, I have tried a few times to modernize and redesign the website, but these attempts have always failed because the website is pretty big (~50k loc) and I haven't been able to fit it in a single week of PTO.
This Thanksgiving, I took another crack at it with Claude Code, and not only did I finish an entire redesign (basically touching every line of frontend code), but I also got in a bunch of other new features, like a forgot-password flow and a suite of moderation tools. I then IaC'd the whole thing with Terraform, something I only dreamed about doing before AI! Then I bumped React a few major versions, brought TS forward by about ten years of releases, etc., all with the help of AI. The new site is live and everyone seems to like it (well, they haven't left yet...).
If anything, this is actually an unfair comparison, because the AI had more work to do than I did when I tried a few years ago: my dependencies had grown more and more out of date as the years went on. That was a real pain for the AI, but I eventually managed to solve it.
That combination (a deep training dataset plus a problem that maps well onto how the AI "understands" code) can be a real enabler. I've done it myself. On some projects, all I've done is write tests, point Claude at the tests and ask it to write code until those tests pass, then audit said code, make adjustments as required, and ship.
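The kind of test file I mean, for concreteness (a hedged sketch using Vitest; `normalizePhone` is a made-up module standing in for whatever the project actually needs):

    // I write this first, by hand. Then: "Claude, make `npm test` pass."
    import { describe, it, expect } from "vitest";
    import { normalizePhone } from "./normalizePhone"; // hypothetical module

    describe("normalizePhone", () => {
      it("strips punctuation and adds a default country code", () => {
        expect(normalizePhone("(555) 123-4567")).toBe("+15551234567");
      });
      it("rejects strings with too few digits", () => {
        expect(() => normalizePhone("12345")).toThrow();
      });
    });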
That has worked well and sped up development in straightforward (sometimes I'd argue trivial) situations.
Where it falls down is on complex problem sets and major refactors that cut across multiple interdependent pieces of code. It's also less robust with less popular languages (we keep a particular set of business logic in Rust because of its sensitive nature and the need for speed, and the AI does a not-great job with it), plus a host of other areas where I've hit its limitations.
Granted, I work in a fairly specialized way and deal with a lot of business logic and rules rather than boilerplate CRUD, but I have hit walls on things like massive refactors in large codebases (50k LOC is small to me, for reference).
But yes, I do think that the efficiency gain, purely in the domain of coding, is around 5x, which is why I was able to entirely redesign my website in a week. When working on personal projects I don't need to worry about stakeholders at all.
All of the work you described is essentially manual labor. It's not difficult work, just boring, sometimes error-prone work that mostly requires you to do obvious things and then tackle errors as they pop up in very obvious ways. Great use case for AI, for sure. That, combined with the fact that the end result is so poor, isn't really selling your argument very well, except maybe in the sense that yeah, AI is great for dull work in the same way an excavator is great for digging ditches.
If you ever find yourself insulting a guy's passion project in order to prove a point, perhaps take a deep breath and step back from the computer for a moment. And maybe look deep inside yourself, because you might have crossed the threshold into being a jerk.
Yes, my site has issues. You know what else it has? Users. Your comments about FOUC and waterfalls are correct, but those things don't rank particularly high on the list of what's important to the people who use the site. I didn't instruct the AI to fix them because I was busy fixing real problems that my actual users cared about.
As for loading slowly -- it loads in 400ms on my machine.
And if it's producing an intern-level artifact for your frontend, what's to say it's not producing similar quality code for everything else? Especially considering frontend is often derided as being easier than other fields of software.
The site looks great to me. Your comment is actually offensive, despite you typing "no offence".
The METR paper demonstrated that you are not a reliable narrator for this. Have you participated in a study where this was measured, or are you just going off intuition? Because METR demonstrated beyond doubt that your intuition is a liar in this case.
If you're not taking measurements, it is more likely that you're falling victim to a number of psychological effects (sunk cost, Gell-Mann amnesia, the slot-machine effect) than that your productivity has really improved.
Have you received a 5-10x pay increase? If your productivity is now 10x mine (I don't use these tools at work because, in my experience, they're a waste of time), why aren't you compensated as such? And if it's because of pointy-haired bosses, you should be able to start a new company with your 10x productivity and shut them (and me) up.
Provide links to your evidence in the replies.
The commenter told you they suspect they save time; taking their experience at face value seems reasonable here. Or at least, I have no reason to jump down their throat, the same way I don't jump down yours when you say "these tools are a waste of time in my experience." I assume you're smart enough to have tested them out thoroughly, and I give you the benefit of the doubt.
If you want to bring up METR to show that they might be falling into the same trap, that's fine, but you can do that in a much less caustic way.
But by the way, METR also used Cursor Pro and Claude 3.5/3.7 Sonnet. Cursor had smaller context windows than today's tools, and 3.7 Sonnet is no longer state of the art, so I'm not convinced the paper's conclusions still hold today. The latest Codex models are leaps ahead of what METR tested, by METR's own research.[1]
[1] https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
Does Amazon pay everyone who receives "Not meeting expectations" in their perf review 0 dollars? Did Meta pay John Carmack (or insert your favorite engineer here) 100x that of a normal engineer? Why do you think that would be?
Imagine that you've built an app with libraries A, B, and C and conceptually understand all that's involved. But now you're required to move everything to X, Y, and Z. There won't be anything fundamentally new or revolutionary to learn, but you'll have to sit and read those docs, potentially for hours (cost of task switching and all). Getting the AI to execute the changes lets you skip much of the tedium. And even though you still won't really know much about the new libs, you'll get the gist of most of the produced code. You can work through the docs piecemeal to review the code at the sensitive boundaries. And for the rest, you'll paint inside the frames as you normally would when joining a new project.
Even as a skeptic of the general AI productivity narrative, I can see how that could squeeze a week's worth of "ever postponed" tasks inside a day.
I-know-what-kind-of-man-you-are.jpeg
You come off as a zealot by branding people who disagree as "haters".
Edit: AI excels at following examples, or at simple, testable tasks that require persistence; that's intern-level work. Doing this narrow band of work quickly doesn't result in 10x productivity.
I've yet to find a single person who can show evidence of getting through 10x more tasks in a sprint[1], or of matching the output of the rest of their 6-10-person team by themselves.
[1] Even for junior-level work.
> I've yet to find a single person who can show evidence of getting through 10x more tasks in a sprint[1], or of matching the output of the rest of their 6-10-person team by themselves.
If my website, a real website with real users, doesn't qualify, then I'm not sure what would. A single person with evidence is right in front of you, but you seem to be denying the evidence of your own eyes.
You are stuck in a very low local maximum.
You are me six months ago. You don't know how it works, so you cannot yet reason about it. Unlike me, you've decided "all these other people who say it's effective are making it up." Instead, ask: how does it work? What am I missing?