We do this every day. I'm sorry to say, we are indeed shipping in days what used to take weeks.
I do systems programming. Before AI, feature development roughly went: design, implement, test, review, with some back edges and a lot of time spent in test and review.
AI has made the implementation part much faster, at the cost of even more time spent testing and reviewing, though still an improvement overall.
We do not see the weeks-to-days improvement, though. The bottleneck before was testing and reviewing, and they are even bigger bottlenecks now.
What kind of work do you do, and what kind of workflow were you using before and after AI to benefit so much?
I'll stop you right there. AI is not good at systems programming, it's good at CRUD web development, which is where most people are seeing the gains.
AI has solved simple CRUD, yes, but CRUD was easy before.
Now there may be an additional corner case or 20 where it's still valid, but they are not your typical software engineering work.
I also have your experience: even a 100x code delivery improvement would barely move the needle on project delivery at our place. Better, more automated integration and end-to-end functional tests that reflect real-world usage and data flows would actually make a much bigger difference, and there's no reason to think LLMs couldn't deliver this in the near future.
For things like web frontends/backends, though, it works beautifully. I ship things in days that would take me weeks to write by hand, and I'm very fast at writing things by hand. The AI also ships many fewer bugs than our average senior programmer, though maybe not fewer bugs than our staff programmers.
The boost is for what are glorified CRUD apps, where it speeds up the tedious work 1000x. However, the choices it makes along the way quickly blow up without cleanup. Seniors know how to keep their workstation clean, or they should.
Maybe they're using AI for testing and reviewing more than you are, not just for coding?
In my experience, the generated code handles the happy path, but isn't great about edge cases or writing clean code, even with explicit instruction in the initial prompt.
We usually end up doing multiple iterations on what Claude/Codex output, pointing out issues, asking for changes, etc.
It's glue, custom business workflows, and basic web CRUD stuff. We build almost everything on Rails unless there's a critical reason not to (e.g., maintaining an existing system versus building from scratch).
With very few exceptions, our team composition is one senior engineer paired with a business. So we get to avoid a large amount of the SDLC busywork that is inter-team communication. This leaves more time for client<->engineer communication, which has a host of additional benefits. We also build with a "North Star" methodology, which keeps everyone, including the client, laser-focused on the work at hand.
To answer your final question about how we're benefiting so much from AI, I think it's primarily that we're leaning into it for implementation, testing, and review alike. I know it's a sin to let AI review AI, but... it works. I'm actively skeptical of it myself, but our error and rework rates don't lie.
And we've got clients in various stages of development and/or long-term support. It's not like we're just hammering a bunch of stuff out and then bouncing. Most of these are multi-year, tightly integrated projects with our clients, and we don't see the lack of trust or the frustration that you'd expect to see if you were shipping slop. Our Honeybadger errors typically stay at zero, our performance metrics are acceptable across the board, and most importantly our clients love the work we're doing.
I can't think of any other way to measure the quality of what we're doing. And by those metrics, AI has made us better, not worse.
I should write a blog post to outline more of this in detail.
I have an example in my line of work: a full service rewrite in a new language. It would have taken forever without AI. AI makes it easier and faster. The new service has better throughput and uses fewer machines. Having a complete test harness that lets us ensure we are matching all the functionality of the previous service is key. AND we are keeping the old service on standby because we know we don't know what might be wrong with the new one.
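A parity harness of the kind described above can be sketched roughly like this in Python. This is only an illustration, not the commenter's actual setup: `old_service`, `new_service`, and the recorded requests are hypothetical stand-ins, and a real harness would replay captured traffic against the two deployed services rather than call local functions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

Handler = Callable[[dict], Any]


@dataclass
class Mismatch:
    request: dict
    old: Tuple[str, Any]
    new: Tuple[str, Any]


def _call(handler: Handler, request: dict) -> Tuple[str, Any]:
    """Run a handler; treat a raised exception as part of the observable result."""
    try:
        return ("ok", handler(request))
    except Exception as exc:
        return ("error", type(exc).__name__)


def parity_check(old: Handler, new: Handler, requests: Iterable[dict]) -> List[Mismatch]:
    """Replay each request against both implementations and collect differences."""
    mismatches = []
    for request in requests:
        old_result = _call(old, request)
        new_result = _call(new, request)
        if old_result != new_result:
            mismatches.append(Mismatch(request, old_result, new_result))
    return mismatches


# Hypothetical stand-ins for the legacy service and its rewrite.
def old_service(request: dict) -> dict:
    return {"total": request["qty"] * request["unit_price"]}


def new_service(request: dict) -> dict:
    # The rewrite (deliberately) rejects negative quantities, unlike the old one.
    if request["qty"] < 0:
        raise ValueError("negative quantity")
    return {"total": request["qty"] * request["unit_price"]}


# Hypothetical recorded requests; in practice these would come from real traffic.
recorded = [
    {"qty": 2, "unit_price": 5},
    {"qty": 0, "unit_price": 9},
    {"qty": -1, "unit_price": 3},
]

mismatches = parity_check(old_service, new_service, recorded)
for m in mismatches:
    print(f"request={m.request} old={m.old} new={m.new}")
```

Every mismatch is either a regression or an intentional behavior change that someone has to sign off on, which is also why keeping the old service on standby pairs well with this approach.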
What's your example?
> Our projects are closed source due to our clients owning the code, but I can offer anecdote. We have a client whose business operates on 2-3 very niche SaaS applications in the veterinary/animal medicine space. In a span of about 6 months, we completely ripped out 2 of those 3 and are working on replacing the 3rd one right now. We've done this with a single senior engineer working with the client between 20-40 hours per week with no major regressions. The business has been able to continue working as usual with no disruptions throughout this process.
> Obviously it's hard to measure this objectively, but I can't imagine having done this pre-AI with zero downtime and having replaced those SaaS applications in that timeframe.
I worry we haven't had to maintain vibecoded applications much and have no idea how difficult they will be to debug (or not).
The difference between "it's working now" and "it will continue working in two years" is exactly the problem with AI-generated code, because the tests can't tell you that, and you don't know which one you have unless you look really carefully.
(Not the exact same chart but similar idea, I guess it's sort of a meme: https://imgur.com/a/YrNGYOR)
So I looked at the most recent CC release notes on GitHub and the majority look like this:
Fixed /clear not resetting the terminal tab title after a conversation
Fixed session title chip from /rename disappearing while a permission or other dialog is active
Fixed agent panel below the prompt being hidden when subagents are running (regression in 2.1.122)
Fixed external-editor handoff (Ctrl+G) blanking the conversation history above the prompt
Fixed /context dumping its rendered ASCII visualization grid into the conversation, wasting ~1.6k tokens per call
Fixed OAuth refresh race after wake-from-sleep that could log out all running sessions
Fixed 1-hour prompt cache TTL being silently downgraded to 5 minutes
Fixed cache-miss warning appearing spuriously after /clear or compaction when changing /effort or /model
I'd be extremely interested to know what percentage of these were just fixing last week's Claude Code-written PR that no human ever set eyes on. But hey, all that churn looks great on charts being circulated on social media as free advertising for their flagship product (and consequently the company's valuation), so never mind, LGTM!
> We do this every day. I'm sorry to say, we are indeed shipping in days what used to take weeks.
I've been searching for months for evidence of this kinda thing. Do you have receipts you can share? Or is it more of the same "just trust me bro"?
Of course, it's not just shipping, it's shipping stably in a way that doesn't disrupt the day-to-day operations of the businesses we're working for. One client that comes to mind has 2-3 niche SaaS applications that they used independently for various workloads. We completely replaced 2 of those without any disruptions to their business in about 6 months (no, we did not replace them feature-for-feature; we just built what they needed).