An experiment was tried on a large and very intractable codebase of C++, Visual Basic, classic .asp, and SQL Server, with three different reporting systems attached to it. The reporting systems were crazy: they were controlled by giant XML files with complex namespaces and no-nos like the order of the nodes mattering. It had been maintained by offshore developers for maybe 10 years or more. The application was originally created over 25 years ago. They wanted to replace it with modern technology, but they estimated it'd take 7 years(!). So they just threw a team at it and said, "Just use prompts to AI and hand code minimally and see how far you get."
And they did wonderfully (and this was before the latest Claude improvements and agents). They managed to create a minimal replacement in just two months, with, I think, two or maybe three developers working on it full time. This was touted at a meeting and given approval for further development. At the meeting I specifically asked, "You only maintain this with prompts?" "Yes," they said, "we just iterate through repeated prompts to refine the code."
A few months later, it has mostly been abandoned. Parts of it are being reused in a kind of "work in from the edges" approach to replacing parts of the system, but mostly it's dead.
We have yet to have a postmortem on this whole thing, but I've talked to the developers, and they essentially created a different intractable problem: repeated prompting broke existing features whenever they tried to apply fixes or add features, and it broke them in really subtle, hard-to-discern ways. The AI-created unit tests often didn't catch these bugs, either. They tried a lot of angles to sort it out: complex .md files, breaking up the monolith so the AI had less context to track, gross simplification of existing features, and so on. These are smarty-pants developers, too, people who know their stuff and have more than a BS, and they themselves were at first surprised at their success, then not so surprised at the eventual result.
There was also a cost angle that became intractable. Coding like that was expensive. There was a lot of hand-wringing from managers over how much it was costing in "tokens" and whatever else. I pointed out that if it costs less than 7 years of development, you're ahead of the game. They pointed out that the 7-year cost would be spread over 7 years, not incurred in 1 year. I'm not an accountant, but apparently that makes a difference.
I don't necessarily consider it a failed experiment, because we all learned a lot about how to better do our software development with AI. They swung for the fences but just got a double.
Of course this will all get better, but I wonder if it'll ever get there like we envision, with the Star Trek, "Computer, make me a sandwich," method of software development. The takeaway from all this is that you still have to "know your code" for things that are non-trivial, though really, you can go a few steps above non-trivial before that matters. You can go a long way without looking too closely at the LLM output, but there is a point at which that starts to cause friction.
As a side note, not really related to the OP: the UI cooked up by the LLMs was an interesting "card"-looking kind of thing, actually pretty nice to look at and use. Then, when searching for a wiki for the Ball x Pit game, I noticed that some of the wikis very closely resembled the UI of the application. Now I see variations of it all over the internet. I wonder if the LLMs "converge" on a particular UI if not given specific instructions?