undefined | Better HN

0 pointsredleggedfrog4mo ago0 comments

The future is already here. Been working a few years at a subsidiary of a large corporation where the entire hierarchy of companies is pushing AI hard, at different levels of complexity, from office work up through software development. Regular company meetings across companies and divisions to discuss methods and progress. Overall not a bad strategy and it's paying dividends.

A experiment was tried on a large and very intractable code-base of C++, Visual Basic, classic .asp, and SQL Server, with three different reporting systems attached to it. The reporting systems were crazy being controlled by giant XML files with complex namespaces and no-nos like the order of the nodes mattering. It had been maintained by offshore developers for maybe 10 years or more. The application was originally created over 25 years ago. They wanted to replace it with modern technology, but they estimated it'd take 7 years(!). So they just threw a team at it and said, "Just use prompts to AI and hand code minimally and see how far you get."

And they did wonderfully (and this is before the latest Claude improvements and agents) and they managed to create a minimal replacement in just two months (two or maybe three developers full time I think was the level of effort). This was touted at a meeting and given the approval for further development. At the meeting I specifically asked, "You only maintain this with prompts?" "Yes," they said, "we just iterate through repeated prompts to refine the code."

It has all mostly been abandoned a few months later. Parts of it are being reused, attempting a kind of "work in from the edges" approach to replacing parts of the system, but mostly it's dead.

We are yet to have a postmortem on this whole thing, but I've talked to the developers, and they essentially made a different intractable problem of repeated prompting breaking existing features when attempting to apply fixes or add features. And breaking in really subtle and hard to discern ways. The AI created unit tests didn't often find these bugs, either. They really tried a lot of angles trying to sort it out - complex .md files, breaking up the monolith to make the AI have less context to track, gross simplification of existing features, and so on. These are smarty-pants developers, too, people who know their stuff, got better than BS's, and they themselves were at first surprised at their success, then not so surprised later at the eventual result.

There was also a cost angle that became intractable. Coding like that was expensive. There was a lot of hand-wringing from managers over how much it was costing in "tokens" and whatever else. I pointed out if it's less cost than 7 years of development you're ahead of the game, which they pointed out it would be a cost spread over 7 years, not in 1 year. I'm not an accountant, but apparently that makes a difference.

I don't necessarily consider it a failed experiment, because we all learned a lot about how to better do our software development with AI. They swung for the fences but just got a double.

Of course this will all get better, but I wonder if it'll ever get there like we envision, with the Star Trek, "Computer, made me a sandwich," method of software development. The takeaway from all this is you still have to "know your code" for things that are non-trivial, and really, you can go a few steps above non-trivial. You can go a long way not looking to close at the LLM output, but there is a point at which it starts to be friction.

As a side note, not really related to the OP, but the UI cooked up by the LLMs was an interesting "card" looking kind of thing, actually pretty nice to look at and use. Then, when searching for a wiki for the Ball x Pit game, I noticed that some of the wikis very closely resembled the UI for the application. Now I see variations of it all over the internet. I wonder if the LLMs "converge" on a particular UI if not given specific instructions?

0 comments

4 comments · 4 top-level

pragmatic4mo ago

These are the blog posts we need.

This is the siren song of llm. "Look how much progress we made"

Effort increases as time to completion decreases. The last 10% of the project takes 90% of the effort as you try to finish up, deploy,integrate and find the gaps.

Llms are woefully incapable of that as that knowledge doesn't exist in a markdown file. It's in people's heads and you have to pry it out with a crowbar or as happens to so many projects, they get released and no one uses it.

See Google et Al. "We failed to find market fit on the 15th iteration of our chat app, we'll do better next time"

1 more reply

nottorp4mo ago

I've noticed this in my small scale tests. Basically the larger the prompt gets (and it includes all the previously generated code because that's what you want to add features to), the more likely is that the LLM will go off the rails. Or forget the beginning of the context. Or go into a loop.

Now if you're using a lot of separate prompts where you draw from whatever the network was trained on and not from code that's in the prompt, you can get usable stuff out of it. But that won't build you the whole application.

kaydub4mo ago

> I wonder if the LLMs "converge" on a particular UI if not given specific instructions?

Purple. They really fucking like this purple gradient background for some reason lol.

sonofhans4mo ago

In a veritable ocean of opinions it is excellent to see a detailed, first-hand report. Many thanks!

j / k navigate · click thread line to collapse