The swindle goes like this: AI on a good codebase can build a lot of features. You think it's faster; it even seems safer and more accurate at times, especially in domains you don't know everything about.
This goes on for a while, whilst the codebase gets bigger, exploration takes longer, and the failure rate increases. You don't want it to be true and try harder, so you only stop after it has become practically impossible to make any changes.
You look at the code again, and there is so much of it that spaghetti is an understatement; it's the Great Wall of China.
You start working… and you realize what was going on.
I deleted 75,000 of 140,000 lines of code, and I honestly feel like the 3 months I went hard into agentic coding were wasted. I failed my users by building useless features, increasing bugs, losing the mental model of my code, and not finding the problems I didn't know about: the kind of hard decisions you only see when you're in the code, the stuff that wanders in your mind for days.
The standards seem to be different for LLMs. Would anyone be surprised if they handed summary feature descriptions to some random "developer" they'd only ever met online, and got back an absolute dung pile of half-broken implementation?
For some reason, people seem to expect miracles from some machine that they would not expect of other humans, especially not ones with a proven penchant for rambling hallucinations every once in a while.
I'd like to know, ideally from people who've been there, why they think that is. Where does the trust come from?
They can assimilate hundreds of thousands of tokens of context in a few seconds or minutes and do exceptional pattern matching beyond what any human can do; that's a main factor in why it looks like "miracles" to us. When a model actually solves a long-standing issue that was never addressed for lack of funding, time, or knowledge, it does feel miraculous, and when you are exposed to this a couple of times it's easy to give them more trust, just like you would trust someone who lent you a helping hand a couple of times more than a total stranger.
I suppose it's difficult to account for the inconsistency of something able to perform up to standard (and fast!) at one time, but then lose the plot in subtle or not-so-subtle ways the next.
We're wired to see and treat this machine as a human and therefore are tempted to trust it as if it were a human who demonstrated proficiency. Then we're surprised when the machine fails to behave like one.
I have to say, I'm still flabbergasted by the willingness to check out completely and not even keep on top of, and a mental model of, what gets produced. But the mind is easily tempted into laziness, I presume, especially when the fun part of thinking gets outsourced and only the less fun work of checking is left. At least that's what makes the difference for me between coding and reviewing. One is considerably more interesting than the other, and the two are much less similar than they should be, given that both should require gaining a similar understanding of the code.
It is surprising how bad it is at taking the lead, given how effective it is with a much more limited prompt, particularly if you buy into all the hype that it can take the place of human intelligence. It is capable of applying an incredible amount of knowledge while having virtually no real understanding of the problem.
LLMs also don't solve the much bigger problem of most software engineers having no ability to work with others to clarify requests or offer alternatives. So now bad and/or misunderstood requests can be implemented faster.
Or really the same reason people fall for get rich quick schemes.
I don't understand this. A large codebase should be a collection of small codebases, just like a large city is a collection of small cities. There is a map and you zoom into your local area and work within that scope. You don't need to know every detail of NYC to get a cup of coffee.
It's your responsibility to build a sane architecture that is maintainable. AI doesn't prevent you from doing that, and in fact it can help you do so if you hold the tool correctly.
To speak more directly - every codebase has local reasoning and global reasoning. When looking at a single piece of code that's well-isolated, you can fully understand its behavior "locally" without knowing anything about any other part of the code. But when a piece of code is tightly coupled to many other parts of the codebase, you have to reason globally - you have to understand the whole system to even understand what that one piece of code is doing, because it has tendrils touching the whole system. That's typically what we call spaghetti code.
If you leave an AI to its own devices, it will happily "punch holes" and create shortcuts through your architecture to implement a specific feature, not caring about what that does to the comprehensibility of the system.
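To make the contrast concrete, here's a minimal Python sketch; every name in it (apply_discount, session_cache, and so on) is invented purely for illustration:

    # Local reasoning: everything needed to verify this function is right here.
    def apply_discount(price: float, rate: float) -> float:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0 and 1")
        return round(price * (1.0 - rate), 2)

    # Global reasoning: shared mutable state stands in for "the rest of the system".
    session_cache: dict = {}
    _RATE_TABLE = {"gold": 0.2, "basic": 0.0}

    def apply_discount_spaghetti(order: dict) -> None:
        # To know what this does, you must know who else reads/writes the cache.
        rate = session_cache.get("rate", _RATE_TABLE[order["tier"]])
        order["price"] *= 1 - rate              # mutates a shared object in place
        session_cache["last_order"] = order     # side effect other code relies on

The first function can be verified by reading it; the second can only be verified by auditing everything else that touches session_cache.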
There is software that works like this (e.g. a website's unrelated pages and their logic), but in general, composing simple functions can result in vastly non-proportional complexity. (The usual example is a simple loop plus a simple conditional, with which you can easily encode Goldbach or Collatz.)
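For the curious, the Collatz version of that construction really is just one loop and one conditional, and whether it terminates for every positive n is an open problem:

    def collatz_steps(n: int) -> int:
        # One loop, one conditional; nobody has proved the loop always ends.
        steps = 0
        while n != 1:
            n = n // 2 if n % 2 == 0 else 3 * n + 1
            steps += 1
        return steps

    print(collatz_steps(27))  # 111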
E.g. you write a runtime with a garbage collector and a JIT compiler. What is your map? You can't really zoom in on the district for the GC, because on every other street there you have a portal opening to another street in the JIT district, which in turn has portals to the ISS, where you don't even have gravity.
And if you think this might be a contrived example and not everyone is writing JIT-ted runtimes: something like a banking app with special logging requirements (cross-cutting concerns) sits somewhere between these two extremes.
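One conventional way to keep such a cross-cutting requirement from smearing across the whole map is to factor it out once. A minimal Python sketch; the audited decorator and the transfer function are invented for illustration, not anyone's real banking code:

    import functools
    import logging

    log = logging.getLogger("audit")

    def audited(fn):
        # Wraps any operation with the audit-logging requirement, so the
        # concern isn't re-implemented inside every business function.
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            log.info("start %s args=%r", fn.__name__, args)
            try:
                result = fn(*args, **kwargs)
                log.info("ok %s", fn.__name__)
                return result
            except Exception:
                log.exception("failed %s", fn.__name__)
                raise
        return wrapper

    @audited
    def transfer(src: str, dst: str, amount: int) -> None:
        ...  # core banking logic goes here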
On a tiny project it doesn't matter. But when you have millions of lines of code, you have to be able to trust that your code works in isolation, without knowing all the details.
Oh, great analogy there.
Just like there's almost nothing in common between a large city and a collection of small cities, a large codebase is completely different from a collection of small codebases too.
Mostly because of the same kinds of effects.
Rather than arguing about the specifics, it's easier to point to numerous concrete examples: a fairly simple system that should be easy to implement in 8-15k lines of code, depending on certain choices (I've been writing code long enough to estimate this relatively accurately), still incomplete while approaching 150k lines. These kinds of atrocities are usually economically infeasible in hand-written code, for two reasons: 1) the cost of producing that much code is very high, and 2) the cost of maintaining that much code is insurmountable.
I guess you could say that AI is great at generating code that only AI can understand and maintain.
E.g., treating AI-generated code as immediately legacy, with tight encapsulation boundaries, well-defined interfaces, etc., and integrating it into a more manual workflow.
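As a rough sketch of that kind of boundary (the RateLimiter protocol and TokenBucketLimiter are hypothetical names, not a real library's API): the rest of the codebase depends only on a hand-written contract, and the generated implementation can be regenerated or discarded behind it.

    from typing import Protocol

    class RateLimiter(Protocol):
        # Hand-written contract: the only thing the rest of the codebase sees.
        def allow(self, key: str) -> bool: ...

    class TokenBucketLimiter:
        # Stand-in for a generated implementation living behind the boundary.
        def __init__(self, capacity: int = 10):
            self._capacity = capacity
            self._tokens: dict[str, int] = {}

        def allow(self, key: str) -> bool:
            left = self._tokens.get(key, self._capacity)
            if left <= 0:
                return False
            self._tokens[key] = left - 1
            return True

    def handle_request(limiter: RateLimiter, user: str) -> str:
        # Callers depend on the protocol, never on the generated internals.
        return "ok" if limiter.allow(user) else "throttled"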
There's a range from single-shot prompts to inline code generation, and which end makes more sense depends on the problem and where in the codebase it sits.
Single-shot stuff is going to make more sense for a prototyping phase with extensive spec iteration. Once that prototype is in place, you then probably want to drop down into per-module/per-file generation and be more systematic, always maintaining a reasonably good mental model at this layer.
I could see value in using it during the prototyping phase, but wouldn’t like to work like you described for a serious project for end users.
I care more about code quality now, because typing is no longer what determines whether refactoring something feels worth it or not.
This seems to me like it requires an impossible level of discipline, judgement, and foresight.
This is good advice regardless of whether you're using AI or not, yet in real life "let's have well-defined boundaries and interfaces" always loses against "let's keep having meetings for years and then duct-tape whatever works once the situation gets urgent".
You can have very good diffs and then find that the whole codebase is a collection of slightly disjointed parts.
AI doesn't necessarily have to increase your throughput; it can also serve as a flexible exploration and refactoring tool that supports either later hand-crafted code or an agentic implementation.
I still have a lot of uses for AI: exploration, double-checking me, teaching me. But letting it write code became very tough for me to accept. Next-edit autocompletes, mainly.
I'm ready to give up on having it even review my code at this point. It's been so frustrating. It hallucinates bugs, especially in places where "best practices" are at odds with reality.
Recently it informed me of a bug, suggesting the line of code in question couldn't possibly do anything because on Linux the specific stdlib behaved in X ways, when it was obvious from the line of code that it was running on Windows, which doesn't have this problem at all. Of course, it didn't actually mention that this is only an issue on Linux, just that there was a bug here. It vomited up a paragraph of $WORDS explaining why this was a high-priority bug that absolutely needed to be fixed because it was failing in subtle ways. Yet the line of code in question has been running in production, producing exactly the results it is expected to, for ~3 years.
And this is just one simple example of the many dozens of times it has failed at this task this year. In that same review run, the agent suggested 3 additional "bugs" or other issues that should be addressed, all of which were flatly wrong or subjective. I'm at a point of absolute exhaustion with this sort of shit. It's worse than a junior half of the time because of how strongly opinionated it is. And the solution to this sort of problem is an endless amount of configuration and customization that will be forgotten by all of us over time, leading to who knows what sort of knock-on effects (especially as we migrate from one model to the next). We have a guy on our team with ~17,000 words in his agent and instructions files, yet he sees nothing wrong with this. I guess he just really loves YAML and Markdown.
There comes a realization, to many an engineer's horror, that AI won't be able to save them: they will have to manually comprehend, and possibly write by hand, a ton of code to fix major issues, all while upper management breathes down their necks, furious as to why the product has become a piece of shit and customers are leaving for competitors.
The engineers who sink further into denial thrash around with AI, hoping they are a few prompts or orchestrations away from everything being fixed again.
But the solution doesn’t come. They realize there is nothing they can do. It’s over.