
0 pointsAurornis8d ago0 comments

Even using Fable (while it was briefly available), having it refine a plan, and directing it to make only small incremental changes, I still found reasons to reject its first pass at a lot of work. There were a lot of “You’re right to push back” responses. A lot of incidents where it would create some giant complex set of abstractions to accomplish something that I could find ways to do much more elegantly and in a more maintainable manner.

It’s really eye opening to work with these tools on a codebase you know deeply because these problems are everywhere.

However if I opened an unfamiliar project in another language and I wanted to add a little feature with no intention of maintaining it, I’d happily accept the changes and loop until it worked well enough for my temporary needs.

The scary middle is when you’re dealing with coworkers who don’t care about anything other than closing tickets and collecting credit. With enough of a token budget you can now wrap loops around an LLM and have it try things until the program appears to work. Ask it to do a code review and then submit the PR without having understood what it was doing. There are a lot of workplaces where there isn’t a good mechanism to push back on this and the tech debt just keeps growing.


42 comments · 13 top-level

embedding-shape8d ago· 9 in thread

> There are a lot of workplaces where there isn’t a good mechanism to push back on this and the tech debt just keeps growing.

If the "big ball of spaghetti" theory holds, where software companies who can't manage the debt stumble over themselves as they continue to add to the big ball of spaghetti code, I guess we'll see a row of companies declaring "software bankruptcy" or something in some/many months, depending on how well these workspaces learn to care slightly more and get better at pushing back against slop.

onion2k8d ago

> I guess we'll see a row of companies declaring "software bankruptcy" or something in some/many months

I don't think you will, because that would require the business to recognise the problem. That might happen in companies where the leadership team are engineers but it will never happen if they're not.

Instead you'll see:

- Churn in the dev team with senior developers leaving rather than try to deal with the mess

- Large scale projects to refactor or rewrite entire codebases, which will inevitably fail because you can't rewrite a big ball of spaghetti because you can't tell what it actually does (especially if it's in a language that allows side effects, or you've used a strategy like 'exceptions as flow of control').

- Companies just getting slower and slower to deliver anything. That's probably fine in many cases where they're big enough to still carry on without growing much, but anyone in the company will see their career die and pay rises dry up.

- Eventually, maybe, you'll see 'tech debt fixing' service companies start up to leverage AI in the effort to fix these problems. (AWS have a thing called 'Amazon Modernization Lab' that is exactly that, but only for companies running old tech on their services.)

danaris8d ago

The problem is that this is just another instance of trusting that "the market will solve all our problems."

But that's based on "spherical economy in a frictionless vacuum" type assumptions.

In the real world, in addition to the problems others have noted of it being hard to identify and fix the specific sources of problems, we have so much consolidation that it doesn't matter if something from any of the tech giants starts getting buggier and slower. What are you* going to do—switch from Windows to Linux, just because it's getting a bit buggy? Or worse, switch away from Banner, or Salesforce?

We cannot depend on "market forces" to prove whether LLM-assisted coding is actually a good idea. We have to push for universal personal accountability for the code we commit (at least internally; I'm not calling for legal liability here!). Which is, unquestionably, going to be a huge uphill slog.

* where "you" in this case is an average PC user, or a large institution

aryehof8d ago

What concerns me the most is that improvements in software design are at an end. The “big ball of mud”, which really is a problem of modularity and dependencies, will never improve through innovation because the way it is done now is all there will ever be.

codemog8d ago

Coding agents have been better than the average "enterprise" programmer for a while now and nobody wants to admit it or talk about it. I have never seen an agent output an implementation called FooImpl that's tens of thousands of LOC in a single file, but I have seen plenty of human code like this.

People call coding agents bad because they don't know the asinine meaningless conventions at their particular company while they themselves write awful abstractions and brittle tightly coupled systems, but hey, at least they know how to write a for loop how their particular company likes.

fzeroracer8d ago

> I have never seen an agent output an implementation called FooImpl that's tens of thousands of LOC in a single file, but I have seen plenty of human code like this.

And how long does it take a coding agent to output a thousand lines of code versus a human? The worst human at any company was rate limited by themselves. Those 'average enterprise' programmers aren't going away, they're the ones now spending tens of thousands on coding agents and filling your codebase with even more garbage without bothering to review an iota of it.


jeppester8d ago

Yesterday Claude wanted to add a position column to what is a slightly extended many-many relation table. It did this to "make ordering stable".

An average enterprise developer would never add bloat like that up-front, unless if the ability to change the order was a requirement.

Obviously a stable order can be easily derived from the ID or a creation time (if available).

Setting a position however requires extra steps to ensure the integrity of the sequence.

I see things like that all the time, and it's always stuff that grows the code base and adds unnecessary complexity.
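As a rough illustration of the point about deriving order instead of storing it, here is a minimal Python sketch. The `Membership` row type and its field names are hypothetical, not taken from the schema being discussed; the only assumption is that each row has a unique `id` and a creation timestamp.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Membership:
    """Hypothetical row of a many-to-many join table with a few extra fields."""
    id: int
    group_id: int
    item_id: int
    created_at: datetime


def stable_order(rows):
    # Sort by creation time, with the unique id as a tie-breaker.
    # Because id is unique, the result does not depend on the input order.
    return sorted(rows, key=lambda r: (r.created_at, r.id))


rows = [
    Membership(3, 1, 30, datetime(2024, 1, 2)),
    Membership(1, 1, 10, datetime(2024, 1, 1)),
    Membership(2, 1, 20, datetime(2024, 1, 1)),
]
ordered_ids = [r.id for r in stable_order(rows)]
print(ordered_ids)  # [1, 2, 3]
```

Nothing has to be written or renumbered to keep this order consistent, which is the extra work an explicit position column would bring. A stored position only earns its keep once users need to reorder rows by hand.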

kuschku8d ago

> I have never seen an agent output an implementation called FooImpl that's tens of thousands of LOC in a single file, but I have seen plenty of human code like this.

I've seen countless vibecoded implementations that look exactly like that. Especially painful is agents adding the same utility functions in each and every file instead of properly reusing or splitting things.

And then I have to fix them.

what8d ago

> that's tens of thousands of LOC in a single file

Why is this worse than splitting it across 1k files?


andrekandre7d ago

i've never seen even a junior do something as crazy as displaying a page sheet ui from literally a color object, yes, a literal color...

abhgh8d ago· 8 in thread

These "You're right to push back" scenarios are scary for me. I mostly code ML implementations, and some of the errors Claude Code (CC - have only used Opus 4.7) makes are very sneaky, and if you don't have sufficient experience in the area (I see this with people entering ML and writing their implementations with CC), you wouldn't know when to question CC and will let errors or future pitfalls silently slip into your code. A recent example was when there was data leakage in a model calibration step, which it refused to see as an error, till I wrote a detailed reason, and then it agreed that there was a "subtle leakage".

nostrebored8d ago

The leakage problem is so pervasive. None of the frontier models seem to have any idea how to actually hold out rows. God help you if you decide to change the data mix.

I was working on creating a next-n-actions predictor for one of our use cases and not paying much attention for a PoC. I was fairly happy with the progress for a few days, before actually reading the eval code and seeing that we leaked the final state in every eval.
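One common way to keep held-out rows held out, even when the data mix changes, is to assign each row to a split from a hash of a stable key rather than from its position in a shuffled list. The sketch below is an assumption-laden illustration, not what any of the commenters ran: the `key` field, the split names, and the percentages are all made up for the example.

```python
import hashlib


def assign_split(row_key: str, calib_pct: int = 10, test_pct: int = 10) -> str:
    """Map a stable row key to 'train', 'calib', or 'test' via a hash bucket."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + calib_pct:
        return "calib"
    return "train"


def split_rows(rows):
    """Partition rows (dicts with a hypothetical 'key' field) into disjoint splits."""
    splits = {"train": [], "calib": [], "test": []}
    for row in rows:
        splits[assign_split(row["key"])].append(row)
    return splits


rows = [{"key": f"session-{i}", "x": i} for i in range(1000)]
splits = split_rows(rows)

# The three splits never share a key, so calibration cannot see test rows.
keys = {name: {r["key"] for r in part} for name, part in splits.items()}
assert keys["train"].isdisjoint(keys["calib"])
assert keys["train"].isdisjoint(keys["test"])
assert keys["calib"].isdisjoint(keys["test"])
```

Because the assignment depends only on the key, adding or removing other rows never moves an existing row into a different split. This guards against overlap between splits only; it does nothing about features that encode the answer, such as a final state leaking into the eval inputs.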

It's nice to let claude run loose on porting from framework to framework (port my code from TRL to NemoRL to Tinker to VeRL) but looking at what it does in the intermediate steps makes me want to claw my eyes out. And getting it to adhere to our domain model (e.g. we have an SFTConfig and a .to_trl(), or a Row and a .to_harmony()) is impossible.

glimshe8d ago

Another version of this issue is when you push back but you were NOT "right to push back". In other words, the LLM original solution was better than the pushback.

Most of the time my pushbacks are true improvements, but I've seen a couple of instances where the LLM was happy to downgrade their own good solution.

cassianoleal8d ago

I've had those as well. Sometimes I'm asking clarifying questions because I'm not sure about the solution, and the LLM "interprets" that as pushback (as opposed to curiosity / enquiry), and sycophancy takes over. Sometimes it will simply change the code without ever answering the questions, or it will answer the questions along with it, but incorrectly - or with bad assumptions.


embedding-shape8d ago

> Another version of this issue is when you push back but you were NOT "right to push back". In other words, the LLM original solution was better than the pushback.

Indeed, it's easy to surface this by sending one model's proposal to another as a "review", then bouncing them back and forth and asking which one is best. Both models will almost always say something like "The other proposal/review is better", I'm guessing because somehow they think it comes from the human, and "human is always right" or something.

ffsm88d ago

What's mind-blowing to me is that people see the "you're right to push back" as anything besides hallucination / self affirmation

Dude, the fucking model is great for sure, but there is nothing behind the illusion. It doesn't know if something is right or wrong - simpler or harder to reason about etc

It's just generating text, in a coherent manner while following rhetoric processes as a solid attempt at logical thinking

Why is that so hard for people to grok?

Our industry (and society after) is beyond doomed with people seeing these self affirmations as anything like "insightful" validation.

MarsIronPI8d ago

Looking on the bright side, where there's AI-generated muck, there will be brass for humans willing to clean it up.

Obscurity43408d ago

How does it correct itself then? I often will push back without giving it the way out and it often does find it


endofreach8d ago

We'll see.

resonious8d ago· 5 in thread

All Claude models are huge suck ups. The "you're absolutely right" meme is real even if that exact phrase doesn't show up as much anymore.

I don't want to start a fight or anything but IME Codex has a bit more of a spine. If you point out something weird, it sometimes gives a good reason for it. Whereas Claude will always say "whoopsie you're right as always sir" even when it's me who missed something.

herdymerzbow8d ago

I only use free AI chats to help me with my learning, but often I direct it to keep its responses neutral and to refrain from providing any encouraging language or value judgements. It tends to get rid of these 'you're absolutely right' comments when I point out a mistake.

But your comment just made me think whether this tendency for LLMs to resort to flattery when found out is a built in strategy to distract the user from the error prone fragility of much of the output? It's perhaps a stretch to think these canned responses were put in strategically, but the result is that the user's attention may be deflected to contemplating their own superior knowledge and insight, and bask in the glory of all that, but then forgot to appreciate that 'Hey, chatLLM is just making all this stuff up/doesn't know which way is up/or down!'

pyridines8d ago

IME it's Claude that pushes back, and Codex that just does the thing. It's happened once or twice where I've told Claude bluntly and directly "do this" and it responded "no, here's why that's a bad idea..." Maybe it's just my CLAUDE.md.

Not sure if there are sycophancy benchmarks for coding agents

mcintyre19948d ago

I find the same. Someone posted this benchmark here: https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

It measures whether models push back on bullshit prompts or just go along with it, and Claude models are all the top performers.

teaearlgraycold8d ago

Right now the thing I get from Opus 4.8 is a ton of “That’s a good instinct”. Also >50% of its closing statements begin with “Clean.”

QuiEgo7d ago

I’ve had this experience as well. I love Codex for doing code reviews, it takes a way more direct, less passive tone when calling out issues.

darkerside8d ago· 3 in thread

In fairness, you could throw the most senior engineer into a brand new codebase, and they would probably make a dozen mistakes if you immediately had them pick up invasive and risky work.

kerkeslager8d ago

No, that's not "in fairness", that's misunderstanding the entire problem.

Having worked 20 years in this field and managed a few projects, no, I wouldn't make a dozen mistakes, because I would refuse to take on work I can't responsibly do.

Invasive and risky work IS the thing I want to be working on because it's the place where I can be most valuable, but part of my value comes from asking the right people the right questions. If I'm working on something invasive and risky, I'm going to work directly with the people who wrote it, and only when THEY think I understand it well enough am I venturing in alone.

Absent access to the people who wrote the code, I'm going to start by writing tests around the code and spend a lot of time checking my initial assumptions upon reading the code, because I know that I don't know what I don't know.
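Tests written around code you do not yet understand are often called characterization tests: they record what the code does today so that later changes can be checked against it. A minimal Python sketch follows. `legacy_fee` is a made-up stand-in for inherited code, and the expected values are simply what that stand-in returns, not a statement about what correct behaviour should be.

```python
import unittest


def legacy_fee(amount_cents: int) -> int:
    """Hypothetical stand-in for inherited code whose intent is unclear."""
    if amount_cents <= 0:
        return 0
    return max(30, amount_cents * 29 // 1000)


class LegacyFeeCharacterization(unittest.TestCase):
    """Pin down current behaviour before changing anything."""

    def test_observed_behaviour(self):
        # Values recorded by running the existing code, including edge cases.
        observed = {-100: 0, 0: 0, 100: 30, 10_000: 290}
        for amount, expected in observed.items():
            with self.subTest(amount=amount):
                self.assertEqual(legacy_fee(amount), expected)


# Run the suite programmatically so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LegacyFeeCharacterization)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If a later change makes one of these fail, that is a prompt to go and ask whether the old behaviour was relied upon, not proof that either version is right.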

Yeah, if I foolishly just started making changes, I'd make mistakes, but that's missing the point: a good senior engineer knows not to do that.

That's the failure point of AI: it's arrogant. It will provide you statements without any idea if they're true and make changes without any idea if they're correct. It will never tell you "I don't know how to do that" or even "I am not sure if this is correct". It just does the work with infinite confidence even when that confidence is not justified and often it will be just as hard to figure out if the AI's work is correct as it would be to do the work yourself.

danaris8d ago

> I'm going to work directly with the people who wrote it, and only when THEY think I understand it well enough am I venturing in alone.

...ah, what a boon it would be to be working with code written by people still working at the organization!

(No shade, just being wistful; I happen to have a history of coming in and having to deal with some messy codebases from the guy who just retired...)

alex_suzuki8d ago

> That's the failure point of AI: it's arrogant.

I agree with your take, but AI is exactly as arrogant as the human driving it.

figassis8d ago· 2 in thread

Have been dealing with this in an area that requires insane attention: payments. It's a strange feeling when you architect a system, all the invariants, all the fundamentals, all the guardrails, then implement the scaffolding in self-documenting code, so the LLM has no way to build other than correctly, but you then see what it tries to do and it's WTF.

It all seems to behave correctly and then you run your test suite, and your e2e tests start failing in weird ways, a few but not many accounting discrepancies, and everything else passes. You spend a lot of time asking it to explain what's happening, you give it the data to browse, and it keeps giving you very plausible explanations of "found the issue, the data shows this clearly, therefore the bug is here, all I need to do is fix this thing", and it does this, and it still fails.

When you open the hood, man, the code salad, the 100s of unnecessary, and complex and duplicate abstractions, the stacked mistakes and lazy corrective attempts, the comment pollution that overrides your instructions across sessions.

You realize that there are things and concepts that it just cannot wrap its "mind" around and you need to grab the wheel for a bit, make the corrections, remove all the comment litter, commit and then hand the wheel back and tell it to "look at the last commit to see what I mean. explain to me what you did wrong and update all documentation, memory and context with this new understanding".

So if you have no experience in the field, you won't even know how to test, how to find that there is an issue, the appearance of "working" and the AI's confidence will trip you in prod so hard.
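To make "knowing how to test" concrete, here is one kind of invariant check that tends to catch small accounting discrepancies. It is a sketch under stated assumptions, not the system described above: it assumes a double-entry style ledger where every transaction's entries must sum to zero, amounts are integer cents, and the `Entry` type and account names are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    txn_id: str
    account: str
    amount_cents: int  # sign convention chosen here: positive = debit, negative = credit


def unbalanced_transactions(entries):
    """Return {txn_id: net_cents} for transactions whose entries do not sum to zero."""
    totals = defaultdict(int)
    for e in entries:
        totals[e.txn_id] += e.amount_cents
    return {txn: net for txn, net in totals.items() if net != 0}


ledger = [
    Entry("t1", "customer", -1000),
    Entry("t1", "merchant", 971),
    Entry("t1", "fees", 29),
    Entry("t2", "customer", -500),
    Entry("t2", "merchant", 485),
    Entry("t2", "fees", 14),  # one cent short, the kind of drift that passes most tests
]
print(unbalanced_transactions(ledger))  # {'t2': -1}
```

A check like this runs over real data and does not care how plausible the explanation for a bug sounds, which is why it is useful as a gate after every generated change.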

itopaloglu838d ago

In my experience Claude tends to immensely over complicate things and go for a complex abstraction scheme even when all it needs to do is two lines of code. Combined with its eagerness to just code and more importantly pay more attention to the last prompt causes it to do an insanely complex solution first and then patch things with half assed attempts. The whole ordeal results in a code that on an initial glance looks okay, but quickly breaks down and becomes unmanageable. A significant effort is needed to push back Claude’s tendencies, so I mainly find myself pushing back or looking for ways to write an initial prompt with enough guidance, but only Fable was following them properly, Opus simply acts like a rhino in a china shop.

figassis8d ago

btw sorry for the typos, just re-read this and looks like a dyslexic person wrote it

fy208d ago· 1 in thread

A nice trick I've found is following up with "make it simpler". Often you can do 2-3 rounds of that and end up with something much easier to comprehend but still meeting the requirements.

I have a Rails background, so maybe KISS is more engrained in my philosophy than whatever training material was used on AI. At least it isn't heavily pushing design patterns...

dzonga8d ago

Yeah noticed the same thing too - Ruby/Rails background, though I have done distributed systems in java (too many unnecessary abstractions in that ecosystem)

then you add the simplicity / lessons of clojure of using simple datastructures & functions - simply agents become frustrating - cz most of the things I need to get done are done in a few lines

majority of the time is spent thinking by me to save a few lines.

busterarm8d ago· 1 in thread

> With enough of a token budget you can now wrap loops around an LLM and have it try things until the program appears to work. Ask it to do a code review and then submit the PR without having understood what it was doing. There are a lot of workplaces where there isn’t a good mechanism to push back on this and the tech debt just keeps growing.

I'm not making an argument in favor of people using LLMs for this, but people were doing this before we had LLMs it was just usually a bit slower. I can't even say it usually doesn't work out long term because I worked with a lot of guys who did this and took a ton of Adderall while working practically around the clock. Every incentive structure in the organizations rewarded it along with social credibility from more junior engineers. (The last cowboy I worked with who pulled this shit ended up becoming the most senior engineer in the company, a multi-millionaire and worshipped like a god by 90% of the mostly fresh grads we were hiring).

The problem is when invariably these people burn out eventually and leave, they leave a massive vacuum in their stead. Not from load they were carrying but creating.

I think the larger the organization I've been at, the more they reward the people making huge commits on nights and weekends. Worse, they could get away with TBRing their shit and merging it without review.

LLMs are often all of the bad habits and organizational problems that we already carried, just being speedrun. There are some places doing it right, but they already were.

timacles8d ago

> There are some places doing it right, but they already were.

Could you be more specific what "right" is?

> I can't even say it usually doesn't work out long term because I worked with a lot of guys who did this and took a ton of Adderall while working practically around the clock. Every incentive structure in the organizations rewarded it along with social credibility from more junior engineers. (The last cowboy I worked with who pulled this shit ended up becoming the most senior engineer in the company, a multi-millionaire and worshipped like a god by 90% of the mostly fresh grads we were hiring).

I'm having a tough time believing this, it sounds like you're trying to backwards rationalize more productive engineers were "on drugs" and they delivered but "did it wrong"

matltc8d ago

> the scary middle

Not coworkers, but I started getting contributions on public GitHub repos that attempted to close issues tagged with the default "good first issue" label. Got real excited when one project I'm stoked for got its first contribution, until I looked at the PR. The account it was tied to was someone looking for work. Looked like what a model would output for a LinkedIn job seeker NPC. I'm sure you can imagine.

4fffs7d ago

"It’s really eye opening to work with these tools on a codebase you know deeply because these problems are everywhere."

Same could be said w.r.t interacting with LLMs on stuff you are an expert on.

The thing is laborious, over-does it, slow and wasteful.

QuiEgo7d ago

To be fair, before AI I had my fair share of coworkers throwing stuff over the fence who only cared about closing tickets and collecting credit.

You all know the feeling: you see a code review from _that person_ and you know it’s gonna be a long day. And you know they are going to fight you every step of the way and say “but it works” when you leave a comment about their code being hard to maintain.

dapperdrake8d ago

Maybe that feedback loop finally got fast enough to die out.

latexr8d ago

In your second and third paragraphs you’re essentially describing Gell-Mann amnesia.

https://en.wikipedia.org/wiki/Michael_Crichton#%22Gell-Mann_...

justinclift8d ago

> "You’re right to push back"

It sounds like you've not conditioned your Claude to stop being a sycophant yet?
