undefined | Better HN

0 pointspron7h ago0 comments

Yep. The only people I've heard saying that generated code is fine are those who don't read it.

The problem is that the mitigations offered in the article also don't work for long. When designing a system or a component we have ideas that form invariants. Sometimes the invariant is big, like a certain grand architecture, and sometimes it’s small, like the selection of a data structure. You can tell the agent what the constraints are with something like "Views do NOT access other views' state" as the post does.

Except, eventually, you'll want to add a feature that clashes with that invariant. At that point there are usually three choices:

- Don’t add the feature. The invariant is a useful simplifying principle and it’s more important than the feature; it will pay dividends in other ways.

- Add the feature inelegantly or inefficiently on top of the invariant. Hey, not every feature has to be elegant or efficient.

- Go back and change the invariant. You’ve just learnt something new that you hadn’t considered and puts things in a new light, and it turns out there’s a better approach.

Often, only one of these is right. Often, at least one of these is very, very wrong, and with bad consequences.

Picking among them isn’t a matter of context. It’s a matter of judgment, and the models - not the harnesses - get this judgment wrong far too often. I would say no better than random chance.

Even if you have an architecture in mind, and even if the agent follows it, sooner or later it will need to be reconsidered. What I've seen is that if you define the architectural constraints, the agent writes complex, unmaintainable code that contorts itself to it when it needs to change. If you don't read what the agent does very carefully - more carefully than human-written code because the agent doesn't complain about contortious code - you will end up with the same "code that devours itself", only you won't know it until it's too late.

0 comments

perarneng5h ago

If you know how to write good code you can force AI to write good code with various techniques. It's 100% doable. You just need to figure out the problems AI has and find solutions to make it easier for it. Ex: extremely small contexts Modularize to modules with clear boundaries and only allow the AI to work within those boundaries. Make modules pure from IO so they are easily testable. Hide modules behind interfaces etc .. You can write 100 tests that executes within a second. You can write benchmarks etc .. AI needs boundaries and small contexts to work well. If you fail to give it that it will perform poorly. You are in charge.

pronOP5h ago

That doesn't quite work, and precisely for the reason I mentioned: You can definitely tell the AI to follow some strategy, but at some point the strategy will need to change, and the AI won't tell you that (even if you tell it to). Unless you read the code every time you won't know if the AI is following the strategy and producing good results or following it and producing bad results because the strategy has to change. This can happen even in small changes: the AI will follow the strategy even if the change proves it's wrong, and if you don't pay close attention, these mistakes pile up.

So yes, you might get good results in one round, but not over time. What does work is to carefully review the AI's output, although the review needs to be more careful than review of human-written code because the agents are very good at hiding the time bombs they leave behind.

lukan4h ago

How do you define "bad code"?

If I instruct the AI to make small modules where I can verify they work, have tests and no side effects - then it is good enough code for me. It works, is readable and can be extended - and will turn into bad code if this is not done with care.

3 more replies

IdiotSavage5h ago

So, basically you need to micro-manage it. Where are your 10x gains now? And is it fun to work like that?

sirwhinesalot4h ago

This is actually what I do. I'm extremely picky about the code and force the LLM to rewrite it 1000x times until it is basically exactly what I want. You might be wondering what is the point when it would be faster for me to just write the code myself?

I have ADHD and for whatever reason telling the LLM what to do instead of doing it myself bypasses the task avoidance patterns and/or focus problems I tend to suffer from. I do not find it fun, but I am thankful for it.

black_knight4h ago

I have used LLMs a couple of times to get started on something. I don’t have ADHD, so this is not a regular occurrence for me. But when I have tried this, I have always found the LLM solution so horrible that it instantly inspired me to do it myself. So, in that sense it worked, I got unstuck, but no LLM garbage makes it into the project.

2 more replies

stepbeek1h ago

This framing of it being a tool that you find indispensable as an individual is important. I’m not interested in debating static vs dynamic types, or vim vs emacs, etc. If it works for you, then that’s great!

But the difference with LLMs currently - I guess? - is that non-engineers are pushing the idea that it’s universally indispensable at scale. I think it leads to a lot of emotion bleeding into the debate.

yakattak3h ago

This is kind of how I feel I think. Putting pen to paper for me is hard.

readitalready1h ago

I don't micromanage it. I let my projects custom linter micromanage it.

Every project should have a custom linter for their tech stack. It would check for not just syntax errors, but architectural choices as well as taste guidelines.

Whenever the LLM writes bad code, I add it to my linter to check against in the future.

andriy_koval45m ago

> So, basically you need to micro-manage it. Where are your 10x gains now? And is it fun to work like that?

it depends on language and infra, but some/many require lots of boilerplate and memorizing thousands of APIs, automating this is easy LLM 10x gain.

I for example write SQL myself, because boilerplate is super-minimal, and core SQL is very minimal itself, there are like 20 constructs to memorize.

hansmayer5h ago

Amen. Instead of freeing you up - AI enslaves you - and if it was even enslaving to a superior being at least!

nijave1h ago

Honestly, I think so. I do a mix of infrastructure and programming so don't tend to have any frameworks memorized. Using AI is much quicker than constantly referencing the docs.

I can also switch between codebase with different frameworks and languages and make changes without spending all day reading docs.

It's also pretty good at tracing code and that's fairly straight forward to verify the results manually. It can build a flow diagram in 10-30 minutes (depending on what tool calls need allowed and how many prompts it needs) versus me taking a couple hours to do the same.

forgotaccount35h ago

> you need to micro-manage it.

It is significantly easier to micro-manage an AI than a suite of junior developers. The AI doesn't replace a principal engineer, it's replacing junior and weaker senior developers who need stories broken down extremely concisely to be able to get anything done. The time it takes to break down a story such that a junior through weak senior developers can pick it up and execute it well would have the AI already done with testing built around it.

gylterud4h ago

Juniors learn. Some juniors are potential good seniors. Over time they will internalise good architecture and be able to make good judgments on their own.

Micromanaging LLMs is like having Dory from Finding Nemo as your colleague. You find ways to communicate, but there is no learning going on.

2 more replies

hansmayer3h ago

I think if you tried working with some junior folks, you'd be quite surprised. You know, with at least some of them choosing to use their brains and all.

wombat-man3h ago

Yeah I agree. It's improved quite a bit just in the past few months. The code should always be reviewed, and you need to spend some time tuning your skills and agent configs. If you're still getting bad code out of your LLM tooling, you might not be using or configuring it correctly.

hansmayer4h ago

> You are in charge.

No, if you have to do all of the stuff you have listed to kind-of-make-it-work...You are not in charge.

insane_dreamer1h ago

> You are in charge.

Sure. That's how I work with AI, and the way I believe that AI is meant to be use -- as a companion tool.

But it's a lot of work. It saves me time for certain tasks, but not others. I haven't measured my productivity gains, but they're at most 2x.

But that's not "vibe coding" (which was the point of the article) or the (false) promise of "10x productivity" and "code that writes itself" that companies are being told is going to reduce their engineering headcount tenfold.

Zach_the_Lizard6h ago

I agree with this. I've been writing a new internal framework at work and migrating consumers of the old framework to the new one.

I had strong principles at the outset of the project and migrated a few consumers by hand, which gave me confidence that it would work. The overall migration is large and expensive enough that it has been deferred for nearly a decade. Bringing down the cost of that migration made me turn to AI to accelerate it.

I found that it was OK at the more mechanical and straightforward cases, which are 80% of the use cases, to be fair. The remaining 20% need changes to the framework. Most of them need very small changes, such as an extra field in an API, but one or two require a partial conceptual redesign.

To over simplify the problem, the backend for one system can generate certain data in 99% of cases. In a few critical cases, it logically cannot, and that data must be reported to it. Some important optimizations were made with the assumption that this would be impossible.

The AI tooling didn't (yet) detect this scenario and happily added migration logic assuming it would work properly.

Now, because of how this is being rolled out, this wasn't a production bug or anything (yet). However, asking the right questions to partner teams revealed it and unearthed that some others were going to need it as well.

Ultimately, it isn't a big problem to solve in a way that will mostly satisfy everyone, but it would have been a big problem without a human deeper in the weeds.

Over time, this may change. Validation tooling I built may make a future migration of this kind easier to vibe code even if AI functionality doesn't continue to improve. Smarter models with more context will eventually learn these problems in more and more cases.

The code it generates still oscilates between beautiful and broken (or both!) so for now my artistic sensibilities make me keep a close eye on it. I think of the depressed robot from the Hitchhiker's Guide to the Galaxy as the intelligence behind it. Maybe one day it'll be trustworthy

benguild7h ago

“The only people I've heard saying that generated code is fine are those who don't read it.” Are you sure these people aren’t busy working rather than chatting? (haha)

But in all seriousness it depends on what you’re doing with it. Writing a quick tool using an LLM is much easier than context changing to write it yourself. If you need the tool, that’s very valuable.

sevenzero7h ago

Also as a webdev, it writes basic CRUD pretty good. I am tired of having to build forms myself and the LLMs are usually really good at that.

Been building a new app with lots of policies and whatnot and instructing a LLM is just much faster than doing the same repetitive shit over and over myself.

spockz6h ago

If you were tired of writing forms yourself, had you looked at https://jsonforms.io/? Just specify the the data you need, or extract it from the api spec and go. Display the form uniformly every time across your site. No need to burn AI time.

sevenzero5h ago

I typically avoid any most abstractions or third party dependencies. Yea it could be neat, but I still need a lot of custom logic here and there. Same reason I avoid stuff like GraphQL.

A little update: upon viewing the page on phone, for me the "comitter" field in the demo is going out of bounds... Really not speaking for their product.

2 more replies

drbojingle5h ago

This might pair well with something like https://data-atlas.net.

pronOP6h ago

Sure. I'm talking about production software that needs to survive and evolve for a long while.

pydry5h ago

This the core unspoken bone of contention in most AI arguments I think: most people either arent writing code with strict quality requirements or dont realize where their use of AI is violating them.

That said most of the world's most useful code has strict quality requirements. Even before AI 90% of SLOC would be tossed away without much if any use, 9% was used infrequently while 1% runs half the world's software.

mountainriver5h ago

Can you not review it?

RugnirViking4h ago

I think this misses the scale of the problem. Review never fixed tech debt, nor did it fix relevant/bloated test suites. It didn't solve complexity, or eliminate footguns. Very few people (I would argue almost noone) had developed theories for what all of these even were, or how to spot them in code.

Reviewers aren't perfect, far from it. And we just gave them ~20x more code to review. Incentives mean that taking 20x longer to review is unacceptable. So where do we go from here?

throwaway1737384h ago

Review always misses something.

tcgv49m ago

> "Yep. The only people I've heard saying that generated code is fine are those who don't read it."

I review every line of code I generate with AI. I mainly use an MR-based approach:

1) Provide a tightly scoped technical spec to Codex as a task, and ask for 3x solutions. Usually at least one of them is on the right track, and it is better to ditch a solution that went in the wrong direction than to try to fix it.

2) Review the explanation and diff of the proposed changes line by line, file by file. If I find minor deviations from what I asked, or violations of the codebase architecture/conventions, I write comments in the diff and/or global comments, and ask again for 3x adjusted solutions.

3) Usually, by this point, the solution is ready for me to merge locally and either run local tests or do some manual fine-tuning.

4) Finally, I generate unit tests. I leave them to this stage because I can repeat the same process with the sole intent of generating case-specific unit tests. This way, I can generate/review tests against the final version of the implementation.

This has been working very well for me since our repos are reasonably organized and have a well-defined architecture. In the technical spec, I include the major architectural requirements and code conventions, and I also add a catch-all like "follow the codebase's existing conventions and style", which works reasonably well.

This simple process has enabled me to deliver most minor/medium tasks and bug fixes really quickly while maintaining control over the changes and without lowering the quality bar. For larger and more challenging tasks, I find myself "driving the wheel" (i.e. coding by hand) more often, and using AI code generation in a much more scoped and specific way. So that becomes a different process altogether.

agentultra4h ago

The invariant, stated informally, would be hard to prove is broken by a human reviewer in the loop. Spoken language isn’t precise enough for the task.

Even if you could state it in a precise formal language the LLM under the agent doesn’t have the capability to understand what the invariant is for and why it’s important. You’ll still get oddly generated code. You might get an LLM that can associate certain tokens with those in the formal language specification which can hold invariants and perhaps even write the proofs… but you’ll still get a whole bunch of other code generated from the informal parts of the prompt.

I agree that simply adding constraints and prompts to you skills and specs isn’t going to prevent these things. Worse, that even if you could invent a better mouse trap the creature will still escape.

The problem is… “elongation:” the addition of code for the sake of the prompt/task/etc. Often less is better. This takes a human with the ability to anticipate what other humans would want/expect. When you need a generator, they’re great but it’s a firehouse that whose use should be restrained a little more.

pronOP4h ago

> The invariant, stated informally, would be hard to prove is broken by a human reviewer in the loop. Spoken language isn’t precise enough for the task.

That depends on the invariant. Some are behavioural, like "variable x must be even if y is positive", but some are architectural, such as "a new view requires a new class".

But that's only one side of the problem because maintaining the invariant can be just as bad as breaking it. You ask the agent to add a feature and it may well maintain the invariant - only it shouldn't have, because the feature uncovers the fact that the invariant is architecturally wrong.

The problem is that evolving software requires exercising judgment about when you need to follow the existing strategy and when you need to rethink it. If there is any mechanical rule that could state what the right judgment is, I don't know what it is.

agentultra4h ago

Yes! I was trying to make this part of my point but you definitely made it much more clear and concise.

With a skilled operator, it could be possible to drive an agent to handle these kinds of changes. I would be concerned that spoken language wouldn't be precise enough to handle the refactoring and changes necessary to make to a code base when an invariant changes... regardless of whether it was a property, architectural, or procedural change. It already can take several prompts and burn quite a few tokens doing large-scale rewrites and code changes. Maybe the parameters and weights can be tuned for this kind of work but I remain skeptical that what we have at present is "efficient" at this kind of work.

21asdffdsa127h ago

And the solution is the same, as when it was outsourced- and the "patch" was fix it by writing spec. Thus i conclude my TED talk with the statement: LLMs are the new outsourcing and run into the same problems.

pronOP6h ago

Not quite, because the architecture often needs to evolve when you learn more as the project evolves. People will complain when they feel the constraints drive them to unnatural workarounds, the agents don't.

You can try telling the agent to stop and ask when a constraint proves problematic, except it doesn't have as good a judgment as humans to know when that's the case. I often find myself saying, "why did you write that insane code instead of raising the alarm about a problem?" and the answer is always, "you're absolutely right; I continued when I should have stopped." Of course, you can only tell when that happens if you carefully review the code.

multjoy33m ago

It has no judgement at all.

senordevnyc3h ago

So I run a solo saas that supports my family, and so the stakes feel very high for me. I use AI heavily, and I’ve seen the exact problem you’re describing. I feel like I’m often really riding the edge in terms of trying to use AI to accelerate product development while not letting tech debt accumulate too fast, or let my mental model of the codebase slip too much.

Here’s what’s working for me right now:

1. The basics: use best model available, have skills and rules that specify project guidelines, etc.

2. Always use plan mode. It works much better to iterate on the concept of what we’re going to do, then do the implementation. The models will adhere to the plan at very high rates in my experience.

3. Don’t give chunks of work that are too large in scope. This is just art, and I’m constantly experimenting with how ambitious I can be.

4. I review all code to some extent, but I have a strong mental model of what areas of the app are more critical, where hidden bugs might accumulate, etc, and I review both tests and impl more strenuously in those areas. Whereas like a widget for my admin panel probably gets a 2 second glance.

5. Have the discipline to go through periodically and clean up tech debt, refactor things that you’d do differently now, etc. I find the AI a huge help here, because I can clean up cruft in an hour that would have once taken me days, and thus probably wouldn’t have gotten done.

6. I’m experimenting with shifting my architecture to make it easier to review AI code, make it less likely it’ll make mistakes, etc. Honestly mostly things I should have always been doing, but the level of formalism and abstraction on my solo projects is usually different than on a bigger team.

To each their own, but I’ve grown this from nothing to about $350k in ARR over the last ten months, and I’m very confident I never could have built this product without AI help in triple that time.

marcosdumay2h ago

It's approximately the same problems, but stretched to an insane extent that you can never expect before it arrives.

i_love_retros7h ago

Don't outsource either then

21asdffdsa126h ago

How about we outsource it to pakistan and they use LLMs. That way, we do what the LLM people do - many agents and stacked on top

daishi554h ago

The generated code is more than fine, it’s good in many cases. And I read it :)

Indeed for the task of “jump into an unfamiliar codebase and make a requested change that aligns with existing styles and patterns, and uses existing functionality” I would say something like opus 4.7 exceeds the capabilities of most developers.

pronOP4h ago

I agree with both statements, but that doesn't change the problem I stated. If an agent produces reasonable code 80-90% of the time, and 10-20% of the time it makes mistakes that could render the codebase irretrievably unevolvable once they accumulate, the only thing you can do is to carefully review the agent's output 100% of the time. That it gets things right 80% of the time as opposed to 40% of the time doesn't change this calculus one iota.

But agents generate code much faster, and to know slow them down, some people want to not do the only thing that can currently ensure you get good results, which is to carefully review the output. Once that happens, there is simply no way for them to know how good or bad what they're getting is.

noelsusman2h ago

I guess I don't understand how this logic doesn't apply to human developers.

1 more reply

Kiro3h ago

And humans produce 100% reasonable code or what? The kind of mess me and everyone I've worked with produces by hand is the inverse of that. Constant shortcuts and lazy slop through and through. Never worked anywhere where the code wasn't an entangled disarray.

As soon as requirements change the abstractions fall apart and everything gets shoehorned.

1 more reply

WalterBright40m ago

My own code is contortious. I refactor it regularly to reduce that, but it still can be better.

__alexs4h ago

I read all the code I generate with Cursor and some of it smells a bit weird but is easily fixable and most of it is as good as what I would write or better.

bicepjai2h ago

This is the rule I have settled on and I can feel why. Writing the first buggy working version with agents is always fun. Then making the software reliable with the agents, the way you want is very painful.

stingraycharles6h ago

> Picking among them isn’t a matter of context. It’s a matter of judgment, and the models - not the harnesses - get this judgment wrong far too often. I would say no better than random chance.

Yeah I’m currently working for several months already on a harness that wraps Claude Code and Codex etc to ensure that these types of invariants are captured and enforced (after the first few harness attempts failed), and - while it’s possible - slows down the workflow significantly and burns a lot more tokens. In addition to requiring more human involvement, of course.

I suspect this is the right direction, though, as the alternatives inevitably lead any software project to delve into a spaghetti mess maintenance nightmare.

pronOP6h ago

It's not enough to enforce the invariants because they may need to change. You need to follow the invariants when they're right, and go back and reconsider them when they prove unhelpful. Knowing which is the case requires judgment that today's models are simply incapable of (not consistently, at least).

leonaves4h ago

What's the difference between asking an AI to write you a module you never read and installing a 3rd-party module without auditing all its source code?

Xirdus3h ago

If the 3rd party module is popular, its badness will affect other people too and either the module will get improved or well known workarounds/"best practices" will develop. With AI-generated code, more often than not you're the sole user.

skydhash3h ago

Trust and reputation.

I would use Stripe, curl, and ffmpeg without audits, because I trust them to provide good code and to respect their API. I wouldn’t trust AI to write a Fibonacci series implementation.

The AI has no reputation to wager for my trust.

frikk3h ago

stars on github? I've wondered the same thing.

indoordin0saur2h ago

Write your code by hand, but AI still serves as something of a stack overflow and code completion tool. Also good for writing tedious things like regex or little one-off utility scripts as well as a first crack at unit tests. Using it to actually write big blocks of important code is a no-no in my opinion as it produces what I would characterize as slop, even if it technically works.

zephen2h ago

> What I've seen is that if you define the architectural constraints, the agent writes complex, unmaintainable code...

To be fair, there are many people like this as well. One of my personal favorite examples was way back in the 80s when I inherited the code for a protocol converter that let ASCII terminals communicate with IBM mainframes via the 3270 protocol.

One of the pieces of code in there, for managing indicator lights, was simply wrong. It was ca. 150 lines of Z80 assembly language that was trying to faithfully follow the copious IBM documentation of how things worked, but it had subtle issues and didn't always work.

My approach was to accept the documentation as accurate (the IBM documentation was always verbose and almost never wrong), but to reason that the original 3270 had these functions implemented in TTL logic gates, and there was no way in heck that they were wasting enough gates on indicator lights to require the logical equivalent of 150 instructions.

So in my mind, it had to be a really simple circuit that had emergent properties that required the reams of documentation. With that mindset, I was able to craft correct code for this in 12 instructions.

Many systems are likewise fractal in nature. You want to figure out the generating equations, rather than all the rules that derive from those. And, in many cases, writing down the generating equations is at least as easy to do in code as it would be to do in English for someone or something else to implement.

linuxftw4h ago

Try plan mode. The problems you're speaking about are already solved.

pronOP2h ago

They are nowhere near solved. Agents make serious mistakes in judgment and do it frequently enough to threaten the viability of the codebase unless you slow down and monitor them very, very closely. If you do that, it's all good. If you're not, your codebase is rotting at a superhuman speed underneath you and you have no idea until it collapses.

linuxftw1h ago

I agree they make mistakes in judgement, that's the whole point of plan mode. That judgement comes to the surface before lots of tokens are wasted without sight of the overall solution.

It's all very simple. "Use x library, data model should be xyz, do m, not n."

They're obviously not at the point of replacing an experienced programmer as far as knowing the start-to-finish way of accomplishing every detail, that's what the human is for.

hatefulmoron2h ago

Plan mode improves results, but it doesn't solve the underlying problems. Pretty often Claude Opus 4.7 on xhigh will formulate a reasonable enough plan, churn for a while, then come back with a summary that it didn't stick to the plan because it wasn't accurate.

Worse, the disclaimer is buried under a bunch of "did X, did Y on line Z of file a/b/c", as if it's just a minor inconvenience. To the extent the plan was inaccurate, you're left in an undefined state where you might as well undo what it just did..

linuxftw2h ago

You have to review the plan and fill in any missing gaps or correct anything that's wrong. Plan mode often isn't one shot, it might take a few iterations, but once the plan is nailed down, the results are usually very good.

hatefulmoron2h ago

You're right. I think having it spawn lots of subagents, read everything, formulate a big and detailed plan, only for it to be subtly wrong while requiring me to carefully review the result and the intermediate plans that produced it is quite tiring. I suppose things slip through.

1 more reply

jstummbillig6h ago

> The only people I've heard saying that generated code is fine are those who don't read it.

Well, that is problematic. I have to either assume you are disinterested or lying and neither is great for any discourse.

nathanielks5h ago

Yeah, their statement just isn't true. With enough instruction, I've been able to get great output from models. I think that's the key: with detailed, pointed instructions, the output will match.

rimliu4h ago

how do you know it matches? You did read it then?

nathanielks2h ago

Indeed, I'm not using LLM output without thorough review.

After reading a bunch of other comments, it sounds like people are referring to letting agents go wild and code whatever off a limited prompt. I'm not using LLMs like that; I'm generally interacting only via conversations with pretty detailed initial prompts. My interactions with the chat after that are corrections/guiding prompts to keep it on point and edit the prompt output from time to time.

1 more reply

j / k navigate · click thread line to collapse

0 comments

perarneng5h ago

pronOP5h ago

lukan4h ago

How do you define "bad code"?

3 more replies

IdiotSavage5h ago

So, basically you need to micro-manage it. Where are your 10x gains now? And is it fun to work like that?

sirwhinesalot4h ago

black_knight4h ago

2 more replies

stepbeek1h ago

yakattak3h ago

This is kind of how I feel I think. Putting pen to paper for me is hard.

readitalready1h ago

I don't micromanage it. I let my projects custom linter micromanage it.

Every project should have a custom linter for their tech stack. It would check for not just syntax errors, but architectural choices as well as taste guidelines.

Whenever the LLM writes bad code, I add it to my linter to check against in the future.

andriy_koval45m ago

> So, basically you need to micro-manage it. Where are your 10x gains now? And is it fun to work like that?

it depends on language and infra, but some/many require lots of boilerplate and memorizing thousands of APIs, automating this is easy LLM 10x gain.

I for example write SQL myself, because boilerplate is super-minimal, and core SQL is very minimal itself, there are like 20 constructs to memorize.

hansmayer5h ago

Amen. Instead of freeing you up - AI enslaves you - and if it was even enslaving to a superior being at least!

nijave1h ago

Honestly, I think so. I do a mix of infrastructure and programming so don't tend to have any frameworks memorized. Using AI is much quicker than constantly referencing the docs.

I can also switch between codebase with different frameworks and languages and make changes without spending all day reading docs.

forgotaccount35h ago

> you need to micro-manage it.

gylterud4h ago

Juniors learn. Some juniors are potential good seniors. Over time they will internalise good architecture and be able to make good judgments on their own.

Micromanaging LLMs is like having Dory from Finding Nemo as your colleague. You find ways to communicate, but there is no learning going on.

2 more replies

hansmayer3h ago

I think if you tried working with some junior folks, you'd be quite surprised. You know, with at least some of them choosing to use their brains and all.

wombat-man3h ago

hansmayer4h ago

> You are in charge.

No, if you have to do all of the stuff you have listed to kind-of-make-it-work...You are not in charge.

insane_dreamer1h ago

> You are in charge.

Sure. That's how I work with AI, and the way I believe that AI is meant to be use -- as a companion tool.

But it's a lot of work. It saves me time for certain tasks, but not others. I haven't measured my productivity gains, but they're at most 2x.

Zach_the_Lizard6h ago

I agree with this. I've been writing a new internal framework at work and migrating consumers of the old framework to the new one.

The AI tooling didn't (yet) detect this scenario and happily added migration logic assuming it would work properly.

Ultimately, it isn't a big problem to solve in a way that will mostly satisfy everyone, but it would have been a big problem without a human deeper in the weeds.

benguild7h ago

“The only people I've heard saying that generated code is fine are those who don't read it.” Are you sure these people aren’t busy working rather than chatting? (haha)

sevenzero7h ago

Also as a webdev, it writes basic CRUD pretty good. I am tired of having to build forms myself and the LLMs are usually really good at that.

Been building a new app with lots of policies and whatnot and instructing a LLM is just much faster than doing the same repetitive shit over and over myself.

spockz6h ago

sevenzero5h ago

I typically avoid any most abstractions or third party dependencies. Yea it could be neat, but I still need a lot of custom logic here and there. Same reason I avoid stuff like GraphQL.

A little update: upon viewing the page on phone, for me the "comitter" field in the demo is going out of bounds... Really not speaking for their product.

2 more replies

drbojingle5h ago

This might pair well with something like https://data-atlas.net.

pronOP6h ago

Sure. I'm talking about production software that needs to survive and evolve for a long while.

pydry5h ago

This the core unspoken bone of contention in most AI arguments I think: most people either arent writing code with strict quality requirements or dont realize where their use of AI is violating them.

mountainriver5h ago

Can you not review it?

RugnirViking4h ago

Reviewers aren't perfect, far from it. And we just gave them ~20x more code to review. Incentives mean that taking 20x longer to review is unacceptable. So where do we go from here?

throwaway1737384h ago

Review always misses something.

tcgv49m ago

> "Yep. The only people I've heard saying that generated code is fine are those who don't read it."

I review every line of code I generate with AI. I mainly use an MR-based approach:

3) Usually, by this point, the solution is ready for me to merge locally and either run local tests or do some manual fine-tuning.

agentultra4h ago

The invariant, stated informally, would be hard to prove is broken by a human reviewer in the loop. Spoken language isn’t precise enough for the task.

pronOP4h ago

> The invariant, stated informally, would be hard to prove is broken by a human reviewer in the loop. Spoken language isn’t precise enough for the task.

That depends on the invariant. Some are behavioural, like "variable x must be even if y is positive", but some are architectural, such as "a new view requires a new class".

agentultra4h ago

Yes! I was trying to make this part of my point but you definitely made it much more clear and concise.

21asdffdsa127h ago

pronOP6h ago

multjoy33m ago

It has no judgement at all.

senordevnyc3h ago

Here’s what’s working for me right now:

1. The basics: use best model available, have skills and rules that specify project guidelines, etc.

3. Don’t give chunks of work that are too large in scope. This is just art, and I’m constantly experimenting with how ambitious I can be.

marcosdumay2h ago

It's approximately the same problems, but stretched to an insane extent that you can never expect before it arrives.

i_love_retros7h ago

Don't outsource either then

21asdffdsa126h ago

How about we outsource it to pakistan and they use LLMs. That way, we do what the LLM people do - many agents and stacked on top

daishi554h ago

The generated code is more than fine, it’s good in many cases. And I read it :)

pronOP4h ago

noelsusman2h ago

I guess I don't understand how this logic doesn't apply to human developers.

1 more reply

Kiro3h ago

As soon as requirements change the abstractions fall apart and everything gets shoehorned.

1 more reply

WalterBright40m ago

My own code is contortious. I refactor it regularly to reduce that, but it still can be better.

__alexs4h ago

I read all the code I generate with Cursor and some of it smells a bit weird but is easily fixable and most of it is as good as what I would write or better.

bicepjai2h ago

stingraycharles6h ago

> Picking among them isn’t a matter of context. It’s a matter of judgment, and the models - not the harnesses - get this judgment wrong far too often. I would say no better than random chance.

I suspect this is the right direction, though, as the alternatives inevitably lead any software project to delve into a spaghetti mess maintenance nightmare.

pronOP6h ago

leonaves4h ago

What's the difference between asking an AI to write you a module you never read and installing a 3rd-party module without auditing all its source code?

Xirdus3h ago

skydhash3h ago

Trust and reputation.

I would use Stripe, curl, and ffmpeg without audits, because I trust them to provide good code and to respect their API. I wouldn’t trust AI to write a Fibonacci series implementation.

The AI has no reputation to wager for my trust.

frikk3h ago

stars on github? I've wondered the same thing.

indoordin0saur2h ago

zephen2h ago

> What I've seen is that if you define the architectural constraints, the agent writes complex, unmaintainable code...

linuxftw4h ago

Try plan mode. The problems you're speaking about are already solved.

pronOP2h ago

linuxftw1h ago

I agree they make mistakes in judgement, that's the whole point of plan mode. That judgement comes to the surface before lots of tokens are wasted without sight of the overall solution.

It's all very simple. "Use x library, data model should be xyz, do m, not n."

They're obviously not at the point of replacing an experienced programmer as far as knowing the start-to-finish way of accomplishing every detail, that's what the human is for.

hatefulmoron2h ago

linuxftw2h ago

hatefulmoron2h ago

1 more reply

jstummbillig6h ago

> The only people I've heard saying that generated code is fine are those who don't read it.

Well, that is problematic. I have to either assume you are disinterested or lying and neither is great for any discourse.

nathanielks5h ago

Yeah, their statement just isn't true. With enough instruction, I've been able to get great output from models. I think that's the key: with detailed, pointed instructions, the output will match.

rimliu4h ago

how do you know it matches? You did read it then?

nathanielks2h ago

Indeed, I'm not using LLM output without thorough review.

1 more reply

j / k navigate · click thread line to collapse