Better HN

pron · 1mo ago

That doesn't quite work, and precisely for the reason I mentioned: you can definitely tell the AI to follow some strategy, but at some point the strategy will need to change, and the AI won't tell you that (even if you tell it to). Unless you read the code every time, you won't know whether the AI is following the strategy and producing good results, or following it and producing bad results because the strategy has to change. This can happen even in small changes: the AI will follow the strategy even if the change proves it's wrong, and if you don't pay close attention, these mistakes pile up.

So yes, you might get good results in one round, but not over time. What does work is to carefully review the AI's output, although the review needs to be more careful than review of human-written code because the agents are very good at hiding the time bombs they leave behind.

16 comments · 4 top-level

lukan · 1mo ago · 9 in thread

How do you define "bad code"?

If I instruct the AI to make small modules where I can verify they work, have tests and no side effects - then it is good enough code for me. It works, is readable and can be extended - and will turn into bad code if this is not done with care.

pron (OP) · 1mo ago

Sure, if you carefully review the agent's output, including tests, you can get good results. If you don't carefully review the output, you obviously have no idea if it's good enough for you. The only way to find out is that 30 changes down the line the agent won't be able to change one thing without breaking another, but by then the codebase will be too far gone to fix.

K0balt · 1mo ago

This is essentially true. There are other ways to achieve this goal, though, that don’t require exhaustive human review; better models are able to do that part as well if properly guided. The key is that yes, some of the design constraints will necessarily morph over time, since coding is as often about discovering the problem as solving it. But design principles don’t drift. If you have a design principle that cannot be adhered to, it is not a proper principle; it’s an opinion about the problem.

The main thing that helps me in my workflow is to develop documentation around the code. If the code drifts from the docs, the model will notice and you can decide which was correct: the plan, the maintainer manual, the code, or the comments in the code. Notice that there are three separate things written about the code, plus the code itself. Keeping all of that correct, coherent, and consistent (with a separate, invariant document that describes your design principles) keeps the model from going off the rails and gives ample opportunity to sense bad smells before they get set in stone.

It’s a token fire and you need a minimum 250k context model… but I still get as much work done in an hour as I used to do in a day, and the code I coauthor is better documented, more maintainable, and more tested than any code I have ever written before.

Phelinofist · 1mo ago

I get what you mean, but that can also happen with code written by humans.

deterministic · 1mo ago

That is correct. Using an AI to generate code and then not verify it yourself is IMHO unprofessional and should get you at a minimum a verbal warning. YOU are responsible for the code NOT the AI.

readitalready · 1mo ago

I let agents break things 30 changes down the line. If something breaks, I add a check to my project validator and start over, with the validator providing instructions on what was wrong and how to fix it. It's all automatic, and now I have a guard against the exact same error in the future.

Some of these checks have caught thousands of the same error, even with the latest Opus 4.7 writing the original code.
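For readers who want something concrete, here is a minimal sketch of what such a validator loop could look like. It is not the commenter's actual tool: the `Finding` type, the `check` decorator, and the `no-bare-except` rule are all hypothetical names chosen for illustration. The idea it shows is the one described above: each past failure becomes a named check, and each check returns an instruction the agent can act on.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    check: str     # name of the check that failed
    path: str      # file the problem was found in
    fix_hint: str  # instruction handed back to the agent


# A check takes (path, source text) and returns a fix hint, or None if the file passes.
Check = Callable[[str, str], Optional[str]]

# Hypothetical registry: one entry per mistake that has already bitten the project.
CHECKS: dict[str, Check] = {}


def check(name: str):
    """Register a check function under a stable name."""
    def register(fn: Check) -> Check:
        CHECKS[name] = fn
        return fn
    return register


@check("no-bare-except")
def no_bare_except(path: str, source: str) -> Optional[str]:
    # Example rule (an assumption, not from the thread): flag bare 'except:' clauses.
    for number, line in enumerate(source.splitlines(), start=1):
        if line.strip() == "except:":
            return f"line {number}: catch a specific exception type instead of a bare 'except:'"
    return None


def validate(files: dict[str, str]) -> list[Finding]:
    """Run every registered check against every file and collect the findings."""
    findings = []
    for path, source in files.items():
        for name, fn in CHECKS.items():
            hint = fn(path, source)
            if hint is not None:
                findings.append(Finding(name, path, hint))
    return findings
```

Feeding the `fix_hint` strings back into the next agent run is what would turn a one-off failure into a permanent guard; how that hand-off happens depends on the agent tooling and is left out here.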

throwaway173738 · 1mo ago

The concept of a small module is an architecture invariant. You’re making that decision, not the LLM. And you’ve made that decision because the machine is not good at certain things. You’re doing that because you can’t trust the LLM to make that decision on its own.

ChicagoDave · 1mo ago

I’m doing it because as a DDD adherent, I’ve been building software that way for 15 years without GenAI and now with GenAI I can do it faster.

You can’t play whack-a-mole with GenAI. You have to start from well-known principles and watch everything it produces. Every module or bounded context has to have its own invariants.

You can’t fully automate software engineering with GenAI. It seems the vast majority of GenAI users think they can and end up in the same place as the OP.

Maybe learn Domain-Driven Design, Event Sourcing, and then try again. The results will be dramatically improved.

https://devarch.ai/
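As a rough illustration of "every module or bounded context has its own invariants", here is a small sketch of a DDD-style aggregate that refuses any change that would break its rules. The `Order` aggregate, its credit limit, and the `InvariantViolation` exception are hypothetical examples, not taken from the comment or from any particular framework.

```python
from dataclasses import dataclass, field


class InvariantViolation(Exception):
    """Raised when a change would leave the aggregate in an invalid state."""


@dataclass
class Order:
    """Hypothetical aggregate: all changes go through methods that guard the invariants."""
    credit_limit_cents: int
    lines: dict[str, int] = field(default_factory=dict)  # sku -> price in cents

    def total_cents(self) -> int:
        return sum(self.lines.values())

    def add_line(self, sku: str, price_cents: int) -> None:
        # Invariant 1: prices are strictly positive.
        if price_cents <= 0:
            raise InvariantViolation("price must be positive")
        # Invariant 2: a SKU appears at most once per order.
        if sku in self.lines:
            raise InvariantViolation(f"duplicate line for {sku}")
        # Invariant 3: the order total never exceeds the credit limit.
        if self.total_cents() + price_cents > self.credit_limit_cents:
            raise InvariantViolation("order would exceed the credit limit")
        self.lines[sku] = price_cents
```

Because the checks run before any state is modified, a rejected call leaves the order unchanged, which makes the invariants easy to test in isolation regardless of who, or what, wrote the calling code.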

WalterBright · 1mo ago

> How do you define "bad code"?

The harder the code is to understand, the badder it is (and the more likely it is infested with bugs).

pron (OP) · 1mo ago

> How do you define "bad code"?

Code that will not be able to evolve for more than one or two years is terrible code. Agents write terrible code while doing a truly impressive job of hiding it (including in the tests they write) unless, of course, you keep them under very close supervision.

atomicnumber3 · 1mo ago · 1 in thread

I also find that every additional "constraint" you add in your context window, the dumber the agent gets, and it goes double if your constraint is unusual. To illustrate:

"Do X" - for baseline, assume this generally does X fine.

"Do X, don't use javascript" - even if X already didn't use javascript, this will often perform worse. It will perform worse still if X is difficult or unusual to do without javascript, even if there is some perfectly serviceable way to do it.

Also, despite "don't use javascript", sometimes it still uses a little bit of javascript anyway, usually in a spot where it would be extremely annoying/inconvenient to then remove that js yourself (when you would otherwise have reconsidered your approach at a higher level, either to use js or to want something different that is easier to do without js).

joquarky · 1mo ago

I feel like there's a limit on constraints that doesn't necessarily follow the context limits. I've assumed this is "attention heads" which I understand are an independent limitation, but I'm not smart enough to understand all the layers involved in these models so I could be wrong there.

I do observe the same thing. There are a limited number of constraints you can add and once you exceed that, you'll play whack-a-mole if you insist on all of them.

This is why I tend toward a more wu-wei attitude to constraints.

For example:

- Do I really need this constraint?

- How does the agent tend to behave in this scenario if it is unconstrained? Is this behavior/result an acceptable pattern for this solution?

- Is the constraint implicitly followed often enough that I can spend tokens recovering from a deterministic test that enforces it, rather than preemptively stating it in the prompt?

If I get into the situation where I need more constraints than can fit in context/attention without the need to regularly play whack-a-mole, then I break the module down into sub-modules with fewer, more specific constraints.
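To make the "deterministic test that enforces the constraint" option concrete, here is a minimal sketch for the "don't use javascript" example from the parent comment, using Python's standard `html.parser`. The `find_javascript` helper and its heuristics are assumptions for illustration: it flags `<script>` elements, inline `on...` handlers, and `javascript:` URLs, and it would not catch every way of smuggling script into a page.

```python
from html.parser import HTMLParser


class JavaScriptFinder(HTMLParser):
    """Collects places where generated HTML uses JavaScript (heuristic, not exhaustive)."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()
        if tag == "script":
            self.violations.append(f"line {line}: <script> element")
        for name, value in attrs:
            # Inline event handlers such as onclick or onload.
            if name.startswith("on"):
                self.violations.append(f"line {line}: inline handler '{name}'")
            # URLs of the form javascript:... in href, src, and similar attributes.
            elif value and value.strip().lower().startswith("javascript:"):
                self.violations.append(f"line {line}: javascript: URL in '{name}'")


def find_javascript(html: str) -> list[str]:
    """Return a list of human-readable violations; an empty list means the page passes."""
    finder = JavaScriptFinder()
    finder.feed(html)
    finder.close()
    return finder.violations
```

A test suite could assert that `find_javascript(page)` is empty for every generated page and hand any violations back to the agent, so the constraint costs prompt space only on the occasions it is actually broken.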

aldousd666 · 1mo ago · 1 in thread

You are never going to get away from reading the code every time; at least, I haven't seen how you possibly could. That being said, it is considerably less work to read and check the code than to build it all, even if you know what you're doing and have done it before.

LtWorf · 1mo ago

I find this to be false.

Same way teaching your child to do something is much harder than to just do it yourself.

Except the child learns.

lobocinza · 1mo ago · 1 in thread

This is reductive. What you're describing already happened in codebases without AI. LLMs just speed things up because they are a great calculator, not a replacement for human input.

Forgeties79 · 1mo ago

> not a replacement for human input

Countless individuals/companies unfortunately do believe it is a replacement still and are unlikely to change their minds.
