zozbot234 · 2mo ago

> Neither do people, yet people manage to write software that they can evolve over a long time

You need a specific methodology to do that, one that separates "programming in the large" (the interaction across program modules) from "programming in the small" within a single, completely surveyable module. In an agentic context, "surveyable" code realistically has to mean a manageable size relative to the agent's context. If the abstraction boundaries across modules leak in a major way (including through undocumented or casually broken invariants), that's a bit of a disaster, especially for evolvability.
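
To make "documented invariant at a module boundary" concrete, here is a minimal Python sketch. The names `SortedIds` and `contains` are hypothetical, made up for illustration: the boundary type states its invariant and checks it once at construction, so the code behind it can rely on the invariant without re-surveying every caller.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SortedIds:
    """Hypothetical boundary type: a tuple of ints in strictly increasing order.

    Invariant (documented here, checked once at construction):
    values[i] < values[i + 1] for every valid i.
    """
    values: Tuple[int, ...]

    def __post_init__(self) -> None:
        # Reject any adjacent pair that breaks the ordering invariant.
        for a, b in zip(self.values, self.values[1:]):
            if a >= b:
                raise ValueError(f"invariant broken: {a} is not < {b}")


def contains(ids: SortedIds, target: int) -> bool:
    """Binary search; only correct because SortedIds guarantees ordering."""
    lo, hi = 0, len(ids.values)
    while lo < hi:
        mid = (lo + hi) // 2
        if ids.values[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(ids.values) and ids.values[lo] == target
```

If a caller could hand `contains` a plain unsorted tuple, the search would quietly return wrong answers, which is the kind of leak that makes later changes risky.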

pron · 2mo ago

Agents just can't currently do that well. When you run into a problem while evolving the code to add a new feature or fix a bug, you need to decide whether the change belongs in the architecture or should be done locally. Agents are about as good as a random choice in picking the right answer, and there's typically only one right answer. They simply don't have the judgment. Sometimes you get the wrong choice in one session and the right choice in another.

But this happens at all levels because there are many more than just two abstraction levels. E.g. do I change a subroutine's signature or do I change the call site? Agents get it wrong. A lot.
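
A small Python sketch of that signature-versus-call-site choice, using hypothetical names (`parse_port`, `port_from_env`, `parse_port_or`) and an assumed default of 8080. Both options run; which one is right depends on whether the missing-value case is a quirk of one caller or something every caller shares, and nothing in the code itself answers that.

```python
from typing import Optional


def parse_port(text: str) -> int:
    """Existing helper: expects a decimal string naming a valid TCP port."""
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


# Option 1, local fix: only this call site can see a missing value,
# so it handles the case itself and the helper's contract stays as it was.
def port_from_env(raw: Optional[str]) -> int:
    return parse_port(raw) if raw is not None else 8080


# Option 2, signature change: if every caller needs a fallback, the helper
# takes it on, and the decision is visible in one place.
def parse_port_or(text: Optional[str], default: int) -> int:
    return default if text is None else parse_port(text)
```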

Another thing they just don't get (because they're so focused on task success) is that it's very often better to let things go wrong in a way that could inform changes rather than get things to "work" in a way that hides the problem. One of the reasons agent code needs to be reviewed even more carefully than human code is that they're really good at hiding issues with potentially catastrophic consequences.
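
A minimal Python illustration of the difference, with hypothetical function names and a made-up config key `limit`. The first version "works" by turning every failure into a plausible default; the second lets the failure surface with enough context to inform a fix.

```python
import json


def load_limit_hiding(raw: str) -> int:
    """Anti-pattern: any failure silently becomes a plausible default."""
    try:
        return int(json.loads(raw)["limit"])
    except Exception:
        # The task "succeeds", but a misspelled key or corrupt input
        # is now indistinguishable from a deliberate default.
        return 100


def load_limit_loud(raw: str) -> int:
    """Let malformed input fail, keeping the original error as the cause."""
    try:
        return int(json.loads(raw)["limit"])
    except (KeyError, ValueError, TypeError) as exc:
        # json.JSONDecodeError is a subclass of ValueError, so bad JSON,
        # a missing key, and a non-numeric value all end up here.
        raise ValueError(f"bad config: {raw!r}") from exc
```

With the misspelled input `'{"limt": 5}'`, the first function returns 100 and the second raises, which is the outcome a reviewer actually wants to see.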

zozbot234 (OP) · 2mo ago

> Agents are about as good as a random choice in picking the right answer, and there's typically only one right answer.

That's realistically because they aren't even trying to answer that question by thinking sensibly about the code. They work in a limited context on everything they do, which leaves them guessing and trying the first thing that might work. That's why they generally do a bit better when you explicitly ask them to reverse engineer or document the design of some existing codebase: that's a problem that at least involves an explicit requirement to comprehensively survey the code, figure out which parts matter, etc. They can't be expected to do that by default. It's not even a limitation of existing models; it's quite inherent to how they're architected.
