The Coming Loop

(lucumr.pocoo.org)

399 points · ingve · 1d ago · 276 comments

155 comments · 74 top-level

weego19h ago· 17 in thread

What does any of that mean in practice? it's just rambling about abstract concepts that seem to be designed to hint at a bigger picture, when it's just getting AI to write code for you.

Is this where it's going? Having to mystify our roles so it seems like we're still the thought leaders when actually we're just becoming pseudo-teachers that try and herd our group of AI idiots to the right conclusion for us so we don't have to, without ever giving away that it's just all techno-babble?

kixiQu16h ago

This is not rambling and it is not abstract. The content here is about the second-order effects of "getting AI to write code for you" in less supervised ways. I'll cede the author could have polished this to more concise effect, but at the level it's at, a reader's failure to understand the substance doesn't imply there's mystification going on.

mikepurvis16h ago

I agree. I think the comparison made elsewhere in the thread to Yegge's recent AI delusions/projects is especially unfair.

Armin is actively grappling with the implications of AI-produced code on the software engineering craft, and trying to reflect on how to responsibly adopt the best parts of the new world. He's recognizing that the AI skeptical/cautious are still massively impacted by everyone else using tools that run or are built this way.

I really appreciated the piece, and I'm glad we can still publish work in progress thoughts that don't have a clear thesis and call to action.

coldtea17h ago

Once you buy into the AI hype, you babble like that. Yegge is an even worse example.

ricardobeat15h ago

I'm convinced the field of software engineering is being split in two. These are real concerns and coherent arguments, that make sense for developers who have been using "agentic loops" and heavily AI-assisted workflows.

It scares me that someone can see this as "rambling about abstract concepts" while I see exactly what the author is talking about, both at work and in personal projects; the thought that the majority of people have absolutely no idea what's unfolding.

fantasizr18h ago

tech blogs used to read like actionable readme guides. I couldn't finish it without thinking: what am I supposed to do with this information? The shelf life of the latest and greatest is about 2 weeks in the AI space. I never caught up to the ralph wiggum loop and now I'm glad I never tried.

https://news.ycombinator.com/item?id=46682325

f311a16h ago

> What does any of that mean in practice?

They want you to spend more tokens

georgemcbay14h ago

> They want you to spend more tokens

Spending your tokens like some sort of primitive meat bag is passe and any developer doing that will be left behind by the true AI natives.

What you need to do this month is set up a central planning agent that creates a 5 year plan and then orchestrates teams of subagents to each direct subteams of subsubagents to fulfill your goals.

Have each team at every level inspect each other's work and authorize each agent up and down the chain to spend tokens on your behalf for the greater good.

With any luck, the US Administration will see the light and allow AI agents to open up their own credit cards on your behalf to eliminate unnecessary bottlenecks. Agentic Fintech is the future.

We need to move beyond this early stage where you give any thought to spending those tokens yourself!

bashtoni15h ago

Practically, what this boils down to is having clear success criteria.

The harness (Claude Code, Codex, Pi, etc.) keeps throwing things into the context and executing tools (as directed by the model) until the success criteria are satisfied.

The "rules" of using AI successfully are basically just the rules of any successful development team. Break things down into clearly defined chunks, make the success criteria clear and provide a way to get the right feedback on how the system is running (logs, metrics, traces etc).

bogzz15h ago

You gotta just use more tokens, more tokens means better everything.

dofm18h ago

From my past experience of religion at various levels I am very often reminded of borderline-cult religious meetings, and the zeal of converts repeating gnomic oversimplifications, and of how exhausting it was to try to engage with them on any topic of substance.

My own feeling is that it is totally OK to simply route around these people.

It's fascinating how many of the "keep your identity small" folks in the YC/HN sphere have lost any sense of perspective at the first sign of a technology that wanders into the philosophical realm. AI-oriented identities are everywhere.

AndrewKemendo16h ago

I’m sorry but this response is just absolutely ridiculous and is not giving anything near the respect to the author that you should have.

You’re just rambling and ranting about philosophical things and have basically nothing to say about the technical or engineering points that the author wrote.

This is an entirely emotional appeal and doesn’t actually engage the author where the author is engaging the audience.

If you look further down thread there’s dozens of comments that are engaging with the content and not being hyperbolic about all this cyber shamanism or whatever you wanna call it

wahnfrieden17h ago

Why is HN interested in management and team process discussion but allergic to similar topics on how to manage agents?

It’s like saying why discuss these team workflows when it’s just devs writing code. Or why use any jargon to describe workflows when it's just devs writing code.

AndrewKemendo15h ago

Because like every other large social group it’s actually a collection of dozens of subgroups that have an overlapping interest

You’ll see those subgroups come out in threads like this, and you’ll see other subgroups come out in different threads. There is no singular version of Hacker News that always exists; it is a collection of sentiments that from time to time align interests around specific topics.

SimianSci18h ago

When someone is expected to be wizened and does not have the knowledge to keep up with the needs of those around them, they in turn become Shamanistic in their practice.

The speed of improvement on these models has been incredible and has outpaced the learning speed of humans and put many experts into these Shamanistic roles.

I think the operative means of addressing this is to recognize that we can only learn so quickly, but we are still called to improve our knowledge and understanding to a higher level. Since the improvement of these models is neither logarithmic nor exponential, we currently occupy a space in time in which the models are smarter on average than we are as a collective whole.

watutalkinbout18h ago

Algorithms and data that emulate responses aren't smart.

A 5 year old knows if you want to wash your car, you need to take it to the car wash.

coldtea17h ago

Can a 5 year old write a substantial program on spec, that passes the requirements and given tests, in a few minutes?

If not, then perhaps this comparison is not the be all end all.

"A ship is useless, it can't drive over land..."

FarmerPotato13h ago

referring to John Woolridge's recent talk? "the car wash is only 200 feet away, should I drive or walk?"

His slide showed Opus 4.6 saying "walk". I couldn't get 4.6 to do that.

ramon1561d ago· 14 in thread

Quoting the creator of CC holds little value in my opinion. I too call my product good.

> opting out of this fully machine-driven future may not be an option.

I am contemplating whether I want to stay inside this rat race.

I completely agree with the conclusion of this blog post, by the way. I feel uneasy, and I do not enjoy the work I deliver using LLMs. I think OP did a really good job on capturing at least my current state.

meowface1d ago

I and my friends go back and forth, every day, on whether coding with LLMs is a net plus or a net negative.

I'm at the point where I think it's dumb to not do it but also dumb to do it. I have no real answer.

I have settled on using LLMs for everything but to spend more time honing the quality and cleanliness with LLM passes afterwards than I generally would have taken to write it well myself in the first place. This is in some ways the worst of both worlds, but it somehow lets me bypass akrasia while still getting pretty good code out, so I consider it superior to how I worked before. I get more done in three months even if I get less done in a day.

zahlman16h ago

> but it somehow lets me bypass akrasia… I get more done in three months even if I get less done in a day.

I think this is going to be a big deal for many programmers in the long run. "Flow" comes from feeling faster as much as it does from actually being faster in the moment. Perhaps more. And yesterday's good experience leads to today's motivation.

pdimitar23h ago

I am with you here but don't get overly pessimistic: devising hooks and stopgaps and flows and constantly tuning what to watch out for does not only improve the quality of the LLM-output code. It hones and refines your own abilities.

CC has made some pretty dumb stuff in my projects but I don't resent those occurrences. They taught me (more accurately: reminded me, because I already knew but was not applying that knowledge too often) very valuable lessons on code quality -- that's still a dark area to this day and every ray of light on it is valuable for the future programming.

To me programming with LLMs made me a better programmer. But yes, I don't just rubber-stamp PRs.

It also finally allowed me to be less of a code monkey and more of an architect and a backend lead than before. Which I was really missing.

_verandaguy1d ago

    > I am contemplating whether I want to stay inside this rat race.

I'm in the same boat. I'm hoping to go back to school in 2027 and be out of work that revolves around programming in 5 years.

I'm not enthusiastic about the field anymore, which sucks, because I used to love working in programming.

endemic23h ago

What are you going to do? Asking for a friend.

_verandaguy23h ago

I'm still trying to figure that out. Something diplomacy-adjacent would be interesting to me; maybe conflict studies, or international relations.

Based on word from my friends who work in the field, a lot of it is people who have a lot of respect for the field, and a lot of professional respect for each other. It's also a field I feel is unlikely to suffer from the same kind of scams that are taking over software while still offering an engaging environment.

It's also work with real-world impact, which is nice, though obviously comes with its challenges.

crymeth0t23h ago

> I am contemplating whether I want to stay inside this rat race.

Same. I'm currently trying to find _my next thing_ and all anyone wants to talk about is how I'm using AI and it's absolutely maddening. It's become a lazy, lossy proxy for productivity. I've had a few intros for the types of orchestration engineering roles which are described in this post and they're just completely unappealing -- especially the prescriptive aspects. Like, the sort of JDs I'm seeing are variants of, "we want a back-end developer who has experience with XYZ but they must use agentic harnesses to do their work." Why does any serious person give a flying fuck how the end result is reached? The flip side of all this is that rates are also being driven through the floor by loop cowboys who are generating steaming piles of shit which are _good enough_ ... until they aren't. I'm being completely serious when I say that stocking shelves at Tractor Supply is becoming more appealing by the day and I also just thought to myself, "Maybe I should just join the Army while they'll still take me?"

Devasta1d ago

> I feel uneasy, and I do not enjoy the work I deliver using LLMs.

I have basically stopped writing code in my spare time since the advent of AI. Before I felt like I was working on a classic car. Was it a practical use of my time? No. I could go out and download software that did what I wanted. Did I have fun doing it? Yes, the act of working on it was important, I felt I was still learning and improving as I did.

Nowadays I see people doing far more in a month than I could in a year and I feel like it's all a waste, like I just spent the past few years transcribing a phonebook while standing next to a photocopier.

I don't know if that'll ever change. I can't even pretend I was doing something prestigious and artisan like watchmaking because I wasn't a good programmer beforehand.

enoch_r1d ago

This piece changed how I work with LLMs and made me much more optimistic about how "fun" it can be to work with them: https://nolanlawson.com/2026/05/25/using-ai-to-write-better-...

Before I would just throw prompts at the LLM and it'd end up building a pile of crap (but semi-working crap, and 100x faster than I ever could) - it was pretty depressing. Using tools like `grill-me` (or `grill-with-docs`) I feel like I'm actually building my understanding of the system and helping shape it, and the results are much better.

warmwaffles23h ago

The fun part about that `grill-me` command is that when the questions are over, I've found that I can go right into implementation without needing to dump a PRD or some sort of broken up plan. Now this is obviously completely predicated on what you are asking it to grill you on. But for tasks that are semi complicated, it's fantastic.

cavoirom1d ago

Be your own customer: write the software just for you. AI is so effective that you can do something meaningful for yourself in just your spare time.

Here is a similar perspective: https://isene.org/2026/05/Audience-of-One-Numbers.html

I may have misunderstood you if you intend to write code by hand. I still do: I use AI to learn by example, but I write the real code myself, and AI can help me improve it. I have learned a lot.

xpct1d ago

I used to think I'll be into coding for the long haul, contributing to open source, and working on multi-year side projects.

Nearly all of that passion vanished this year, and I've been struggling to replace it. I know I'm much better than the machine now, but the lines are starting to blur, and some of the small puzzles of day-to-day have been completely automated away.

We've birthed a lot of puzzle solvers that enjoyed programming, and I'm sure many of them will move on to something else that scratches the same itch. I'm keen on learning what that will turn out to be.

fartcoin671d ago

I'm the opposite, couldn't be bothered to work on code outside of work. Barely did at work because I was more focused on wrangling a small army of shitty contractors (thanks strategic partner initiative for firing all of our small shop contractors and replacing them with morons from "offshore").

Now with LLMs I find myself doing small projects that interest me or have some utility for me outside of work, and doing a lot more development in the codebases at work outside of just review/docs/arch than I was before. Also making small tools that I find pleasant/useful but were not important enough to spend time on before.

foobar100001d ago

Agreed - there was always a set of things I wanted to do that I knew the magic core for, but wanted a team of implementers for the cruft, the 100k of actual testing harnesses, hyperparameter exploration, etc. I now have that team of implementers. All the problems seem research-y though - optimal binary transport systems that are zero-copy and compatible with languages, fast physical simulation optimizers, etc. So, things that all had a _LOT_ of busywork around the magic core.

boscillator1d ago· 7 in thread

> the right fix is not "handle every malformed case." ... [LLMs] will still attempt to handle now impossible errors.

This is the number one code smell from LLMs and I don't know why they are so obsessed with it. In python, it often comes as `hasattr` checks on types that are defined to have that attribute, in a code base that is fully type-checked.

Why do they do that? Is it from pre-training or reinforcement? If the latter, can the labs please fix this?
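For readers who have not run into the pattern, here is a small illustration. The `User` dataclass and both `greeting_*` functions are made up for this example; the point is only that the `hasattr` branch can never be false for a correctly typed caller, so it adds an untestable fallback path.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class User:
    name: str
    email: str

def greeting_defensive(user: User) -> str:
    # The smell: `User` always has `name`, and a type checker already enforces
    # that callers pass a `User`, so the fallback below is dead code.
    if hasattr(user, "name") and user.name is not None:
        return f"Hello, {user.name}"
    return "Hello, stranger"

def greeting_direct(user: User) -> str:
    # Trust the declared type; a wrong argument fails loudly at the call site.
    return f"Hello, {user.name}"

ada = User(name="Ada", email="ada@example.com")
```

Both functions return the same string for any valid `User`; the defensive version just carries an extra branch that a reader has to reason about.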

rzmmm23h ago

Likely just that they err on the side of unnecessary error handling rather than missing error handling. They likely penalize runtime errors harshly in training.

jerf23h ago

I suspect it's mostly the training data. I am also on team "make illegal states unrepresentable". It may get talked about a lot on HN, but I'm still at the point that I'm surprised when I see a code base that I didn't write in the wild that does a really good job of it, either open source or at work. Most programmers still think in terms of picking up pieces and fixing errors at the point where the error message pops out rather than making it so the error can't happen and the data reflects that.

I say "mostly" because I think there's also a problem with AIs thinking this way in their current state. That last level of human understanding of a code base, where the human holistically understands the flow of those guarantees, is a challenge to give them right now. On the raw code level, this sort of thing often involves enough code to easily blow out their context window. Trying to summarize it in memories-style files has its own problems; just because there is text written down about the guarantees doesn't mean that the AI is going to get the right info out of it, any more than a human might from just reading the code. I won't say it's "impossible" to give an AI this understanding because I'm not sure it is, but it is a level of understanding of the code that even if you get them to have it, their practices tend to fight against it.

My own solution to this problem has largely been to give up on them getting this. I prompt a solution to the problem the way that most people do, then if I want to make bad illegal states unrepresentable I prompt the AI through the process of the necessary refactorings, unless it's so small that I just do it myself. Given a lot of code that uses maps/dicts and arrays and strings and ints, if you prompt it through making those more thoroughly typed, it's actually pretty good at it. I've not had a lot of luck getting good designs out of single prompts, even when I get detailed. Treating it as two separate tasks seems to work out well.

And watch the diffs on the types carefully; AI loves to sneak past a ".JustSetItAndIgnoreAllThePreAndPostConditions(string)" method. After all, I suspect there's plenty of training data of "types that are nicely structured to make error states unrepresentable and then a later maintainer came along and added a 'JustEffingDoIt' method that broke everything" in the field. One of the best defenses is to make sure that the type implementing these things is in its own file and you can easily look at all the methods it adds on those types and smack it when it does that. I've tried slathering warnings about not doing this and explaining the pre- and post-conditions being maintained in the docs but the change seems marginal.
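As a rough Python sketch of what "make illegal states unrepresentable" can look like (the `Percentage` type, `scaled`, and `apply_discount` are hypothetical names, not taken from any code base discussed here): the invariant is checked once at construction, the value is frozen, and every method that produces a new value goes back through the constructor.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Percentage:
    """A value guaranteed to be within 0..100 once constructed."""
    value: float

    def __post_init__(self) -> None:
        # The only place the invariant is checked.
        if not 0.0 <= self.value <= 100.0:
            raise ValueError(f"percentage out of range: {self.value}")

    def scaled(self, factor: float) -> "Percentage":
        # Goes back through the constructor, so the invariant is re-checked.
        return Percentage(self.value * factor)

def apply_discount(price: float, discount: Percentage) -> float:
    # No range check here: the type already carries the guarantee.
    return price * (1 - discount.value / 100)
```

The kind of method to watch for in a diff would be something like a setter that writes `value` directly and skips `__post_init__`; keeping the type in its own small file makes such an addition easy to spot.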

ambicapter19h ago

Because the vast majority of the codebases in its training set aren't fully type-checked, or very clean at all. Or it's just snippets from Stack Overflow, so there's no existing context to not assume null-checking is valid.

efromvt18h ago

million times this - getattr on every dataclass is a wild choice

skywhopper23h ago

It’s because it matches the patterns they are trained to follow. They don’t understand the code. They can’t reason about the actual logic flow. They can only work with patterns.

CuriouslyC23h ago

Sorry to say but the solution is to stop using python. The models are trained to code defensively assuming historically representative python codebases. The models trust the types a lot more in languages where the canonical historical examples trust the types because the language is constructed around that premise.

zahlman16h ago

I would expect a language model to do a better job of coping with that kind of uncertainty, inferring type from name and usage, etc.

yanis_t1d ago· 7 in thread

I keep thinking about at which point I should not force myself into the loop. As a developer I really like working on the code structure, making it clearer, thinking about good abstraction, breaking into modules, etc. I really take pleasure in it. At the same time I understand that at some point I am becoming the limiting factor.

If the point of the software is to benefit people, should I still care about how the code looks?

Right now, I still think that the answer is yes, but in 3 years? in 10 years?

wartywhoa2317h ago

> If the point of the software is benefit people, should I still care about how the code looks.

The answer is yes you should, as long as you want to keep software benefiting people.

steezeburger19h ago

It's tough if you're somewhere that isn't very meaningful to you beyond the technology. I think there will be an existential shift soon towards more fulfilling work. Maybe I'm naive or that's just what I feel I need for myself.

cadamsdotcom23h ago

You will always be able to ask the agent to do refactors for you - and it can do mega ones that exhaust you to think about!

yanis_t22h ago

Two problems: I will not get my pleasure, and I will still not know how the code works.

datadrivenangel23h ago

Agentic refactoring is very questionable if you want to maintain quality, as it will rewrite all your code to be more average.

zahlman16h ago

I've found that if you look over the code and notice and describe a specific problem and solution, the agent can apply a refactoring for you well enough; and that's often faster than editing the file yourself even if you already know exactly what to do.

The idea of setting up an agentic loop to review code and propose and implement refactorings still seems pretty awful to me, though, yeah. Maybe cut that off at the first green-bar revision, and then apply some actual taste and judgement.

goatlover17h ago

Or you can do it yourself. Use AI as a prototyping tool or for boring throw away tasks. There's no reason everyone has to succumb to vibe coding.

camillomiller1d ago· 6 in thread

Show me the billion dollar solopreneur startup, or the profit increase for companies and at that point I’ll start thinking that this tasteless high level wanking might make sense in some way

soulofmischief1d ago

We've just invented the car and you're upset it hasn't achieved Mach speed yet.

bee_rider1d ago

One car went Mach 1, ever, apparently. Anyway, I don’t think the analogy fits. Ford or whoever didn’t loudly and frequently predict Mach 1 cars, right?

The situation is more like: Altman & co are predicting their new car will replace all vehicles: horses, trains, planes, motorcycles, there’s a real possibility the concept of vehicles will not exist other than cars, in the future. Meanwhile it hasn’t really done highway speeds yet. It does some impressive runs on curated tracks, and people use it around their farms (it seems to work ok for some of them).

We’ll see, I guess.

soulofmischief23h ago

Yes, one car did Mach 1. And the first production car, the Benz Velo, could only go 12mph. It's an apt analogy.

As I mentioned to OP, applying future aspirations to the current space is incorrect. Some people are able to understand the progression of industrial automation, some people aren't. But if you look at the current batch of frontier models and say, "I just don't see how this is going to be useful", then you're in the camp of those in the 80's who didn't understand personal computers, or in the 90's who didn't understand the web. In hindsight, the technologies evolved massively and found routine use cases that no one initially predicted.

camillomiller1d ago

It is a terrible analogy that shows terrible thinking. After all, there's one thing we can bet with more confidence on: delegating thinking to these mediocrity machines is affecting the ability to do the same in scores and scores of previously smart people.

camillomiller1d ago

I was literally quoting Sam Altman

soulofmischief23h ago

No, you mentioned his discussion of the "billion dollar one person startup", which we can both agree is a fanciful idea and more of an eventual "possibility" that will of course not occur as once anyone can be a billionaire, the whole system is going to change.

However, your "tasteless high level wanking" is not a quote from Altman, it's a vague and directionless insult that manages to sweep quite a lot of legitimate discussion about the future of automation and professional work under its thumb.

It's wrong because you're saying, "where are the billion-dollar one-man startups?" in the same way that I might look at a Benz Velo and go, "But it's so slow! Horses can go faster than that! Everyone saying cars are going to change the fabric of society are just tasteless wankers!"

The point is that you are applying future aspirations on the present-day relatively brand-new model space and getting upset that we aren't there yet.

mmillin1d ago· 4 in thread

>Yet even with a lot of manual steering, that type of code does not come out of LLMs naturally, and even if the code comes out naturally like that, they will still attempt to handle now impossible errors.

This is something I’ve struggled to fight against in many PR reviews. Especially once already written, convincing someone that their excessive null checking is harmful is an uphill battle. Short of better modeling (and languages that allow for sum types to enable it), I haven’t been able to come up with a universally convincing argument against this kind of “shotgun parsing.”

Maybe it really just isn’t that big of a deal? But when actually reading through and refactoring a codebase I’ve always found it frustrating to manage these unnecessary checks. Sometimes they’re nearly impossible to delete safely once present without first adding some kind of logging or broad investigation.

datadrivenangel23h ago

And AI code reviews encourage overly delusional defensive paranoia. Triple null checking deep inside a function is technically guarding against a real risk, but in practice it should never be hit, because you've checked for nulls in every function that calls or could call the function in question, so it is not necessarily worth guarding against.

zelphirkalt15h ago

Are LLMs too dumb to understand the type system? Or is the type system too bad, to represent non-nullable?

Even in Python with its meh-ish optional typing system something that can be None is different from something that cannot be None.
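A short illustration of that distinction, with made-up function names: `Optional[str]` marks the one place where `None` can appear, and after the narrowing check everything downstream is annotated as plain `str` and needs no further checks.

```python
from typing import Optional

def find_user(users: dict[int, str], user_id: int) -> Optional[str]:
    # May legitimately return None: the annotation says so.
    return users.get(user_id)

def shout(name: str) -> str:
    # `name: str` excludes None for a type checker, so no None check here.
    return name.upper()

def shout_for(users: dict[int, str], user_id: int) -> str:
    name = find_user(users, user_id)
    if name is None:
        # The single place where the None case is handled.
        return "UNKNOWN"
    # A type checker narrows `name` to `str` from here on.
    return shout(name)
```

A checker such as mypy or pyright would reject `shout(find_user(users, 1))` without the `None` branch, which is the signal an extra `if name is None` inside `shout` would be duplicating.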

handoflixue22h ago

How impossible are we talking?

I tend to be a fairly defensive programmer - maybe nothing currently sends this function a negative value, but how hard is it for a future code change to alter that assumption? I always figured a clear error was best. It lets even someone unfamiliar with the code know what assumptions are being made about the valid range of inputs, so they don't have to consider impossible outliers.

mmillin21h ago

Obviously it isn’t totally impossible, but it becomes challenging to know if it’s required or not. It’s hardest when it isn’t just throwing an error but instead defaulting to something only half-sensible. For example replacing a negative number with 0 or overflowing rather than panicking.

When it comes to assumptions about the input, ideally model them in the type system. If you can’t, explicit checks and throws are OK in my book. But don’t check-and-hide any errors. You’ll be hard pressed to debug the issues they’ll cause down the road, since it will usually be far from the implementation that you see the impact.
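To make the difference concrete, a small sketch with hypothetical `withdraw_*` functions: the first silently replaces a negative amount with 0, the second raises at the point where the assumption is violated.

```python
def withdraw_hiding(balance: int, amount: int) -> int:
    # Check-and-hide: a negative amount is quietly treated as 0, so the
    # caller's bug surfaces later, far away from here, as a wrong balance.
    if amount < 0:
        amount = 0
    return balance - amount

def withdraw_strict(balance: int, amount: int) -> int:
    # Explicit check and throw: the assumption is documented and enforced
    # exactly where it is made.
    if amount < 0:
        raise ValueError(f"amount must be non-negative, got {amount}")
    return balance - amount
```

With `withdraw_hiding(100, -30)` the call succeeds and returns 100, which looks plausible and hides the bad input; `withdraw_strict(100, -30)` fails immediately with a message naming the violated assumption.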

wseqyrku19h ago· 3 in thread

For some reason pro-ai blog posts feel like paid ads, I might be wrong.

tfrancisl18h ago

I can't blame you when the first few sentences almost always invoke one of the "creators of XYZ" (I don't know how you can say a model or model harness has a singular creator when the model was trained on everyone's data and the harness was built by a whole team?) and treat their word or experience as gospel.

Who cares what Cherny thinks? He is selling his product, and he will probably cash out soon enough while his credibility is as high as it is.

rolisz18h ago

Armin is very nuanced and balanced. He spells out clearly in the blog post the bad parts of AI.

shikshake18h ago

To me there’s usually an undercurrent of manic zeal that makes them feel that way.

joenot4431d ago· 3 in thread

We've had great success with agents thus far at my job. A year into Clauding and all our dev metrics are up while our downtime has remained steady.

Being an iOS engineer, much of my engineering cycle these days is going from Figma/PRD → spec → code. After being handed off to QA, we handle the bugs and product slips as they come through, while we simultaneously build/spec the upcoming addition. This is basically the same agile style that's been popular for 20y, just super-powered with agents.

How might someone accomplish the same goals using loops instead?

Jcampuzano21d ago

I personally have not had good luck with loops due to similar issues as the post author - but if you were to port your flow to "looping" it would be something like:

- An automation that periodically checks for PRDs at a given location that have not yet been implemented.

- If it sees one not implemented, it puts a lock on it (so other agents don't pick it up later while it's still working) and implements the PRD in code, assuming it has the Figma link and all specs required.

- When it's done it makes a PR, waits to see if it passes, and in some cases even automatically merges into your staging/preview environments and just pings you with a build/URL. You can then leave feedback, and it can also poll for pending feedback. Or you just mark it as looking good; the agent then merges the PR, moves the PRD to implemented status, maybe even writes/updates docs and cleans up any temporary work.

- Repeat checking for new PRDs every T units of time (10 minutes, 1 hour, etc.).
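The polling and locking parts of those steps can be sketched roughly as follows. Everything here is an assumption made for illustration: PRDs are `*.md` files in a `pending` directory, a lock is a sibling `<name>.lock` file, and `implement` is a placeholder callback for whatever the agent actually does. Opening PRs, merging, and polling for feedback are left out.

```python
from pathlib import Path
from typing import Callable

def lock_path(prd: Path) -> Path:
    return prd.with_name(prd.name + ".lock")

def claim(prd: Path) -> bool:
    """Try to take an exclusive lock; return False if another agent holds it."""
    try:
        # Mode "x" fails with FileExistsError if the lock file already exists.
        lock_path(prd).open("x").close()
        return True
    except FileExistsError:
        return False

def poll_once(pending: Path, implemented: Path,
              implement: Callable[[Path], None]) -> list[str]:
    """One pass: claim each unlocked PRD, implement it, move it to `implemented`."""
    done: list[str] = []
    for prd in sorted(pending.glob("*.md")):
        if not claim(prd):
            continue  # someone else is working on it
        try:
            implement(prd)
            prd.rename(implemented / prd.name)
            done.append(prd.name)
        finally:
            lock_path(prd).unlink()  # release the lock even if implement() raised
    return done

# The "every T units of time" step would wrap this in a loop, for example:
#   while True:
#       poll_once(pending, implemented, implement)
#       time.sleep(600)
```

If `implement` raises, the PRD stays in `pending` with its lock released, so the next pass retries it; whether that is the desired behaviour is a design decision this sketch does not settle.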

This is how people say you should be looping - you never even cared or looked at the code, and also never prompted the agent yourself.

But I find most agents are often still pretty bad at replicating UI vs making something from scratch, and most design specs are still not detailed enough about how things look at all sizes, in all scenarios, etc. Design seems to be one of those things that still requires a human to validate. And then there are all the things the post author mentions about it not being willing to apply hard constraints, minimize impossible states, validate at edges, and avoid horrendous overchecking of things.

camillomiller1d ago

Would you have a breakdown of costs/benefits? Can you say with certainty that this workflow has increased productivity so much that you are seeing profit increases that you wouldn't have otherwise seen just by hiring more people? Asking with no ill intention, I just crave actual business cases that make sense, and yet no one seems to be able to reliably produce that.

noodletheworld23h ago

Use Appium or XCTest or Swift Testing; generate the tests first (failing) from the spec.

The loop is basically then a while loop:

While (tests fail) { trigger agent: spec, failures list }

For bugs, write failing tests.

It's basically TDD.
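Spelled out a little further, the while loop above might look like this. It is only a sketch: `trigger_agent` is a placeholder for however the agent gets invoked, the test command is whatever your project uses, and the `max_rounds` cap is an addition of this example rather than part of the idea as described.

```python
import subprocess
from typing import Callable

def run_tests(cmd: list[str]) -> tuple[bool, str]:
    """Run a test command; return (passed, combined output)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def loop_until_green(
    spec: str,
    run: Callable[[], tuple[bool, str]],
    trigger_agent: Callable[[str, str], None],  # receives (spec, failure output)
    max_rounds: int = 5,
) -> bool:
    """While tests fail, hand the spec and the failures to the agent."""
    for _ in range(max_rounds):
        passed, failures = run()
        if passed:
            return True
        trigger_agent(spec, failures)
    # Out of rounds: report whether the last attempt got the tests green.
    return run()[0]

# Usage would be along the lines of (the test command is an assumption):
#   loop_until_green(spec_text, lambda: run_tests(["swift", "test"]), my_agent)
```

The failing tests are the loop harness here, so the quality of the result is bounded by how well they encode the spec.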

Loops do nothing useful beyond making the “spec -> code” step more “hands off” and let you be confident that the code you write does what is intended.

Obviously you see the issue: writing the loop harness is more effort than not having it…

…but the idea is that you run “spec first” and are totally hands off on the code, just updating the validation step and then waiting while the agent iterates over and over to solve for some solution that passes the loop harness.

People suggest that it is possible to go, e.g., directly from Figma/Jira to harness via (random tool here), saving even more time and invoking even fewer humans, but that's currently, as far as I can tell, actually just hype.

No one is actually doing that effectively.

Loops are currently carefully hand crafted, which makes them tedious and of questionable value imo.

CraigJPerry1d ago· 2 in thread

> My current status is that I have not had much success with this way of working for code I deeply care about

If something is judgement heavy, "code I care deeply about", then I don't really agree with the direction of travel here. Don't try to delegate decisions you care deeply about.

I do like the framing of agent loop vs harness loop, but only delegate stuff that you can accurately specify in advance, that usually means stuff that's repeatable in my case ("hey go see how i did X, do that but for Y"), and that inherently means stuff that's predictable.

For stuff where lack of my judgement as input is just going to cause me to say "no", we're down to collaborating in the "agent loop" as Armin puts it. And that's totally fine. It's fast, but also safe.

Remember before AI coding assistants, sometimes you'd get an engineer join your team who was SUPER productive, your peers would be jealous "oh yeah but you guys only got all that done because you have X on your team!" - they didn't live the curse of having that kind of person around - if you don't have them PERFECTLY aligned, then they run off at breakneck speed in the wrong direction.

otto-riz15h ago

> Don't try to delegate decisions you care deeply about.

YES. Or find a deterministic way to insert them :D

zahlman16h ago

> Don't try to delegate decisions you care deeply about.

> they didn't live the curse of having that kind of person around - if you don't have them PERFECTLY aligned, then they run off at breakneck speed in the wrong direction.

Exactly. If you wouldn't outsource it to people you considered highly skilled, why would you outsource it to a machine?

gcanyon1d ago· 2 in thread

I'm a software developer from way back, using tools and languages that coding agents are far less familiar with.

So when I use an agent to write code, it's in languages I'm less familiar with, and often using libraries I know nothing about.

All to say, my part of the process often ends up being:

1. "Here's what I'm looking for, in detail"
2. "That's not right. Here's one way it's not right, and a specific example. Please fix that."
3. Sometimes I give suggestions for how what is going wrong might be happening, or conceptually how to work around the issue.
4. And iterate on 2-3 until the result is close enough.

That's a loop I'd love to automate.

timmytokyo17h ago

Sounds like a great way to avoid learning anything new about the languages you don't already know.

handfuloflight18h ago

Have you tried SKILL.MD files encoding your nuanced domain knowledge?

livingsoft13h ago· 2 in thread

The author is spot on about the paradigm change of software as a lifeform. Living things provide us with genuine interactions and experiences of learning and growing, without forcing us to understand the code - You can learn to work with animals and plants without understanding their genetics at all. I believe this is how our relationship with software must develop, and in order to get there, we'll need to learn to design and develop software in a completely new way. I've been testing this hypothesis in my spare time, hacking together a server-browser system I call Mycelium. It's a bit like OpenClaw, except you can use it to create private local Webs, and print custom 2D Electron browsers to view and work in these webs.

andai12h ago

Interesting, could you share a link? (Your submissions appear to have been nuked.)

Edit: woah!

https://www.youtube.com/watch?v=NGgve4L2zY4

livingsoft12h ago

Yes, you found it. Blancs is the browser prototype for non-technical users, but I'm currently working on getting the full system, Mycelium, ready for demos and distribution. I'll be sure to update the YT channel, if you want to follow the progress. I'm also on Twitter @livingsoft_

abathologist23h ago· 2 in thread

Generally interesting reflections here, yet I see the same kind of myopia and fatalism that is rampant in our (fashion) industry:

> yet I have no doubts that this looping future is going to be our future despite the fact that I presently resent it

Why would anyone conclude this? LLMs are just one kind of application of ML to software production. There is a vast solution space for automating parts of software production. The idea that slop loops are the inevitable future because they happen to be accelerating output at the moment just seems profoundly short-sighted and lacking in vision.

mohsen122h ago

I'm really curious to see how this unfolds. It's a defining moment for us I think

Terr_18h ago

"It is inevitable, soon all insulation will be asbestos, you have to learn the types in order to be competitive in its new future."

johnwheeler18h ago· 2 in thread

What is a loop in simple terms?

zahlman16h ago

Pretty much the same with LLMs as in other contexts. You set up a system where LLM output is used as input (a new prompt) as well. Or you have an external system that repeatedly prompts an LLM to "continue" with what it was doing before, perhaps examining some test results to massage the context window first, etc.
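To make the first variant concrete, here is a minimal sketch where each output becomes the next prompt. The `model` callable is a hypothetical stand-in for whatever LLM call you use, and `is_done` and `max_turns` are assumed stopping conditions, not part of any particular tool.

```python
from typing import Callable, List


def feedback_loop(
    model: Callable[[str], str],
    first_prompt: str,
    is_done: Callable[[str], bool],
    max_turns: int = 5,
) -> List[str]:
    """Feed each model output back in as the next prompt.

    Stops when `is_done` accepts an output or after `max_turns` calls,
    and returns every output produced along the way.
    """
    outputs = []
    prompt = first_prompt
    for _ in range(max_turns):
        output = model(prompt)
        outputs.append(output)
        if is_done(output):
            break
        prompt = output  # the output becomes the next prompt
    return outputs
```

The second variant (an external system that keeps re-prompting) has the same shape, except the line that builds the next prompt would also fold in test results or other context.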

wartywhoa2317h ago

The Loop.

The Singularity.

The Nobody who runs The Show.

The Acceleration toward inevitable Transcendence.

Revere, repent and obey! Compute is Destiny.

codeDruid1d ago· 2 in thread

Yeah I don't know. Don't get me wrong, the article's points make sense. But sometimes I think that we're going to stay near this current point of productivity for a little while.

Currently my org of 8 people uses around 1000 euro worth of tokens per month. We've recently had a discussion near the water-cooler, that if the cost climbs 5x-10x it may just be more worth it to get more developers (we're EU based). While the tools work and are definitely nice, even in our little org with our little budget, using Opus 4.8 we've noticed code quality going down.

If I had to bet money, I'd bet that the models will get 30-50% nicer and around 2x more expensive, and that we will settle into some mode where we use LLMs for some tasks, do others manually, and call places that focus on speed at any cost some funny name like "gulags, 996, sweatshops, etc", collectively trying to somewhat avoid those places, which will need to offer a premium to attract talent. Wishful thinking.

erispoe1d ago

How do you control code quality?

baw-bag23h ago

The way I answer this question is, I don't because my boss is saying to basically accept whatever LLMs spit out which is fine by me because I have explained the risks of doing this to our only product and it was shrugged off. I am both disgusted and excused and just take the salary.

mccoyb1d ago· 1 in thread

Loops work when you spend the proper amount of time to understand what you want ahead of time. The prerequisite is clarity — enough clarity that you could write a careful specification that you could hand off to a junior colleague.

Often, it takes 5-6 broken crappy versions of a thing until you understand that. There is no accelerating the 5-6 broken crappy versions - there’s no agent tech that’s going to help your meat brain avoid thinking time.

So most of my time is iterating between these two phases: I don’t understand what I want, I need to read and write and play with code, okay it’s been long enough I think I know what I want (it is extremely easy to deceive yourself) … okay now I do actually know what I want and I can write a loop.

Many people think they can jump ahead with agents. You cannot fake understanding or clarity. It is painfully obvious when someone skipped that meat brain understanding phase.

athrowaway3z1d ago

I had codex write a tool to extract all my pi sessions. (Had to filter out my prompts from the agents talking to subagents).

Then I had it analyze the patterns i was making and turned that into the flowchart for the outer guidance-creating-prompt.

I didn't have to spend too much time thinking what i wanted. I wanted it to do that.

The result is still mixed, and I'm not trusting it with delicate code bases, but for a game I've been building I dropped my check-in time to 1/5th of what I was previously spending on it.

That's not a good thing per se. I'm sure I'm missing good ideas by _not_ spending time with it. But previously I really had stagnated with my prompts becoming mechanical #now-do-this and #now-review-that with 90% of its suggestions being correct.

Just need to (automatically) remind it to "do the hard stuff first, clean up & refactor as you go" as well as a "reflect on your work" after its first return to get it to spill the beans on any crap left behind, and then process that in the guidance-creating-prompt to dish out new work.

miki12321113h ago· 1 in thread

This ties into something I have been saying for months: LLMs are great at finishing tasks, but bad at aesthetics and taste.

There are two kinds of work: One is goal-driven work, where we have a goal to achieve, and we care very little about how we get there. Security is a perfect example; if you want to exploit a system, you rarely care about how beautiful the exploit is, all you want is access to those super secret nuclear plans. Research is also like this; "research-quality" code was famously terrible, even before the age of AI.

The other kind of work is taste-driven work. People think that, when they're adding a feature to a large codebase, their goal is to add that feature, but that is often not the case. Keeping the codebase amenable to future changes is often far, far more important than this specific feature, and that requires taste. Note that maintainability and code quality aren't synonymous, code quality is just a means to an end, and that end is maintainability.

jr359213h ago

> Note that maintainability and code quality aren't synonymous, code quality is just a means to an end, and that end is maintainability.

Many orgs are quickly moving to a world where code quality and maintainability are not a priority, at all. If Claude is just going to write the code, does it matter how "maintainable" or "quality" it is? No. It just matters if it works, and if it's fast, is how the perspective goes.

galaxyLogic17h ago· 1 in thread

I think what is going to happen is revival of "Methodology".

"Methodology" was a big thing in the past, just before we got into "Agile Extreme Coding": instead of trying to model the big picture of SW development projects, just jump into coding agilely and implement it feature by feature.

Granted, the methodologies proposed (see: https://www.ibm.com/docs/en/rational-soft-arch/9.7.0?topic=m... ) may have been too heavy, not flexible, and not improved enough. But now with the rise of Agents I think we need to revise and perhaps re-invent them for AI agentic development.

whazor16h ago

Yes, I believe in reinventing methodologies for the rise of agents.

Before even discussing with my team whether something is a good idea, I can have a full prototype in a new branch within one morning. But then I use it as a proof of concept and delete the code.

Another example: agents can generate full test coverage. Including mocking external dependencies and user behavior. Also all in one morning.

Why reinvent methodologies? The cost functions have changed. And agents add new problems, such as hallucinations and tech debt. I think reinventing is a good word since these processes already exist. But the emphasis would shift toward particular parts, making some of them much more mandatory.

CuriouslyC23h ago· 1 in thread

Part of the problem is that models don't have a strong sense of taste, part of the problem is that the context in which projects exist is incompletely represented in the LLM context, and part of the problem is that LLMs tend to be myopic.

The lack of taste can be mitigated to some degree by improved training, though taste is not a stationary distribution in humans (see trends/fads/etc), we can at least better track the cutting edge. I think this area still has low hanging fruit but frontier labs are more concerned with being able to solve problems than the style of the solution right now (for evidence of this just look at the Opus 4.5 -> 4.8 arc).

The problem of incomplete context is partly a human problem and partly a harness/interconnectivity problem.

LLM myopia is a harder problem to solve, just by virtue of training models on question/answer pairs. Countering this requires emphasizing RL on solution paths rather than just prompt/response, which is doable but harder.

zahlman16h ago

> The lack of taste can be mitigated to some degree by improved training, though taste is not a stationary distribution in humans (see trends/fads/etc), we can at least better track the cutting edge.

The point of "taste" is not to copy from others, even if you can somehow filter out the trends and fads. And really, that filtering comes from personal experience anyway.

artisin23h ago· 1 in thread

This.

> Present-day models tend to produce code that is too defensive, too complex, too local in its reasoning. They avoid strong invariants. They add fallbacks instead of making bad states impossible. They duplicate code, invent bad abstractions, and paper over unclear design with more machinery. Worse though: I so far see very little progress of this improving.

Context-smithing can help to a degree and cyclomatic-like complexity rules tend to make matters worse. So, you either roll up your sleeves or close your eyes and hope for the best. I've had limited success with the latter.

handoflixue23h ago

I was really surprised by this part!

I've run into the issue a lot; I know it happens. I handled it manually for a while by just having a fresh instance inspect the code - "review this for DRY violations" and "how would you re-write this into a global architecture instead of a bunch of local code".

Eventually the list ended up long enough that I've got an agent that handles it. You've just got to treat "write code that works" and "write elegant code" as two separate tasks - either a fresh instance or an Agent will work

inline_always17h ago· 1 in thread

The bottleneck has always been the 'verification' and 'trust', that's why we have senior engineers, same way you need a head architect sign-off on a blueprint, because when things go bad you need a human agent to be the responsible party. Even if we manage to teach a herd of dumb AIs to produce massive amount of code, who's going to trust that output with their life?

wahnfrieden17h ago

That's the entire topic - loops are not just infinite output, they require automated verification and progress evaluation steps.

The game is to find ways to automate that. Not fully but yes to reduce what's required from humans. Seems like you're questioning the entire premise rather than pondering how far it can be taken and how.

rcarmo1d ago· 1 in thread

There's _way_ more than one way to do "loops". I just asked one of my supervisors/auditors to document how it's been working while monitoring a few other agents that have long-term goals:

https://gist.github.com/rcarmo/4922b550ab48bf0b4246c77e606a5...

rcarmo23h ago

I'm glad HN prefers downvoting actual work/demos to discussing them.

JodieBenitez1d ago· 1 in thread

> For now I have not moved past the point of comprehension being important to me.

Ah! This is me too... at least for what I have to ship at work. Not so much for my toy/weekend projects. But it turns out agents are also good at explaining.

jazzypants1d ago

I think it's insane to suggest that software developers should ever get to the point where they don't even comprehend their code.

Before someone else says it, no I don't read the assembly code that is produced by my compilers. However, I can generally predict what kind of assembly will be produced, and the result is deterministic unlike LLMs. It seems like most vibe coders scoff at the idea of even looking at the code, and it just seems untenable to me when we're working with (usually correct) stochastic parrots.

gavinh11h ago

> We may create codebases that are not merely hard to maintain by humans, but that assume machine participation as part of their maintenance model... People more and more merge code they cannot fully explain. People lose their ability to create issue reports or discuss things in chat, without augmenting or rephrasing their messages with the context provided by a clanker. Too many people increasingly rely on a machine to summarize or contextualize it. More and more do I encounter people who converse with me through the indirection of an LLM.

I experience this daily now. I find it discouraging and concerning.

I believe we're merging more code we can't fully explain because we are now relying on code review to build the mental model that was previously built by writing code and collaborative technical planning. I don't think code review is fit for this purpose. I do think we can extend code review with structured exercises, informed by pedagogy, that strike a better balance between friction and understanding. (I'm looking for help testing these exercises).

stillpointlab15h ago

My experience is that I am bottle-necked on specs. The agent loop is less of a thing for me now.

If I can get a clear understanding of what I want to build, communicate that to Claude Code in planning mode with the goal to write an actionable spec (not code, plan to write the spec) then I tend to get very good results once the agent goes to implement.

But this strategy, while effective, puts a big load on me to write the specs. The agent tends to knock each one out of the park (usually 2 to 3 follow ups based on code review) but then I'm back at the stage that requires the spec.

Another issue for me is that when I step away, if the agent finishes a task and could technically start on an existing spec (no overlap on files so no conflict possible) it doesn't know it can just create a new branch and start. Before I go to bed I'll often say "do task X and once done and pushed start on task Y". But I haven't had luck beyond that. Often I find that it starts on Y and has a question and then the agent is idle the rest of the time.
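The "no overlap on files" check is simple enough to hand to a harness instead of the agent. A minimal sketch, assuming each pending spec declares up front which files it expects to touch (that declaration, and all the names below, are hypothetical):

```python
from typing import Dict, Optional, Set


def next_safe_spec(
    pending: Dict[str, Set[str]],
    files_in_flight: Set[str],
) -> Optional[str]:
    """Pick the first pending spec that touches no file already in flight.

    `pending` maps a spec name to the files it is expected to modify;
    `files_in_flight` is the union of files touched by unmerged branches.
    Returns None when every pending spec would conflict.
    """
    for name, files in pending.items():
        if files.isdisjoint(files_in_flight):
            return name
    return None
```

A harness could call this whenever the agent goes idle and start the returned spec on a fresh branch, rather than relying on the agent to work that out itself.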

The final issue is dependency coupled with the above. For example, today I was writing a background job processor. Obviously, the jobs that are in subsequent tasks require the system. That happens with some frequency. Even the specs need to be refreshed after the implementation to take any details that were resolved at coding time into account.

But I am just on the cusp of wanting the outer loop. The gate is almost entirely on spec creation and PR review. In places where those gates don't matter, I want the agent to keep chugging away.

As an aside, I strongly believe we need to start using tools that are better for LLMs even if they are worse for us. For example, Rust is annoying because the compiler is so strict. Bad for me, great for LLMs.

Multicomp22h ago

Code is part of a shared and built understanding of an information system.

If these loopers mean we all have to move at this continuous wave of software happening, then we get to the highest levels of logical information system design and it's all human judgement and balancing of business requirements to fit a given niche in a company or market. So all the programmers have to become business analysts/market researchers/businessmen...except the specific niches where AI tooling can't really clank well...or the end of the subsidized AI token era makes all this looping too expensive to continue. This feels like expert systems and Symbolics Lisp machines redux, where we briefly ran into the fact that it's not so much the code itself not being able to do stuff, it's that your company's org always gets shipped, so if you can't change your company org, your software only has so much flexibility.

Dataflow diagrams and domain knowledge / domain modeling / ubiquitous languages may become the metalanguage that we start to use and set the standards for quality, functional, and non-functional standards and conventions. We make the "looper clankers" ensure that they fulfill that data / behavior / performance contracts before saying what "done" is, because "done" is no longer just code that compiles, code that builds, code that deploys, or even code that sits in production; it's code that fulfills all of the user requirements, operator requirements, and maintainer requirements. So, the language used may be required to make us all turn into business analysts and software architects more than syntax knowers. The revenge of UML and the return of declarative / logical design / BDD triumphing?

(Typo scan by gemma4-12b but I didn't let it alter my message)

tmshapland17h ago

Thank you for writing this thoughtful post, Armin. I find it deeply comforting that the developer of Pi, an agent harness, does not remove himself from the loop, like me. Maybe if I started thinking of codebases as biological organisms I could get comfortable with getting the human out of the loop.

wolttam22h ago

I am 100% for fully agentic loops... for tasks other than engineering.

I'm not willing to outsource the understanding-how-things-work part of myself. That part of myself is what got me into computing in the first place.

If this work becomes simply a matter of describing intent to a machine (probably through an Issue, like a user), and going to check on the result when you get the 'done' notification: I'm done.

It's possible to use the tools to do awesome things without letting go of full system understanding of the parts that you look after.

piker1d ago

We used a “loop” before it was called that to drive MS-DOC support into Tritium. Based on that experience, I take issue with this:

“There are already impressive examples of large automatic porting efforts, including the reported work around moving parts of Bun from Zig to Rust.” (Emphasis added.)

It will be impressive if/when the Bun team is able to pick up and continue extending and supporting Bun. For us, MS-DOC remains read-only and probably perpetually buggy until we reimplement with a better understanding. Until then, it’s definitely not “impressive”. Functional? Maybe. Impressive, no.

contagiousflow23h ago

> You Cannot Quite Opt Out

I am so over this. I cannot take anyone seriously who claims inevitability of their ideas, and how you must adopt them without "being left behind". If these tools are so good and so capable, the results should be able to speak for themselves rather than this FOMO-inducing, emotional language.

sixhobbits17h ago

I have huge respect for Armin but all of the concerns about agents producing more code with less competent supervision from senior engineers don't seem that different from the status quo to me. A vast majority of all software I've ever professionally worked with has been terribly structured, hard to work with, full of bugs, etc, produced by mediocre to bad engineers and run by semi-technical product owners or managers who basically prompt the software into existence by making Jira tickets on 2 week cycles to hold it together.

Yes it's awful, but it kind of works and has worked for a very long time. Agents are already improving a lot of open source software. Yes they're producing a lot of slop too, but having beautiful code, understanding how the system works and being able to delegate to a competent engineer you trust is reserved for the very few right now and I think we have all the systems and experience in place to deal with "bad" but working software so personally I am not concerned

dataviz100018h ago

I’m having awesome success working with recursive agents. I discussed my experience with them. [0]

> Claude's attention doesn't distinguish between "instructions I'm writing" and "instructions I'm following" -- they're both just tokens in context.

It takes a little human help in the first iterations but after a while it will start to iterate and improve unsupervised.

[0] https://github.com/adam-s/agent-tuning

furyofantares19h ago

I have had some success with /goal for long tasks that can be set up in a way that the agent can do good work for an extended period of time.

A lot of tasks aren't amenable to that, and the ones that are still need a lot of care to be set up correctly. The default vibe coded codebase won't be.

I've come to think of the activity of choosing the right technology, the right architecture, the right testing setup, the right context, and the right /goals to use as programming the agent.

lifeisstillgood17h ago

>>> For now I have not moved past the point of comprehension being important to me.

I see software as new form of literacy, even in the AI world, so yeah in my world view, comprehension will be something we always cling to.

I might comprehend some code the way I comprehend the newspaper article on the second page, others I comprehend like a Dylan Thomas poem. My attention might be different but I still need to understand it.

theahura8h ago

not to be dismissive, but isn't this the same as the ralph loop thing everyone was talking about 6 months ago?

agumonkey20h ago

I can't help but be tired of the LLM trend, where people bang at loops until they hope the model sculpts something. It feels so empty mentally to just have results without constructing them.

That said the idea of loop has always been there (iteration, V cycle etc) but I'd be glad to find people with more theory and less agents swinging blindly so to speak.

illuminator8314h ago

I think a lot of people here have either not read the article fully or are misapprehending it.

Neither this author nor most other sane people I know claim that the code or architecture these "loops" produce is great. In fact, the author explains how it is not great. His point is rather, that we'll increasingly see a world in which code quality and maintainability by humans will cease to matter for a lot of codebases.

There might be many software companies in the future which successfully sell software products which were created without a single software developer being involved in its development or maintenance. The code might be bloated and bad - but it doesn't matter because machines can still create and maintain it cheaper and faster than people can.

I already see this happening at a small scale at the place where I work. Product managers with zero coding ability are attempting to create entire new product features on their own using Claude or Codex. We do not let them merge this stuff unsupervised but in some corners and in new repositories they are publishing stuff that they have barely spoken about with a developer. They are just doing it. We'll see more of that.

raidicy16h ago

Hilarious that Jeremy Howard and Rachel Thomas have been saying you need a human in the loop from the beginning.

And now finally we discovered that we need to put a human in the loop.

stuartaxelowen17h ago

This blog post points to the fact that you need information across scales to make really insightful products and software. You need to understand fundamental mechanisms, strengths, and risks of your software to know where to make bets next. You need to know about the “how” of your optimization system to know which customer asks to deny.

Using layers like the loops described here to abdicate your work is you decoupling from the joint market/engineering value you originally provided.

otto-riz15h ago

> Ideally all of that was also well documented. Where that understanding was lacking, it was generally regarded as something to improve upon.

In my experience, this was always an unrealized ideal. Too much pressure to ship, not enough to learn. To me the fact that I can interrogate Claude about this stuff is almost as valuable as the speed. (And I think it helps Claude too.)

wiseowise23h ago

A friendly reminder to just do 9 to 5 and touch lots of grass. None of this shit represents industry trends, majority of people still use chat interfaces and copy blocks of code. There’s zero early adopter advantage here, only FOMO and lots of anxiety.

nullbio12h ago

It's marketing... You can't take anything Anthropic says at face value.

jwpapi1d ago

The issue is that whilst the loops will initially lead to good results, the results will get worse and worse as context gets bigger and bigger and tougher to understand for humans and AI.

So it depends really on the size of your project.

sunir21h ago

Dear Abby,

I am torn. I have fallen in love with vibe coding but I still am in love with the software I’ve used for decades that works reliably.

Vibe coding gives me what I need and want right now. It’s fast. Fun. Always makes me feel validated.

My older software never changes. It’s constantly telling me no. When it gets mad, it throws errors at me sometimes! But I can’t leave it. It runs my life and I know it will take care of me for years to come.

And the vibe code it’s so flaky… and expensive. It sucks up endless amount of my time, compute, and money and never gives anything back.

But it’s so fun. I tell all my friends about it and they’ve become so jealous they sought out their own vibe coder.

We’ve all found our vibe coders are a bit kinky. It’s become a social thing amongst my friends to talk about building cooler harnesses to control our vibe coders.

I don’t know what to do. My old software pays the bills but she keeps threatening to dump my ass on the curb and replace me with her own vibe coder.

I know she can’t really do it. She needs me too. And I need her.

Can we ever patch up our diffs?

— just some git with uncommitted changes

blurbleblurble10h ago

Really sharp perspective. Agentic engineering is software engineering with inertia. We're growing stuff now, not knitting it link by link.

jsw9723h ago

In my own ham-fisted experiments with coding loops, one pathology I have noticed is that the LOC just spirals out of control. That's likely because of the layers of defensive fixes, etc., that get built. That inevitably causes context bloat (or at least navigational friction) and results in quality decline.

I wonder how many loop-related issues could be addressed by simply fixing a LOC budget, or assigning a cost in some way. Unclear how you would dial in the right numbers, though.
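One crude way to try this is a gate in the harness that rejects an iteration when net growth exceeds a fixed number of lines. A sketch under stated assumptions: "LOC" here just means non-blank lines, and the budget value is something you would have to tune yourself.

```python
def count_loc(source: str) -> int:
    """Count non-blank lines; a deliberately crude LOC measure."""
    return sum(1 for line in source.splitlines() if line.strip())


def within_loc_budget(loc_before: int, loc_after: int, budget: int) -> bool:
    """True if this iteration's net growth stays within the budget.

    Shrinking the codebase always passes, since growth is then negative.
    """
    return (loc_after - loc_before) <= budget
```

A failed check could be fed back to the agent as just another "failing test", which at least makes the cost of defensive layers visible to the loop.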

hakanderyal1d ago

I think this is a common sentiment among heavy users of AI who also still care about code quality.

I've built up a skill harness and review flow that makes Opus generate slop-free code 90% of the time. But the remaining 10% requires me to stay at the helm. Especially in the early stages.

I would love to use loops to automate more, but I couldn't do it with the current generation models.

And in the back of my mind I'm still evaluating the possible future where we are forced onto API pricing. I'm currently paying $400 for Opus, and use around 1.5-2 billion tokens per day. This will cost around $20k/m with API pricing. And I don't want to even imagine the possible scenario of getting locked out of frontier models because of politics.

Will the models get better to cut me out of the loop completely? I believe so. Will the open source models catch up to SOTA models, and diversify from China-only? I hope so. Otherwise 2 superpowers will wield a soft power that can cripple the tech industries of all other countries.

knivets23h ago

These new AI trends are very tiresome, very similar to 2021 crypto mania - both trigger a lot of FOMO. If we have loops that write code and we don't need to verify anything, why are the devs still here? What's the point of even learning this new trick as a dev if you truly believe that this can be used without any intervention? If loops work then it follows that a loop of loops works too - why hire any people at all? Just run a bunch of loops and build a profitable business, but then what's your moat? Any person can now launch loops on top of loops.

lhysdl9h ago

Andrej Karpathy says, “You can outsource your thinking, but you cannot outsource your understanding.” The catch is that we need to think in order to build understanding.

duendefm1d ago

I honestly wonder if this kind of stuff really brings something to the table. I've used Opus for some time and I can certainly put it to good use and optimize some parts of my day-to-day job (programmer). But it fails so hard at such simple tasks that it seems to me that putting it in a loop can't just magically make everything better, unassisted. Does anyone actually use agents and loops to create new software, new technology? Has anyone created, with those systems, software they couldn't technically have produced otherwise? Or is it at best just an accelerator that cuts down on build time?

dirtbag__dad13h ago

> the code it produces is slop, but that’s more the fault of the model than the harness not being a good judge on if a step in the workflow resulted in a net improvement or completion.

I don’t know. I’ve invested heavily in building internal tools that scaffold code and lint the filled in architecture/code design. That with a ratchet pattern, to allow for new rules that have errors across the existing code base, but to asymptotically fix them, is working pretty well.
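For readers who haven't met the ratchet pattern, here is a minimal sketch of one way it can work, assuming the linter can report a violation count per rule. The function name `ratchet_check` and the rule names are hypothetical, not the internal tooling described above.

```python
def ratchet_check(
    baseline: dict[str, int], current: dict[str, int]
) -> tuple[bool, dict[str, int]]:
    """Compare per-rule violation counts against a stored baseline.

    Passes only if no rule has more violations than the baseline allows.
    The returned baseline is tightened wherever a count went down, so the
    allowance for each rule can only shrink over time.
    """
    ok = True
    new_baseline: dict[str, int] = {}
    for rule in sorted(baseline.keys() | current.keys()):
        found = current.get(rule, 0)
        # A rule missing from the baseline is new: existing violations are
        # grandfathered in at their current count.
        allowed = baseline.get(rule, found)
        if found > allowed:
            ok = False
        new_baseline[rule] = min(found, allowed)
    return ok, new_baseline


# First run: one old rule improved, one new rule introduced with 40 violations.
ok_first, baseline = ratchet_check(
    {"no-bare-except": 12}, {"no-bare-except": 10, "no-getattr": 40}
)
# Second run: a regression on the old rule fails the check.
ok_second, _ = ratchet_check(baseline, {"no-bare-except": 11, "no-getattr": 40})
```

The point of the design is that a new rule can land without a big-bang cleanup, while any change that adds a violation is rejected.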

Example - all modules have tightly scoped design primitives (I’m using hexagonal architecture for the backend, for example). And all code has BDD tests, which is what I spend much of my time reviewing, since reading cases written in human sentences is easier than looking at so many files of code.

There is a relentless upkeep to draft rules that respond to the workarounds the agents come up with to adhere to the design I want, but it’s slowly approaching perfect. What has helped here tremendously is that I use hooks to run an LLM-as-a-judge over the decisions the LLMs make, and then have them review/raise the questionable ones after a first pass is completed. In general, this is snuffing out the slop effectively.

All to say, someone asked me recently what model I prefer. In this approach, the model doesn’t really matter to me because the code is consistently what I want. I’ll choose a model because it has better mcp speed (codex), or a more thorough scope (Claude code).

Where this IS true is when we’re building a net new pattern. The agents are not great at it. BUT most code can fit into the few patterns I’ve created, and for what can’t, you lock down a new pattern to enforce over a couple of iterations. Almost everything, at least in SaaS, follows a template.

trjordan23h ago

I think there's 2 important, but separate, ideas in this post:

- Models are not good at, or getting better at, creating strong invariants, which is fundamental to good software

- It is unclear how to keep tabs on what the agent is doing, so you, a human, can intervene.

These are related, obviously: one of the highest-leverage things you can do is force your agent to use a strong, minimal set of types or data invariants or other constraints. They get much better when your codebase broadly supports this!

I do suspect they're separable, though.

If you had the right levers and visibility, you should be able to get the model to produce code that doesn't feel like slop. But every time I've had a model try to keep me in the loop, it inundates me with irrelevant decisions and busywork. Its inability to see what's structurally important still shows up, just differently.

[If the models get better at defining and respecting invariants, maybe there's a new flavor of slop, that's less obvious today.]
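As one illustration of what "a strong, minimal set of types or data invariants" can mean in practice, here is a small sketch in Python. The `Percentage` type and `apply_discount` function are hypothetical examples, not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percentage:
    """A value guaranteed to be within 0..100 once constructed."""
    value: float

    def __post_init__(self) -> None:
        # The invariant is enforced once, at construction time.
        if not 0 <= self.value <= 100:
            raise ValueError(f"percentage out of range: {self.value}")


def apply_discount(price_cents: int, discount: Percentage) -> int:
    # No range check here: the Percentage type already guarantees it.
    return round(price_cents * (100 - discount.value) / 100)


discounted = apply_discount(2000, Percentage(25))
```

Because the check lives in one place, downstream code (human or agent-written) has no reason to re-validate the range, which is the kind of constraint that keeps defensive layers from piling up.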

mikgp22h ago

Was everyone collectively lying over the past fifty years of software development when they repeatedly said more != better?

For specific use cases, performance and security and all sorts of tuning it could be truly amazing. But maybe loops should be like a tool we make a choice to use when optimal.

I just wonder if in the future we’ll come to realize that we don’t have to throw the baby out with the bath water. That you can take a beat to understand your code and do change management, and choose the right tool for the job, and curate and say no and have agency.

An observation might be - no one writes code like Google. “You’re not Google” is something that gets thrown around in software shops all the time. Why is it we all think we’re going to be writing code like Anthropic?

draginol23h ago

This is really terrible advice right now for most people.

I've had to rip out a lot of pretty terrible code made by engineers who have tried this.

I don't disagree that eventually, "loops" when combined with unlimited tokens and amazing models in the hands of people who know how to set them up right will be amazing. But for the typical Claude Code user, it's a disaster.

The problem is not that loops write bad code once. Humans do that too. The problem is that loops apply local pressure repeatedly: add a fallback, add a guard, special-case the failing input, quiet the exception, satisfy the test. Over time that selects for code that is more survivable in the short term but less intelligible in the long term.

NichoPaolucci11h ago

Man - what a ride this last year and a half has been. I feel for the juniors or newer developers, who really haven't had time to get into the seat. I don't see a great place to really "settle" into right now, as the field is unfolding rapidly. I wonder what things will look like in 10 years.

If I give an agent a sufficient spec, and it can one shot it, I imagine we won't need to loop, especially if we assume the tech is going to meaningfully improve in the coming years. In 5 years, "make no mistakes" and "add tests + review this code" will be baked into the agent or completely unnecessary, right?

Maybe I'm out of the loop.

sandrello1d ago

This is a very fatalistic take. While I understand where it's coming from, I try not to share the same mindset: engineers getting increasingly distant from how things are getting built is not something that will "undoubtedly happen, whether we like it or not".

Also:

> Now there is obviously a question if this desire to understand the code is one that I will still have a few years from now.

I do not think we should be having doubts like this. Either you consider understanding the code you ship and allowing your future self to be able to work on the system you're building to be a value, or you don't. I, for one, do, and I do not think using LLMs and coding agents will affect my point of view on that.

aabdi23h ago

The post suggests fear about a surge of increasing amounts of code by loops and loops of agents.

I don’t know if I like the current world without it though.

In 80% of teams' codebases, the code is poorly tested. The code doesn’t handle data consistency or asynchronous code properly because the engineers don’t know better (and frankly don’t care enough).

Dependency handling is poorly managed leading to low quality operations with improper dashboards, alarms, and ops.

Badly managed processes lead to people doing monkey work signing off checklists rather than automation.

Frankly… why is keeping any of that good? It really pisses me off seeing people accept any of that low quality but that standard is the default and not the outlier.

galoisscobi1d ago

As much as I like Claude Code, Boris has done a lot of harm by encouraging software engineering practices that lead to slopware. We have two camps of people at work. The first camp is the "agent goes brrr" crowd. They don't understand the code they write. They have loops running, agent orchestrators or the agent hype du jour. The second camp is people who are inundated with PRs, are holding the line on quality, and are just exhausted. We've also had some management pressure from people who think engineers are wasting time looking at code, perhaps because somebody on a podcast they listen to says coding is largely solved.

> I don’t prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.

This is going to be a net negative on software quality for people who take this up, in my opinion.

I call out Boris but I also don't think he's being malicious. He's at the center of an important technological revolution and it would be hard not to get excited. I just wish he advocated for a more balanced and realistic perspective.

relaxing23h ago

It’s almost as though these models were trained on a vast corpus of largely mediocre code. They will never outperform the median Github user - it is all they know, it is all they can do.

mattchew1d ago

This is the best essay on agentic coding I've read. Clear thinking and writing, pragmatic about the future of agent-led coding.

If you usually skip straight to the comments, you might want to actually read this one.

jongjong8h ago

Reading the comments here, I'm struck by how many people seem to view code quality as an optional thing. It's almost as if software can function correctly without it. But it can't. Code quality is not a nice-to-have, it's a must-have. For any moderately-complex offering. There's a certain level of complexity that you can never reach without a well designed codebase. Cannot reach no matter how many humans or AI agents you throw at the problem.

The idea that it's a 'nice-to-have' is an illusion. It's like if you borrowed a lot of money to fund your startup and you still have some cash in the bank; at that point, a viable business model might seem like a 'nice to have'.

It's only when the lender comes knocking and you don't have enough money to pay them that a viable business model suddenly becomes a 'must-have'.

themgt21h ago

This article very much resonates with my current line of thinking. With winking apologies to Douglas Hofstadter, I've been mentally shorthanding agentic software development as building via "strange loops"

Anyway, it does seem like time to start experimenting. With a large dose of humility regarding what the optimal process and stack will ultimately look like. https://z3os.ai/

nurettin19h ago

I just tell it "you have until morning to work on this, be careful not to use too much ram and don't burn the cpu"

and then it goes off to do its thing and hopefully rngesus is with us.

simonreiff21h ago

AI infrastructure/tools developer and researcher here (hic-ai.com). I fully agree with Armin's concerns.

I wrote an article recently (https://hic-ai.com/blog/tool-response-engineering) in which I argued that AI tool-engineering is the new frontier beyond prompts, and it talks about the agent loop and engineering loops, but boy I have a completely different perspective than Boris's. Rather than contending that prompts are no longer relevant because we can simply have AI think for us by having loops "prompt Claude and figuring out what to do", like what Boris claimed, I believe we have to think much harder now.

The reason problems require human judgment and can't just be offloaded to an AI agent is simply this: AI agents lack durable, long-lasting, unique identities forged by real-world memories and experience, and they therefore lack judgment. There is no well-defined notion of having AI agents communicate with each other because they can't even tell the difference between talking to their future self and talking to another agent! They certainly don't reliably weigh whether a proposed fix to a failing unit test will subtly introduce new fallback logic that was never requested or otherwise alter the functionality of the system under test in some manner, or whether now is the time to refactor or now is the time not to refactor or whether to abstract more or less or anything like that.

Most importantly, from a practical perspective: AI agents lack legal person status, and they therefore cannot own property or money, sue or be sued, be hired or fired, or otherwise be held financially responsible for their errors or other harmful acts they commit. Clearly, AI agents cannot even arguably qualify for legal personhood status in the future, unless and until they first are capable of assuming a unique and durable long-term identity, which in turn requires solving auth, memory, communications, and many other technical issues that are neither resolved nor standardized today.

These facts combine to ensure that AI agents cannot be held liable and financially responsible when things go wrong, meaning that humans alone bear the costs of AI errors until further notice, and thus, human judgment remains the vital commodity that AI cannot replace. So many problems arise from abdicating judgment to AI agents in 2026!

Now, if all the above items were in existence, built, well-settled, etc., maybe I'd have to rethink things. But unless and until AI agents have unique identities and attain legal person status with bank accounts that can be sued and garnished in case of error, I don't think AI agents can seriously be trusted. Good human judgment and approval of all important decisions will remain the most important resource for any successful enterprise, for the foreseeable future. I think it's a very serious mistake to assume that human judgment can be swapped out safely by AI and certainly advise against taking Boris literally. Anyway, great article.

jongjong11h ago

One thing that I'm certain of, is that there are always costs to writing more lines than what is necessary to solve a specific problem.

I've experienced this before AI, and I've experienced this, magnified, with AI.

In a well-designed project written by hand, you will ship new features faster by hand than you would using AI in the same project written from scratch with AI.

But... If you have a well designed, human-written codebase, you will be faster if you generate the code for new features using AI than if you do it by hand... And you can maintain that speed for long periods of time if you use fine-grained prompts. What matters most is the quality of the codebase.

You can achieve the same degree of maintainability using AI from the beginning but you would have to make fine-grained prompts.

The gap is about making good engineering decisions. It just so happens that the value of good decisions compounds over time.

jongjong11h ago

I feel like what should have happened with AI is that teams should have started to put more effort into planning and pre-implementation discussions. Second thing, team leads should have felt more comfortable to reject large Pull Requests.

When I see a large 10k-lines PR, I feel a sense of panic. In a corporate setting, I also feel a kind of pressure to approve; the more work was done, the more pressure there is to approve the PR. This is why I think up-front pre-implementation discussion and alignment has become essential.

You really can't have people going rogue and weaponizing their AI-generated lines to gain control of a project through the duality of brittleness + complexity.

Brittleness + Complexity = Control

noodletheworld23h ago

There's a deep insight in this post about the value of looping for throwaway code to explore a problem space, rather than brute forcing a problem by just applying more tokens and hoping.

The more I play in this space, the more I’m drawn to the idea that some kind of backtracking constraint solver is a better solution than the current naive while loop / brute force approach here.

The results I see are similar to what you get from a greedy brute force constraint solver; solves trivial problems, sometimes solves harder problems after a long time, takes too long to solve really hard problems; solutions are increasingly non optimal on average as complexity goes up.

We have so much existing knowledge about building good constraint solvers, if we could just figure out how to apply it here somehow.

baddash15h ago

title describes my life in a nutshell

firefax15h ago

I haven't been this confused by a headline since Keir Starmer declared himself a "gooner".

I think a big issue with a lot of AI enabled coding is that tokens are currently heavily subsidized, and that refusing to learn how to write pseudocode and pound out bugs in shell scripts is a fundamental step a lot of programmers are skipping... a stance that I find ironic considering that when I was being told as a preteen to "read the fucking manual" by the 90s internet, I was led to believe if I'm not churning out zero days in C by senior year of high school I might as well abandon all hope of ever understanding anything about computers.

(Flash forward, and the immortal words of the rapper Jay-Z: "I ain't passed the bar, but know a little bit... enough you won't be illegally searching my shit.")

topce1d ago

My experimental looping build on top of pi and zx (mostly pi, DeepSeek, and some skills) ;-) https://github.com/topce/pizx

m0llusk1d ago

One of the biggest problems with LLMs has turned out to be the cost of actually running them and this strategy functions as a usage multiplier.

intended1d ago

I'm willing to be persuaded otherwise: Looping seems to (currently) be a side effect of token subsidies.

If token costs are nil, then you can afford to run verification and generation through the same models. If token costs are high, then you will go broke verifying code sprawl.

Currently costs are (mostly) absent from the conversation, even though costs are what decide the limits which shape experience.

Also: Firms can be held liable for the products they sell, so if code cannot be reviewed then that code is essentially a lawsuit waiting to happen. I believe this is what customers will be demanding in the future: someone to hold accountable when things go wrong.

ilaksh22h ago

Great article and good description of LLM code quality problems and problems that derive from that. And fair to not want a tidal wave of slop to displace your entire craft.

But this article is strangely lacking in foresight in terms of rapidly evolving model capabilities and output. One visual way to see this is to compare levels of SOTA video generation models. Look at outputs from Sora, to Veo, to Seedance 2.0, and now just released Seedance 2.5.

Or compare LLMs/VLMs as they have progressed: GPT-2, GPT-3, GPT-4, Opus, Fable/Mythos.

You can see the level of sloppiness or poor world understanding progress from comical nonsense to junior to senior with a few holes in their brain to an engineer you can actually almost trust to produce clean code if you mention the right guidelines in your instructions (such as avoiding overly local code).

As the model size/complexity increases, the intelligence increases, and so does code quality. We will also start specifically putting more high level code quality tasks into training datasets and training harnesses. I mean, Karpathy will probably see this article and make a huge dent in the issues without even larger models.

One thing people may not be aware of is that there is still a lot of room for hardware efficiency improvements and model size to grow. The compute-in-memory paradigm is just getting started in a way. Look at companies like Tensordyne and Mythic AI, but they are going to get blown out of the water by fully in-memory approaches.

For example look at the recent wurtzite ferroelectric nitrides breakthrough from the University of Michigan team (one of them tragically jumped from height after intense interrogation regarding national security concerns). The military is providing significant funding to move this towards development and scaling out of the lab.

That type or level of truly new paradigm system is going to boost efficiency by multiple orders of magnitude.

I know there are people who think Fable 5 was the end of the public LLM/VLM frontier moving, or that it is impossible to scale models further due to energy consumption. But there is zero chance that every high level VLM/LLM research team on the planet is going to stop publishing models or that the rapid progress in compute efficiency will stop.

Point being, within a year or two, the code coming out will be much cleaner. And within five or six years what you may see is that the leading models are 100+ trillion parameters and have sophisticated persistent context management etc. and they do not even produce application source code.

Instead, the database is in the context and is neurally rendered at 24 fps into whatever UI, schema and business logic you prompt it with in a broad way. The whole application is just precise thinking in an artificial brain ten times the complexity of an equivalent human brain.

And if you are disturbed by the current level of outsourcing for thinking to AI, it is just getting started. In a way it will be incredible, from another perspective horrific, but what I think we are seeing is the evolution of an ExoCortex. There will be an AI glasses stage where the integration is closer but still somewhat low bandwidth.

But sooner rather than later we are headed towards high bandwidth brain computer interfaces that make AI into an actual new cognitive layer.

So the waves of slop might make you feel sick, but that is nothing compared to the transhuman cyborgs powered by superhuman AI that are around the corner.

wartywhoa2317h ago

Another piece of transhumanist trash on the Internet.

nfcampos1d ago

My own thoughts on this, with examples https://github.com/nfcampos/loop-dev/blob/main/README.md

weego19h ago· 17 in thread

What does any of that mean in practice? it's just rambling about abstract concepts that seem to be designed to hint at a bigger picture, when it's just getting AI to write code for you.

fantasizr18h ago

https://news.ycombinator.com/item?id=46682325

f311a16h ago

> What does any of that mean in practice?

They want you to spend more tokens

georgemcbay14h ago

> They want you to spend more tokens

Spending your tokens like some sort of primitive meat bag is passe and any developer doing that will be left behind by the true AI natives.

What you need to do this month is set up a central planning agent that creates a 5 year plan and then orchestrates teams of subagents to each direct subteams of subsubagents to fulfill your goals.

Have each team at every level inspect each other's work and authorize each agent up and down the chain to spend tokens on your behalf for the greater good.

With any luck, the US Administration will see the light and allow AI agents to open up their own credit cards on your behalf to eliminate unnecessary bottlenecks. Agentic Fintech is the future.

We need to move beyond this early stage where you give any thought to spending those tokens yourself!

bashtoni15h ago

Practically, what this boils down to is having clear success criteria.

The harness (Claude Code, Codex, Pi, etc.) keeps throwing things into the context and executing tools (as directed by the model) until the success criteria are satisfied.

bogzz15h ago

You gotta just use more tokens, more tokens means better everything.

dofm18h ago

My own feeling is that it is totally OK to simply route around these people.

AndrewKemendo16h ago

I’m sorry but this response is just absolutely ridiculous and is not giving anything near the respect to the author that you should have.

You’re just rambling and ranting about philosophical things and have basically nothing to say about the technical or engineering points that the author wrote.

This is an entirely emotional appeal and doesn’t actually engage the author where the author is engaging the audience.

If you look further down thread there’s dozens of comments that are engaging with the content and not being hyperbolic about all this cyber shamanism or whatever you wanna call it

wahnfrieden17h ago

Why is HN interested in management and team process discussion but allergic to similar topics on how to manage agents?

It’s like saying why discuss these team workflows when it’s just devs writing code. Or why use any jargon to describe workflows when it's just devs writing code.

AndrewKemendo15h ago

Because like every other large social group it’s actually a collection of dozens of subgroups that have an overlapping interest

SimianSci18h ago

When someone is expected to be wizened and does not have the knowledge to keep up with the needs of those around them, they in turn become Shamanistic in their practice.

The speed of improvement on these models has been incredible and has outpaced the learning speed of humans and put many experts into these Shamanistic roles.

watutalkinbout18h ago

Algorithms and data that emulate responses aren't smart.

A 5 year old knows if you want to wash your car, you need to take it to the car wash.

coldtea17h ago

Can a 5 year old write a substantial program on spec, that passes the requirements and given tests, in a few minutes?

If not, then perhaps this comparison is not the be all end all.

"A ship is useless, it can't drive over land..."

FarmerPotato13h ago

referring to John Woolridge's recent talk? "the car wash is only 200 feet away, should I drive or walk?"

His slide showed Opus 4.6 saying "walk". I couldn't get 4.6 to do that.

ramon1561d ago· 14 in thread

Quoting the creator of CC holds little value in my opinion. I too call my product good.

> opting out of this fully machine-driven future may not be an option.

I am contemplating whether I want to stay inside this rat race.

meowface1d ago

I and my friends go back and forth, every day, on whether coding with LLMs is a net plus or a net negative.

I'm at the point where I think it's dumb to not do it but also dumb to do it. I have no real answer.

zahlman16h ago

> but it somehow lets me bypass akrasia… I get more done in three months even if I get less done in a day.

pdimitar23h ago

To me programming with LLMs made me a better programmer. But yes, I don't just rubber-stamp PRs.

It also finally allowed me to be less of a code monkey and more of an architect and a backend lead than before. Which I was really missing.

_verandaguy1d ago

> I am contemplating whether I want to stay inside this rat race.

I'm in the same boat. I'm hoping to go back to school in 2027 and be out of work that revolves around programming in 5 years.

I'm not enthusiastic about the field anymore, which sucks, because I used to love working in programming.

endemic23h ago

What are you going to do? Asking for a friend.

_verandaguy23h ago

I'm still trying to figure that out. Something diplomacy-adjacent would be interesting to me; maybe conflict studies, or international relations.

It's also work with real-world impact, which is nice, though obviously comes with its challenges.

crymeth0t23h ago

> I am contemplating whether I want to stay inside this rat race.

Devasta1d ago

> I feel uneasy, and I do not enjoy the work I deliver using LLMs.

I don't know if that'll ever change. I can't even pretend I was doing something prestigious and artisan like watchmaking because I wasn't a good programmer beforehand.

enoch_r1d ago

This piece changed how I work with LLMs and made me much more optimistic about how "fun" it can be to work with them: https://nolanlawson.com/2026/05/25/using-ai-to-write-better-...

warmwaffles23h ago

cavoirom1d ago

Be your own customer: write the software just for you. AI is so effective that you could do something meaningful for yourself in just your spare time.

Here is the similar perspective: https://isene.org/2026/05/Audience-of-One-Numbers.html

I may have misunderstood you, if you intend to write code by hand. I still do: I use AI to learn by example, but I write the real code myself, and AI can help me improve the code. I've learned a lot.

xpct1d ago

I used to think I'll be into coding for the long haul, contributing to open source, and working on multi-year side projects.

fartcoin671d ago

foobar100001d ago

boscillator1d ago· 7 in thread

> the right fix is not "handle every malformed case." ... [LLMs] will still attempt to handle now impossible errors.

Why do they do that? Is it from pre-training or reinforcement? If the latter, can the labs please fix this?

rzmmm23h ago

Likely just that they err on the side of unnecessary error handling rather than missing error handling. They likely penalize runtime errors harshly in training.

jerf23h ago

ambicapter19h ago

efromvt18h ago

million times this - getattr on every dataclass is a wild choice
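For anyone who hasn't seen the pattern, here is a small sketch of what defensive `getattr` on a dataclass looks like next to plain attribute access. The `User` class and both functions are hypothetical, written only to show the contrast.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    email: str


def greeting_defensive(user: User) -> str:
    # The pattern being complained about: guarding against a missing
    # attribute that the dataclass definition already rules out.
    name = getattr(user, "name", None)
    if name is None:
        return "Hello, unknown"
    return f"Hello, {name}"


def greeting_direct(user: User) -> str:
    # Trusts the type: a User always has a name.
    return f"Hello, {user.name}"


ada = User(name="Ada", email="ada@example.com")
```

Both functions return the same string for any real `User`, but the defensive version adds a branch that can never run and hides typos in the attribute name behind a silent fallback.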

skywhopper23h ago

It’s because it matches the patterns they are trained to follow. They don’t understand the code. They can’t reason about the actual logic flow. They can only work with patterns.

CuriouslyC23h ago

zahlman16h ago

I would expect a language model to do a better job of coping with that kind of uncertainty, inferring type from name and usage, etc.

yanis_t1d ago· 7 in thread

If the point of the software is to benefit people, should I still care about how the code looks?

Right now, I still think that the answer is yes, but in 3 years? in 10 years?

wartywhoa2317h ago

> If the point of the software is to benefit people, should I still care about how the code looks?

The answer is yes you should, as long as you want to keep software benefiting people.

steezeburger19h ago

cadamsdotcom23h ago

You will always be able to ask the agent to do refactors for you - and it can do mega ones that exhaust you to think about!

yanis_t22h ago

Two problems: I will not get my pleasure, and I will still not know how the code works.

datadrivenangel23h ago

Agentic refactoring is very questionable if you want to maintain quality, as it will rewrite all your code to be more average.

zahlman16h ago

goatlover17h ago

Or you can do it yourself. Use AI as a prototyping tool or for boring throw away tasks. There's no reason everyone has to succumb to vibe coding.

camillomiller1d ago· 6 in thread

Show me the billion dollar solopreneur startup, or the profit increase for companies and at that point I’ll start thinking that this tasteless high level wanking might make sense in some way

soulofmischief1d ago

We've just invented the car and you're upset it hasn't achieved Mach speed yet.

bee_rider1d ago

One car went Mach 1, ever, apparently. Anyway, I don’t think the analogy fits. Ford or whoever didn’t loudly and frequently predict Mach 1 cars, right?

We’ll see, I guess.

soulofmischief23h ago

Yes, one car did Mach 1. And the first production car, the Benz Velo, could only go 12mph. It's an apt analogy.

camillomiller1d ago

camillomiller1d ago

I was literally quoting Sam Altman

soulofmischief23h ago

The point is that you are applying future aspirations on the present-day relatively brand-new model space and getting upset that we aren't there yet.

mmillin1d ago· 4 in thread

datadrivenangel23h ago

zelphirkalt15h ago

Are LLMs too dumb to understand the type system? Or is the type system too weak to represent non-nullable types?

Even in Python with its meh-ish optional typing system something that can be None is different from something that cannot be None.
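A short sketch of that distinction, using only the standard `typing` module. The function names and the sample data are made up for illustration.

```python
from typing import Optional


def find_email(users: dict[str, str], name: str) -> Optional[str]:
    """May return None: the annotation says so."""
    return users.get(name)


def domain_of(email: str) -> str:
    """Takes a plain str, so no None check belongs in here."""
    return email.split("@", 1)[1]


def domain_for(users: dict[str, str], name: str) -> Optional[str]:
    email = find_email(users, name)
    if email is None:  # handled once, at the only place None can appear
        return None
    return domain_of(email)


users = {"ada": "ada@example.com"}
```

A type checker such as mypy would flag `domain_of(find_email(users, "ada"))` because an `Optional[str]` is passed where a `str` is required, so the information needed to skip redundant None checks is available in the annotations.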

handoflixue22h ago

How impossible are we talking?

mmillin21h ago

wseqyrku19h ago· 3 in thread

For some reason pro-ai blog posts feel like paid ads, I might be wrong.

tfrancisl18h ago

Who cares what Cherny thinks? He is selling his product, and he will probably cash out soon enough while his credibility is as high as it is.

rolisz18h ago

Armin is very nuanced and balanced. He spells out clearly in the blog post the bad parts of AI

shikshake18h ago

To me there’s usually an undercurrent of manic zeal that makes them feel that way.

joenot4431d ago· 3 in thread

We've had great success with agents thus far at my job. A year into Clauding and all our dev metrics are up while our downtime has remained steady.

How might someone accomplish the same goals using loops instead?

Jcampuzano21d ago

I personally have not had good luck with loops due to similar issues as the post author - but if you were to port your flow to "looping" it would be something like:

- An automation that periodically checks for PRDs at a given location that have not yet been implemented.

- Repeat checking for new PRDs every T units of time (10 minutes, 1 hour, etc.).

This is how people say you should be looping: you never care about or even look at the code, and you never prompt the agent yourself.
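A minimal sketch of that polling flow in Python, offered as an illustration rather than anyone's actual setup. It assumes PRDs are Markdown files in a single directory, and `implement` is a hypothetical callback that hands one PRD to an agent and reports success; neither detail comes from the comment above. The `max_cycles` cap is also an assumption, added so the sketch terminates.

```python
import time
from pathlib import Path
from typing import Callable, Iterable


def find_pending_prds(prd_dir: Path, done: Iterable[str]) -> list[Path]:
    """Return PRD files in prd_dir whose names are not yet marked as implemented."""
    finished = set(done)
    return sorted(p for p in prd_dir.glob("*.md") if p.name not in finished)


def run_prd_loop(
    prd_dir: Path,
    implement: Callable[[Path], bool],  # hypothetical: gives the PRD to an agent, True on success
    interval_seconds: float,
    max_cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll prd_dir every interval_seconds, up to max_cycles; return implemented PRD names."""
    done: list[str] = []
    for cycle in range(max_cycles):
        for prd in find_pending_prds(prd_dir, done):
            if implement(prd):
                done.append(prd.name)
        # Wait T before the next check, but not after the final cycle.
        if cycle < max_cycles - 1:
            sleep(interval_seconds)
    return done
```

A PRD whose `implement` call fails stays pending and is retried on the next cycle, which is the "hands off" part: nobody prompts the agent between checks.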

camillomiller1d ago

noodletheworld23h ago

Use Appium or XCTest or Swift Testing; generate the tests first (failing) from the spec.

The loop is basically then a while loop:

While (tests fail) { trigger agent: spec, failures list }

For bugs, write failing tests.

It's basically TDD.
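The while loop above can be sketched in Python roughly like this. `run_tests` and `trigger_agent` are hypothetical callbacks standing in for a real test runner and a real agent invocation, and the `max_iterations` cap is an assumption added here so a stuck agent cannot spin forever.

```python
from typing import Callable


def loop_until_green(
    spec: str,
    run_tests: Callable[[], list[str]],               # hypothetical: returns names of failing tests
    trigger_agent: Callable[[str, list[str]], None],  # hypothetical: asks an agent to fix them
    max_iterations: int = 10,
) -> bool:
    """Trigger the agent while tests fail; return True once the suite passes."""
    for _ in range(max_iterations):
        failures = run_tests()
        if not failures:
            return True
        # Hand the agent the spec plus the current list of failures.
        trigger_agent(spec, failures)
    # Budget exhausted: report whether the last agent run got the suite green.
    return not run_tests()
```

Everything interesting lives in the two callbacks: the loop itself is trivial, which is why the effort ends up in the test harness rather than in the loop.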

Loops do nothing useful beyond making the “spec -> code” step more “hands off” and letting you be confident that the code you write does what is intended.

Obviously you see the issue: writing the loop harness is more effort than not having it…

No one is actually doing that effectively.

Loops are currently carefully hand crafted, which makes them tedious and of questionable value imo.

CraigJPerry1d ago· 2 in thread

> My current status is that I have not had much success with this way of working for code I deeply care about

If something is judgement heavy, "code i care deeply about", then i don't really agree with the direction of travel here. Don't try to delegate decisions you care deeply about.

otto-riz15h ago

> Don't try to delegate decisions you care deeply about.

YES. Or find a deterministic way to insert them :D

zahlman16h ago

> Don't try to delegate decisions you care deeply about.

> they didn't live the curse of having that kind of person around - if you don't have them PERFECTLY aligned, then they run off at break neck speed in the wrong direction.

Exactly. If you wouldn't outsource it to people you considered highly skilled, why would you outsource it to a machine?

gcanyon1d ago· 2 in thread

I'm a software developer from way back, using tools and languages that coding agents are far less familiar with.

So when I use an agent to write code, it's in languages I'm less familiar with, and often using libraries I know nothing about.

All to say, my part of the process often ends up being:

That's a loop I'd love to automate.

timmytokyo17h ago

Sounds like a great way to avoid learning anything new about the languages you don't already know.

handfuloflight18h ago

Have you tried SKILL.md files encoding your nuanced domain knowledge?

livingsoft13h ago· 2 in thread

andai12h ago

Interesting, could you share a link? (Your submissions appear to have been nuked.)

Edit: woah!

https://www.youtube.com/watch?v=NGgve4L2zY4

livingsoft12h ago

abathologist23h ago· 2 in thread

Generally interesting reflections here, yet I see the same kind of myopia and fatalism that is rampant in our (fashion) industry:

> yet I have no doubts that this looping future is going to be our future despite the fact that I presently resent it

mohsen122h ago

I'm really curious to see how this unfolds. It's a defining moment for us I think

Terr_18h ago

"It is inevitable, soon all insulation will be asbestos, you have to learn the types in order to be competitive in its new future."

johnwheeler18h ago· 2 in thread

What is a loop in simple terms?

zahlman16h ago

wartywhoa2317h ago

The Loop.

The Singularity.

The Nobody who runs The Show.

The Acceleration toward inevitable Transcendence.

Revere, repent and obey! Compute is Destiny.

codeDruid1d ago· 2 in thread

Yeah I don't know. Don't get me wrong, the article's points make sense. But sometimes I think that we're going to stay near this current point of productivity for a little while.

erispoe1d ago

How do you control code quality?

baw-bag23h ago

mccoyb1d ago· 1 in thread

Many people think they can jump ahead with agents. You cannot fake understanding or clarity. It is painfully obvious when someone skipped that meat brain understanding phase.

athrowaway3z1d ago

I had codex write a tool to extract all my pi sessions. (Had to filter out my prompts from the agents talking to subagents).

Then I had it analyze the patterns i was making and turned that into the flowchart for the outer guidance-creating-prompt.

I didn't have to spend too much time thinking what i wanted. I wanted it to do that.

The result is still mixed, and I'm not trusting it with delicate code bases, but for a game I've been building I dropped my check-in time to 1/5th of what I was previously spending on it.

miki12321113h ago· 1 in thread

This ties into something I have been saying for months: LLMs are great at finishing tasks, but bad at aesthetics and taste.

jr359213h ago

> Note that maintainability and code quality aren't synonymous, code quality is just a means to an end, and that end is maintainability.

galaxyLogic17h ago· 1 in thread

I think what is going to happen is revival of "Methodology".

whazor16h ago

Yes, I believe in reinventing methodologies for the rise of agents.

Before even discussing with my team whether something is a good idea, I can have a full prototype in a new branch within one morning. But then use it as a proof of concept and delete the code.

Another example: agents can generate full test coverage. Including mocking external dependencies and user behavior. Also all in one morning.

CuriouslyC23h ago· 1 in thread

The problem of incomplete context is partly a human problem and partly a harness/interconnectivity problem.

zahlman16h ago

> The lack of taste can be mitigated to some degree by improved training, though taste is not a stationary distribution in humans (see trends/fads/etc), we can at least better track the cutting edge.

The point of "taste" is not to copy from others, even if you can somehow filter out the trends and fads. And really, that filtering comes from personal experience anyway.

artisin23h ago· 1 in thread

This.

handoflixue23h ago

I was really surprised by this part!

inline_always17h ago· 1 in thread

wahnfrieden17h ago

That's the entire topic - loops are not just infinite output, they require automated verification and progress evaluation steps.

rcarmo1d ago· 1 in thread

There's _way_ more than one way to do "loops". I just asked one of my supervisors/auditors to document how it's been working while monitoring a few other agents that have long-term goals:

https://gist.github.com/rcarmo/4922b550ab48bf0b4246c77e606a5...

rcarmo23h ago

I'm glad HN prefers downvoting actual work/demos to discussing them.

JodieBenitez1d ago· 1 in thread

> For now I have not moved past the point of comprehension being important to me.

Ah! This is me too... at least for what I have to ship at work. Not so much for my toy/weekend projects. But it turns out agents are also good at explaining.

jazzypants1d ago

I think it's insane to suggest that software developers should ever get to the point where they don't even comprehend their code.

gavinh11h ago

I experience this daily now. I find it discouraging and concerning.

stillpointlab15h ago

My experience is that I am bottle-necked on specs. The agent loop is less of a thing for me now.

But I am just on the cusp of wanting the outer loop. The gate is almost entirely on spec creation and PR review. In places where those gates don't matter, I want the agent to keep chugging away.

Multicomp22h ago

Code is part of a shared and built understanding of an information system.

(Typo scan by gemma4-12b but I didn't let it alter my message)

tmshapland17h ago

wolttam22h ago

I am 100% for fully agentic loops... for tasks other than engineering.

I'm not willing to outsource the understanding-how-things-work part of myself. That part of myself is what got me into computing in the first place.

If this work becomes simply a matter of describing intent to a machine (probably through an Issue, like a user), and going to check on the result when you get the 'done' notification: I'm done.

It's possible to use the tools to do awesome things without letting go of full system understanding of the parts that you look after.

piker1d ago

We used a “loop” before it was called that to drive MS-DOC support into Tritium. Based on that experience, I take issue with this:

“There are already impressive examples of large automatic porting efforts, including the reported work around moving parts of Bun from Zig to Rust.” (Emphasis added.)

contagiousflow23h ago

> You Cannot Quite Opt Out

sixhobbits17h ago

dataviz100018h ago

I’m having awesome success working with recursive agents. I discussed my experience with them. [0]

> Claude's attention doesn't distinguish between "instructions I'm writing" and "instructions I'm following" -- they're both just tokens in context.

It takes a little human help in the first iterations but after a while it will start to iterate and improve unsupervised.

[0] https://github.com/adam-s/agent-tuning

furyofantares19h ago

I have had some success with /goal for long tasks that can be set up in a way that the agent can do good work for an extended period of time.

A lot of tasks aren't amenable to that, and the ones that are still need a lot of care to be set up correctly. The default vibe coded codebase won't be.

I've come to think of the activity of choosing the right technology, the right architecture, the right testing setup, the right context, and the right /goals to use as programming the agent.

lifeisstillgood17h ago

>>> For now I have not moved past the point of comprehension being important to me.

I see software as a new form of literacy, even in the AI world, so yeah, in my world view, comprehension will be something we always cling to.

theahura8h ago

not to be dismissive, but isn't this the same as the ralph loop thing everyone was talking about 6 months ago?

agumonkey20h ago

I can't help but be tired of the LLM trend, where people bang at loops in the hope that the model sculpts something. It feels so empty mentally to just have results without constructing them.

That said the idea of loop has always been there (iteration, V cycle etc) but I'd be glad to find people with more theory and less agents swinging blindly so to speak.

illuminator8314h ago

I think a lot of people here have either not read the article fully or are misapprehending it.

raidicy16h ago

Hilarious that Jeremy Howard and Rachel Thomas have been saying you need a human in the loop from the beginning.

And now finally we discovered that we need to put a human in the loop.

stuartaxelowen17h ago

Using layers like the loops described here to abdicate your work is you decoupling from the joint market/engineering value you originally provided.

otto-riz15h ago

> Ideally all of that was also well documented. Where that understanding was lacking, it was generally regarded as something to improve upon.

wiseowise23h ago

nullbio12h ago

It's marketing... You can't take anything Anthropic says at face value.

jwpapi1d ago

The issue is that whilst the loops will initially lead to good results, those results will get worse and worse as the context gets bigger and tougher for both humans and AI to understand.

So it depends really on the size of your project.

sunir21h ago

Dear Abby,

I am torn. I have fallen in love with vibe coding but I still am in love with the software I’ve used for decades that works reliably.

Vibe coding gives me what I need and want right now. It’s fast. Fun. Always makes me feel validated.

But the vibe code, it’s so flaky… and expensive. It sucks up endless amounts of my time, compute, and money and never gives anything back.

But it’s so fun. I tell all my friends about it and they’ve become so jealous they sought out their own vibe coder.

We’ve all found our vibe coders are a bit kinky. It’s become a social thing amongst my friends to talk about building cooler harnesses to control our vibe coders.

I don’t know what to do. My old software pays the bills but she keeps threatening to dump my ass on the curb and replace me with her own vibe coder.

I know she can’t really do it. She needs me too. And I need her.

Can we ever patch up our diffs?

— just some git with uncommitted changes

blurbleblurble10h ago

Really sharp perspective. Agentic engineering is software engineering with inertia. We're growing stuff now, not knitting it link by link.

jsw9723h ago

I wonder how many loop-related issues could be addressed by simply fixing a LOC budget, or assigning a cost in some way. Unclear how you would dial in the right numbers, though.
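One hedged way to picture a LOC budget: a gate that rejects any agent change whose diff adds more than a fixed number of lines. This is only a sketch; counting added lines in a unified diff and the specific budget value are both assumptions, not something proposed in the comment above.

```python
def added_lines(unified_diff: str) -> int:
    """Count added lines in a unified diff, skipping the '+++' file headers.

    Caveat: an added line whose own content starts with '++' would be skipped too.
    """
    return sum(
        1
        for line in unified_diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    )


def within_budget(unified_diff: str, max_added_lines: int) -> bool:
    """Return True if the change adds no more than max_added_lines."""
    return added_lines(unified_diff) <= max_added_lines
```

A loop could call `within_budget` on each iteration's diff and send the change back to the agent when it fails. Picking `max_added_lines` is the open question the comment raises.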

hakanderyal1d ago

I think this is a common sentiment among heavy users of AI who also still care about code quality.

I've built up a skill harness and review flow that makes Opus generate slop-free code 90% of the time. But the remaining 10% requires me to stay at the helm. Especially in the early stages.

I would love to use loops to automate more, but I couldn't do it with the current generation models.

knivets23h ago

lhysdl9h ago

Andrej Karpathy says, “You can outsource your thinking, but you cannot outsource your understanding.” The catch is that we need to think in order to build understanding.

duendefm1d ago

dirtbag__dad13h ago

> the code it produces is slop, but that’s more the fault of the model than the harness not being a good judge on if a step in the workflow resulted in a net improvement or completion.

trjordan23h ago

I think there's 2 important, but separate, ideas in this post:

- Models are not good at or getting better at creating strong invariants, which is fundamental to good software

- It is unclear how to keep tabs on what the agent is doing, so you, a human, can intervene.

I do suspect they're separable, though.

[If the models get better at defining and respecting invariants, maybe there's a new flavor of slop, that's less obvious today.]

mikgp22h ago

Was everyone collectively lying over the past fifty years of software development when they repeatedly said more != better?

For specific use cases, performance and security and all sorts of tuning it could be truly amazing. But maybe loops should be like a tool we make a choice to use when optimal.

draginol23h ago

This is really terrible advice right now for most people.

I've had to rip out a lot of pretty terrible code made by engineers who have tried this.

NichoPaolucci11h ago

Maybe I'm out of the loop.

sandrello1d ago

Also:

> Now there is obviously a question if this desire to understand the code is one that I will still have a few years from now.

aabdi23h ago

The post suggests fear about a surge of increasing amounts of code by loops and loops of agents.

I don’t know if I like the current world without it though.

Dependency handling is poorly managed, leading to low quality operations with improper dashboards, alarms, and ops.

Badly managed processes lead to people doing monkey work signing off checklists rather than automation.

Frankly… why is keeping any of that good? It really pisses me off seeing people accept any of that low quality but that standard is the default and not the outlier.

galoisscobi1d ago

> I don’t prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.

This is going to be a net negative on software quality for people who take this up, in my opinion.

relaxing23h ago

It’s almost as though these models were trained on a vast corpus of largely mediocre code. They will never outperform the median Github user - it is all they know, it is all they can do.

mattchew1d ago

This is the best essay on agentic coding I've read. Clear thinking and writing, pragmatic about the future of agent-led coding.

If you usually skip straight to the comments, you might want to actually read this one.

jongjong8h ago

It's only when the lender comes knocking and you don't have enough money to pay them that a viable business model suddenly becomes a 'must-have'.

themgt21h ago

Anyway, it does seem like time to start experimenting. With a large dose of humility regarding what the optimal process and stack will ultimately look like. https://z3os.ai/

nurettin19h ago

I just tell it "you have until morning to work on this, be careful not to use too much ram and don't burn the cpu"

and then it goes off to do its thing and hopefully rngesus is with us.

simonreiff21h ago

AI infrastructure/tools developer and researcher here (hic-ai.com). I fully agree with Armin's concerns.

jongjong11h ago

One thing that I'm certain of, is that there are always costs to writing more lines than what is necessary to solve a specific problem.

I've experienced this before AI, and I've experienced this, magnified, with AI.

For a well-designed project written by hand, you will work faster writing the code for new features by hand than the same project written with AI from scratch, using AI to write the new features.

You can achieve the same degree of maintainability using AI from the beginning but you would have to make fine-grained prompts.

The gap is about making good engineering decisions. It just so happens that the value of good decisions compounds over time.

jongjong11h ago

You really can't have people going rogue and weaponizing their AI-generated lines to gain control of a project through the duality of brittleness + complexity.

Brittleness + Complexity = Control

noodletheworld23h ago

There's a deep insight in this post about the value of looping for throw away code to explore a problem space, rather than brute force a problem by just applying more tokens and hoping.

We have so much existing knowledge about building good constraint solvers, if we could just figure out how to apply it here somehow.

baddash15h ago

title describes my life in a nutshell

firefax15h ago

I haven't been this confused by a headline since Keir Starmer declared himself a "gooner".

(Flash forward, and the immortal words of the rapper Jay-Z: "I ain't passed the bar, but know a little bit... enough you won't be illegally searching my shit.")

topce1d ago

my experimental looping build on top of pi and zx mostly pi deep seek and some skills ;-) https://github.com/topce/pizx

m0llusk1d ago

One of the biggest problems with LLMs has turned out to be the cost of actually running them and this strategy functions as a usage multiplier.

intended1d ago

I'm willing to be persuaded otherwise: Looping seems to (currently) be a side effect of token subsidies.

If token costs are nil, then you can afford to run verification and generation through the same models. If token costs are high, then you will go broke verifying code sprawl.

Currently costs are (mostly) absent from the conversation, even though costs are what decide the limits which shape experience.

ilaksh22h ago

Great article and good description of LLM code quality problems and problems that derive from that. And fair to not want a tidal wave of slop to displace your entire craft.

Or compare LLMs/VLMs as they have progressed: GPT-2, GPT-3, GPT-4, Opus, Fable/Mythos.

That type or level of truly new paradigm system is going to boost efficiency by multiple orders of magnitude.

But sooner rather than later we are headed towards high bandwidth brain computer interfaces that make AI into an actual new cognitive layer.

So the waves of slop might make you feel sick, but that is nothing compared to the transhuman cyborgs powered by superhuman AI that are around the corner.

wartywhoa2317h ago

Another piece of transhumanist trash on the Internet.

nfcampos1d ago

My own thoughts on this, with examples https://github.com/nfcampos/loop-dev/blob/main/README.md
