The problem is that the mitigations offered in the article also don't work for long. When designing a system or a component we have ideas that form invariants. Sometimes the invariant is big, like a certain grand architecture, and sometimes it’s small, like the selection of a data structure. You can tell the agent what the constraints are with something like "Views do NOT access other views' state" as the post does.
Except, eventually, you'll want to add a feature that clashes with that invariant. At that point there are usually three choices:
- Don’t add the feature. The invariant is a useful simplifying principle and it’s more important than the feature; it will pay dividends in other ways.
- Add the feature inelegantly or inefficiently on top of the invariant. Hey, not every feature has to be elegant or efficient.
- Go back and change the invariant. You’ve just learnt something new that you hadn’t considered, it puts things in a new light, and it turns out there’s a better approach.
Often, only one of these is right. Often, at least one of these is very, very wrong, and with bad consequences.
Picking among them isn’t a matter of context. It’s a matter of judgment, and the models - not the harnesses - get this judgment wrong far too often. I would say no better than random chance.
Even if you have an architecture in mind, and even if the agent follows it, sooner or later it will need to be reconsidered. What I've seen is that if you define the architectural constraints, the agent writes complex, unmaintainable code that contorts itself to fit them when they need to change. If you don't read what the agent does very carefully - more carefully than human-written code, because the agent doesn't complain about contorted code - you will end up with the same "code that devours itself", only you won't know it until it's too late.
I had strong principles at the outset of the project and migrated a few consumers by hand, which gave me confidence that it would work. The overall migration is large and expensive enough that it has been deferred for nearly a decade. To bring down the cost of that migration, I turned to AI to accelerate it.
I found that it was OK at the more mechanical and straightforward cases, which are 80% of the use cases, to be fair. The remaining 20% need changes to the framework. Most of them need very small changes, such as an extra field in an API, but one or two require a partial conceptual redesign.
To oversimplify the problem, the backend for one system can generate certain data in 99% of cases. In a few critical cases, it logically cannot, and that data must be reported to it. Some important optimizations were made on the assumption that this would be impossible.
The AI tooling didn't (yet) detect this scenario and happily added migration logic assuming it would work properly.
Now, because of how this is being rolled out, this wasn't a production bug or anything (yet). However, asking partner teams the right questions revealed it, and showed that some other teams were going to need it as well.
Ultimately, it isn't a big problem to solve in a way that will mostly satisfy everyone, but it would have been a big problem without a human deeper in the weeds.
Over time, this may change. Validation tooling I built may make a future migration of this kind easier to vibe code even if AI functionality doesn't continue to improve. Smarter models with more context will eventually learn to catch these problems in more and more cases.
The code it generates still oscillates between beautiful and broken (or both!), so for now my artistic sensibilities make me keep a close eye on it. I think of the depressed robot from The Hitchhiker's Guide to the Galaxy as the intelligence behind it. Maybe one day it'll be trustworthy.
To be fair, there are many people like this as well. One of my personal favorite examples was way back in the 80s when I inherited the code for a protocol converter that let ASCII terminals communicate with IBM mainframes via the 3270 protocol.
One of the pieces of code in there, for managing indicator lights, was simply wrong. It was ca. 150 lines of Z80 assembly language that was trying to faithfully follow the copious IBM documentation of how things worked, but it had subtle issues and didn't always work.
My approach was to accept the documentation as accurate (the IBM documentation was always verbose and almost never wrong), but to reason that the original 3270 had these functions implemented in TTL logic gates, and there was no way in heck that they were wasting enough gates on indicator lights to require the logical equivalent of 150 instructions.
So in my mind, it had to be a really simple circuit that had emergent properties that required the reams of documentation. With that mindset, I was able to craft correct code for this in 12 instructions.
Many systems are likewise fractal in nature. You want to figure out the generating equations, rather than all the rules that derive from those. And, in many cases, writing down the generating equations is at least as easy to do in code as it would be to do in English for someone or something else to implement.
I find this to be a big problem with spec-driven development: no spec survives the real world. Some invariant that was in the spec will inevitably turn out to be wrong, no matter how much time you spend researching and designing the spec.
When I, as a human, hit this during development, I can take a step back, think it through, and decide: oh yes, the invariant is wrong and needs to be rethought, and the impact of changing it needs to be assessed. Then I can design around it. Sometimes that means a substantial change in design, sometimes not, but in every case the resulting software is better for it: an unknown has been uncovered, something new has been learned.
When this happens to AI, it keeps churning on it until it manages to hack a solution together, under the potentially wrong assumptions, design, or invariant. It doesn’t have the insight to step back and holistically reevaluate.
At least, that’s been my experience working with AI. I think we can improve its ability to handle these situations through good workflows and verification, but it doesn’t come naturally to AI, it isn’t something Claude Code or similar tools support out of the box, and it has its limits.
But in all seriousness, it depends on what you’re doing with it. Writing a quick tool using an LLM is much easier than context switching to write it yourself. If you need the tool, that’s very valuable.
Even if you could state it in a precise formal language, the LLM under the agent doesn’t have the capability to understand what the invariant is for and why it’s important. You’ll still get oddly generated code. You might get an LLM that can associate certain tokens with those in a formal specification language that can express invariants, and perhaps even write the proofs… but you’ll still get a whole bunch of other code generated from the informal parts of the prompt.
I agree that simply adding constraints and prompts to your skills and specs isn’t going to prevent these things. Worse, even if you could invent a better mousetrap, the creature would still escape.
The problem is… “elongation”: the addition of code for the sake of the prompt/task/etc. Often less is better. This takes a human with the ability to anticipate what other humans would want or expect. When you need a generator, they’re great, but it’s a firehose whose use should be restrained a little more.
Ancillary parts I don't mind generating, but for core features I still need to be actively writing most of the time.
If you already have a mature code base, then it's very easy to get AI to write excellent code. It has a ton of documentation on what you already do, how you do things, functions to use etc.
I read all the changes AI does. I work in small chunks.
>Even if you have an architecture in mind, and even if the agent follows it, sooner or later it will need to be reconsidered
The agent can change the structure you want to change 100x faster than you can. That's the beauty of it. We all know how hard it is to make architectural changes manually once you've started to lock into something.
These comments just show me you must not be using AI in the right way, or haven't used it enough to learn "how" to use it. I've been using Claude Code for months now at full speed. You are simply wrong that it doesn't generate good code.
Indeed, for the task of “jump into an unfamiliar codebase and make a requested change that aligns with existing styles and patterns, and uses existing functionality”, I would say something like Opus 4.7 exceeds the capabilities of most developers.
Yeah, I’ve been working for several months on a harness that wraps Claude Code, Codex, etc. to ensure that these types of invariants are captured and enforced (after the first few harness attempts failed). While it’s possible, it slows down the workflow significantly and burns a lot more tokens, in addition to requiring more human involvement, of course.
I suspect this is the right direction, though, as the alternatives inevitably lead any software project to devolve into a spaghetti-code maintenance nightmare.
that's when I stopped.
I review every line of code I generate with AI. I mainly use an MR-based approach:
1) Provide a tightly scoped technical spec to Codex as a task, and ask for 3x solutions. Usually at least one of them is on the right track, and it is better to ditch a solution that went in the wrong direction than to try to fix it.
2) Review the explanation and diff of the proposed changes line by line, file by file. If I find minor deviations from what I asked, or violations of the codebase architecture/conventions, I write comments in the diff and/or global comments, and ask again for 3x adjusted solutions.
3) Usually, by this point, the solution is ready for me to merge locally and either run local tests or do some manual fine-tuning.
4) Finally, I generate unit tests. I leave them to this stage because I can repeat the same process with the sole intent of generating case-specific unit tests. This way, I can generate/review tests against the final version of the implementation.
This has been working very well for me since our repos are reasonably organized and have a well-defined architecture. In the technical spec, I include the major architectural requirements and code conventions, and I also add a catch-all like "follow the codebase's existing conventions and style", which works reasonably well.
This simple process has enabled me to deliver most minor/medium tasks and bug fixes really quickly while maintaining control over the changes and without lowering the quality bar. For larger and more challenging tasks, I find myself "driving the wheel" (i.e. coding by hand) more often, and using AI code generation in a much more scoped and specific way. So that becomes a different process altogether.
I'm sure you agree broadly with Gabe Newell, "people who don't know how to program who use AI to scaffold their programming abilities will become more effective developers of value than people who've been programming, y'know, for a decade." Look, he's talking about you and me. Programming for a while is quickly becoming worthless. It is of course the journey of programming that gives some people insight to real problems - business, creative, whatever - so it is extra important that the people with the best programming skills use the chatbots to write a lot of code that you and I will absolutely never read.
And anyway, you, as consumer, are constantly using code you have never read. Lots of code is shipped that we never read. There is nothing special about reading code. Even if you and I learned everything by reading code, it doesn't mean that generated code isn't going to create value. It's going to generate tons and tons of value.
Yet another POV is, if you are making code for customers who need to read the code, you are making a mistake, in the long term. It is a very, very interesting way to think about efforts around SBOM and various security companies - a far more informative lens to look at Wiz or Cloudflare, and what value they actually provide, because it's not code - and how relatively little enterprise value the "we read everything" teams at high frequency trading startups really deliver. You know this, you know exactly what I am talking about, it's your experience, so it is surprising to hear from you, talking in generalities against a trend that is obviously coming for all the best programmers.
Well, that is problematic. I have to either assume you are disinterested or lying and neither is great for any discourse.
Then when it was completing functions, people would say, "yeah, but you still have to make sure you're the one writing the logic around the functions"
Then when it was completing the logic around the functions, people would say, "yeah, but you still have to make sure you're the one writing the features"
Now it's completing features and people say, "yeah, but you still have to make sure you're the one writing the architecture"
I don't know if architecture is a solvable problem for these models, but it is interesting watching the expectations moving over time.
When AI can complete lines, you still have to read and understand the code.
When AI can complete whole functions, you still have to read and understand the code.
When AI can complete features and tickets, you still have to read and understand the code.
I think the solution is between the lines of this article. The author states the steps leading to it, but doesn't arrive at it explicitly. It has been obvious to me (with 50/50 hindsight) since LLMs started getting popular, and it still holds:
LLMs are fantastic for software dev, as long as you don't let them write the architecture. Create the modules, structs, and enums yourself. Add as many of the struct fields and enum variants as possible. Add doc comments to each struct, enum, field, and module. Point the LLM to the modules and data structures, and have it complete the function bodies, etc., as required.
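A minimal TypeScript sketch of that split, using a hypothetical invoicing domain (the types and function names are illustrative, not from the comment). The human writes the data shapes, variants, and doc comments; only the function bodies are the part you would delegate. TypeScript has no structs, so interfaces play that role, and a string-literal union stands in for an enum.

```typescript
/** A line item on an invoice. Hand-written: this shape is part of the architecture. */
interface LineItem {
  /** Stock-keeping unit identifier. */
  sku: string;
  /** Quantity ordered; expected to be a positive integer. */
  quantity: number;
  /** Unit price in integer cents, to avoid floating-point rounding. */
  unitPriceCents: number;
}

/**
 * Lifecycle of an invoice. A string-literal union stands in for an enum:
 * every variant is listed up front by the human, not invented by the model.
 */
type InvoiceStatus = "draft" | "issued" | "void";

/** An invoice: its lifecycle status plus its line items. */
interface Invoice {
  status: InvoiceStatus;
  items: LineItem[];
}

/**
 * Sum of all line items, in cents.
 * Signature and doc comment are hand-written; the body is the kind of
 * thing you would ask the LLM to fill in.
 */
function invoiceTotalCents(invoice: Invoice): number {
  return invoice.items.reduce(
    (sum, item) => sum + item.quantity * item.unitPriceCents,
    0,
  );
}

/** Only draft invoices may be edited. (Body: again LLM-fillable.) */
function isEditable(invoice: Invoice): boolean {
  return invoice.status === "draft";
}
```

Because the fields and variants already exist, the model has little room to invent new state; it mostly fills in logic against shapes you have already reviewed.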
Have people's standards for quality just completely vanished in the pursuit of the shiny new thing? Is that guy doing something wrong?
That has also been my experience with this sort of thing fwiw, which is why I gave up and do more of a class-by-class pairing with an LLM as a workable middle ground.
At least with current languages, I think the primary problem is that they are globally complex, and it's not scalable for the models (and certainly not for you, reviewing a codebase they've mainly or completely generated) to ensure that the invariants you want are being upheld.
No matter how many times you tell them - there is ZERO blocking allowed on the critical path, they will add blocking on the critical path.
No matter how many times you tell them any time they do X, they need Y type of test, they will do X without Y type of test.
They cannot follow directions 100%. Neither can people.
But they are more random. The mistakes people make are less likely to be the exact polar opposite of what you wanted.
People are less likely to see a critical invariant in the code, build themselves a loophole to get through it, write a test that the code fails successfully, tell you they did exactly what you asked for, and bury it in a 5k-line commit where 1,000 lines are them changing comments that shouldn't be there in the first place.
LLMs are great. I'm convinced they're the future. I'm building a language specifically for them: https://GitHub.com/Cuzzo/clear - and to make it easier for YOU to work with them.
I think until we get around this language problem (that they need global context for things where they shouldn't), it will be a challenge to work with them.
I've had success with them, but it's been so frustrating, that I question how much it's been worth my sanity.
So it's not much of a surprise that this is the situation folks find themselves in with the current models.
They can keep internal consistency, so the more you let them write, the more they can write with internal consistency. They still fail at all of these levels as soon as you look at each level in detail.
"it takes too much effort to get the output production ready"
turning into
"maybe long term the maintenance will be more expensive"
I give it three months until people realize that you rarely need to review every single line and fully understand the code, like so many comments are claiming.
While the salary stays stagnant or even reduced if you adjust for inflation.
This blob of people criticizing AI is just that, a blob. A gaggle of discrete people that your brain makes up a narrative about being some goalpost shifting entity.
Of course there could be individuals who have moved the goalposts. Which would need a pointed critique to address, not an offhand “people are saying” remark.
It's completing shit. Even if it doesn't implement some lazy stuff with empty catch blocks (i.e. the happy path from Programming 101 tutorials), it will either expose your secrets in a sensitive place or do some other stupidity.
Also, you've set up a huge strawman here. Who are these people saying these things in this order and why is that the argument and not "You need to be reviewing every line of code that gets written and understand it."
Your argument is nonsense.
1. If I use a coding agent to generate code, it should be something I am absolutely confident I can code correctly myself given the time (gun to my head test).
2. If it isn't, I can't move on until I completely understand what it is that has been generated, such that I would be able to recreate it myself.
3. I can create debt (I believe this is being called Cognitive Debt) by breaking rule 2, but it must be paid in full for me to declare a project complete.
Accumulating debt increases the chances that code I generate afterwards is of lower quality, and it also feels like the debt is compounding.
I'm also not really sure how these rules scale to serious projects. So far I've only been applying these to my personal projects. It's been a real joy to use agents this way though. I've been learning a lot, and I end up with a codebase that I understand to a comfortable level.
This all works pretty great. Where it starts going off the rails is if I let it use a library I'm not >=90% comfortable with. That's a good use of these tools, but if I let it plow through feature requests, I end up accumulating debt, as you pointed out.
For my uses, I'm still finding the right balance. I'm not terribly sure it makes me faster. What I do think it helps with is longer focused sections because my cognitive load is being reduced. So I can get more done but not necessarily faster in the traditional sense. It's more that I can keep up momentum easier, which does deliver more over time.
I'm interested in multi agent systems, but I'm still not sure of the right orchestration pattern. These AI tools still can go off the rails real quick.
But we don’t follow the same rules for dependencies, the work of colleagues, external services, and all the layers down to the silicon when we work.
Why is AI suddenly different?
We just have to do this by risk and reward. What’s the downside if it’s wrong, and how likely is an error to be found in testing and review? What is the benefit gained if it’s all fine? This is the same for libraries and external services.
A complex financial set of rules in a non-updatable crypto contract with no testing?
A viewer for your internal log data to visualise something?
I had a project idea which I coded with the help of AI, and it became so large that I was starting to have uncharted areas in the code, mostly because I reviewed it too shallowly or moved too fast.
It was fine, since that project never took off, but if I did such a thing on my breadwinning project I would lose the joy.
I am not very disciplined, and find it too convenient to reach for an agent these days.
This may sound ridiculous, but I am addicted to nicotine. I used to have some sort of rule around how I was allowed to use nicotine pouches to manage my addiction. For example, after I finished writing a feature, I could have one pouch. It was obviously a dumb idea that didn't last very long. But in that specific aspect, coding agents feel similar. I tried setting up rules on how I should use them, but it's not easy to follow them.
Maybe the biggest problem is just guilt?
The swindle goes like this: AI on a good codebase can build a lot of features. You think it’s faster; it even seems safer and more accurate at times, especially in domains you don’t know everything about.
This goes on for a while, as the codebase gets bigger, exploration takes longer, and the failure rate increases. You don’t want it to be true and try harder, so you only stop once it has become practically impossible to make any changes.
You look at the code again, and there is so much of it that “spaghetti” is an understatement; it’s the Great Wall of China.
You start working, and you realize what was going on.
I deleted 75,000 of 140,000 lines of code, and I honestly feel like I wasted the 3 months I went hard into agentic coding. I failed my users by building useless features and increasing bugs, I lost the mental model of my code, and I didn’t find the problems I didn’t know about: the kind of hard decisions you only see when you’re in the code, the stuff that wanders around in your mind for days.
They seem to be different for LLMs: would anyone be surprised if they handed summary feature descriptions to some random "developer" they'd only ever met online, and got back an absolute dung pile of half-broken implementation?
For some reason, people seem to expect miracles from some machine that they would not expect of other humans, especially not ones with a proven penchant for rambling hallucinations every once in a while.
I'd like to know, ideally from people who've been there, why they think that is. Where does the trust come from?
I don't understand this. A large codebase should be a collection of small codebases, just like a large city is a collection of small cities. There is a map and you zoom into your local area and work within that scope. You don't need to know every detail of NYC to get a cup of coffee.
It's your responsibility to build a sane, maintainable architecture. AI doesn't prevent you from doing that; in fact, it can help you do so if you hold the tool correctly.
E.g., treating AI-generated code as immediately legacy, with tight encapsulation boundaries, well-defined interfaces, etc., and integrating it into a more manual workflow.
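A minimal sketch of the “immediately legacy” idea, assuming a hypothetical rate limiter as the generated piece (none of these names come from the comment). The hand-written interface is the boundary; the generated class is only reachable through a factory, so it can be regenerated or rewritten by hand without touching any caller.

```typescript
/** Hand-written boundary: the only thing callers are allowed to depend on. */
interface RateLimiter {
  /** True if a request for `key` at time `nowMs` is allowed. */
  allow(key: string, nowMs: number): boolean;
}

/**
 * Generated implementation, treated as legacy from day one. Nothing outside
 * this module names the class, so it can be swapped out freely.
 */
class FixedWindowLimiter implements RateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    const w = this.windows.get(key);
    // Start a fresh window for an unseen key or once the old window has elapsed.
    if (w === undefined || nowMs - w.start >= this.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (w.count < this.limit) {
      w.count += 1;
      return true;
    }
    return false;
  }
}

/** Factory: in a real module this would be the sole export besides the interface. */
function createRateLimiter(limit: number, windowMs: number): RateLimiter {
  return new FixedWindowLimiter(limit, windowMs);
}
```

The review effort then concentrates on the interface and its tests; the body behind it can be held to a lower bar precisely because it is contained.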
There's a range from single-shot prompts to inline code generation, that will make more sense depending on the problem and where in the code base it is.
Single-shot stuff is going to make more sense for a prototyping phase with extensive spec iteration. Once that prototype is in place, you then probably want to drop down into per-module/per-file generation and be more systematic, always maintaining a reasonably good mental model at this layer.
There comes a realization, to many engineers’ horror, that AI won’t be able to save them and they will have to manually comprehend, and possibly write by hand, a ton of code to fix major issues, all while upper management is breathing down their necks, furious that the product has become a piece of shit and customers are leaving for competitors.
The engineers who sink further into denial thrash around with AI, hoping they are a few prompts or orchestrations away from everything being fixed again.
But the solution doesn’t come. They realize there is nothing they can do. It’s over.
AI doesn't necessarily have to increase your throughput; it can also serve as a flexible exploration and refactoring tool that supports either later hand-crafted code or agentic implementation.
I still have a lot of uses for AI: exploration, double-checking me, teaching me. But having it write code became very tough for me to accept; now it's mainly next-edit autocompletes.
Coincidentally, I've been working on a project for about 7 months now: it's a 3D MMO. Currently it's playable, and people are having fun with it. It has decent (but needs work) graphics, and you can easily cram a few hundred people into the server. The architecture is pretty nice, and it's easy to extend and add features onto. Overall, I'm very happy with the progress, and it's on track to launch after probably a year's worth of development.
In 7 months of vibe coding, OP failed to produce a basic TUI. Maybe the feature velocity feels high, but this seems unbelievably slow for building a basic piece of UI like this; it's the kind of thing you could knock out in a few weeks by hand. There are tonnes of high-quality TUI libraries at this point, and all you need to do is populate some tables with whatever data you're looking for. It's surprising that it's taking so long.
There seems to be a strong bias where using AI feels like you're making a lot of progress very quickly, but compared to manual coding it often seems to be significantly slower in practice. This seems to be backed up by the available productivity data, where AI users feel faster but produce less
This metric highly depends on who uses the AI to do what, where strong emphasis is on "who" and "what".
In my line of work (software developer) the biggest time sinks are meetings where people need to align proposed solutions with the expectations of stakeholders. From that aspect AI won't help much, or at all, so measuring the difference of man hours spent from solution proposal to when it ends up in the test loops with and without AI would yield... very disappointing results.
But for troubleshooting and fixing bugs, or actually implementing solutions once they have been approved? For me, I'm at least 10x'ing myself compared to before I was using AI. Not only in pure time, but also in my ability to reason around observed behaviors and investigating what those observations mean when troubleshooting.
But I also work with people who simply cannot make the AI produce valuable (correct) results. I think if you know exactly what you want and how you want it, AI is a great help. You just tell it to do what you would have done anyway, and it does it quicker than you could. But if you don't know exactly what you want, AI will be outright harmful to your progress.
Another thing I don’t see mentioned is code quality.
Vibe-coded code bases are an excellent example of why LLMs aren’t very good at writing code. It will often correct its own mistakes only to make them again immediately after, and it uses patterns inconsistently.
Recently, Claude has been making some “interesting” code style choices, not in line with the code base it’s currently supposed to be working on.
It's got a fun Zelda-inspired mechanic (I won't say which one), and you'll have to unlock abilities and parts of the world over several quests and modes to "win".
It's also multiplayer.
AI, and especially agentic AI, can make you lose situational awareness over a codebase, and when you're doing deep work that SUUUUCKS, but it's not useless; you just have to play to its strengths. Though my favorite hill to die on is telling people not to underestimate its value as autocomplete. Turns out 40 gigabytes of autocomplete makes for a fucking amazing autocomplete. Try it with llama.vim + qwen coder 30b; it feels like the editor is reading your mind sometimes, and the latency is so low.
That’s the hard part of coding. If you have an architecture then writing the code is dead simple. If you aren’t writing the code you aren’t going to notice when you architected an API that allows nulls but then your database doesn’t. Or that it does allow that but you realize some other small issue you never accounted for.
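The null mismatch in particular is a case where the type checker can do some of the noticing for you, provided the types mirror both sides. A minimal sketch with hypothetical request and row types, assuming TypeScript's `strictNullChecks` is enabled:

```typescript
/** What the (hypothetical) public API accepts: email is optional. */
interface CreateUserRequest {
  name: string;
  email: string | null;
}

/** What the (hypothetical) users table stores: email is NOT NULL. */
interface UserRow {
  name: string;
  email: string;
}

type Result<T> = { ok: true; value: T } | { ok: false; error: string };

/**
 * Map API input to a storable row. Under strictNullChecks, writing
 * `email: req.email` without the null check is a compile error, which
 * surfaces the API/database mismatch while the code is being written.
 */
function toUserRow(req: CreateUserRequest): Result<UserRow> {
  if (req.email === null) {
    return { ok: false, error: "email is optional in the API but required by storage" };
  }
  return { ok: true, value: { name: req.name, email: req.email } };
}
```

This only helps if someone keeps the types in sync with the real API contract and schema; deciding whether the API or the table is wrong is still the design question.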
I do not know how you can write this article and not realize the problem is the AI. Not that you let it architect, but that you weren’t paying attention to every single thing it does. It’s a glorified code generator. You need to be checking every thing it does.
The hard part of software engineering was never writing code. Junior devs know how to write code. The hard part is everything else.
The developers that think coding is hard are the ones that absolutely love AI coding. It's changed their world because things they used to find hard are now easy.
Those that think coding is easy don't have such an easy time because coding to them is all about the abstractions, the maintainability and extensibility. They want to lay sensible foundations to allow the software to scale. This is the hard part. When you discover the right abstractions everything becomes relatively easy. But getting there is the hard part. These people find AI coding a useful tool but not the crazy amazing magical tool the people who struggle with coding do.
The OP is definitely in the second camp since they could spot and realise the shortcomings of the AI. They spotted the problem, and that problem is that the AI can't do the hard bit.
The first group are still thinking fairly deeply about design and interfaces and data structures, and are doing fairly heavy review in those areas. The second group are not, and those are the ones that I find a bit more worrisome.
> You need to be checking every thing it does.
This is what seems to be lost on so many. As someone with relatively little code experience, I find myself learning more than ever by checking the results and what went right/wrong. This is also why I don't see it getting better anytime soon. So many people ask me "how do you get your claude to have such good output?" and the answer is always "I paid attention and spotted problems and asked claude to fix them." And it's literally that simple, but I can see their eyes already glazing over.
It's like Google: it made finding information easier, but it didn't fix the human element of telling quality information from poor information.
I follow the plan -> red/green/refactor approach and it is surprisingly good, and the plans it produces all look super well reasoned and grounded, because the agent will slurp all the docs and forums with discussions and the like.
Trouble is once it starts working there would inevitably be a point where the docs and the implementation actually differ - either some combination of tools that have not been used in that way, some outdated docs, or just plain old bugs.
But if the goals of the project/feature are stated clearly enough it is quite capable of iterating itself out of an architectural dead end, that is if it can run and test itself locally.
It goes as deep as inspecting the code of dependencies and libraries and suggesting upstream fixes etc. all things that I would personally do in a deep debugging session.
And I’m super happy with that approach, as I’m directing and supervising rather than doing the drudgery of it.
Trouble is, a lot of my teammates _don't_ actually go this deep when addressing architectural problems; their usual modus operandi is “escalate to the architect”.
This will not end up good for them in the long run I feel, but not sure what they can do themselves - the window of being able to run and understand everything seems to be rapidly closing.
Maybe that’s not super bad. I don’t know exactly what the compiler is doing to translate things to machine code, and I definitely don’t get how the assembly itself is executed to produce the results I want at scale - that is a level of magic and wizardry I can only admire (look-ahead branching strategies and caching on modern CPUs are super impressive - how is all of this even producing correct responses reliably at such a scale…)
Anyway - maybe all of this is ok - we will build new tools and frameworks to deal with all of this, human ingenuity and desire for improvement, measured in likes, references or money will still be there.
You can skip that and go directly to writing code. But that meant you replaced a few hours of planning with a few weeks of coding.
And I'm sure the rewrite is going to teach me a whole different set of lessons...
For example, consider a lint rule that bans Kysely queries on certain tables from existing outside of a specific folder. You'd write a rule like this in an effort to pull reads and writes on a certain domain into one place, hoping you can just hand the lint violations to your AI agent and it would split your queries into service calls as needed.
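A minimal sketch of the detection half of such a rule, assuming a hypothetical policy where only `src/billing` may query the `invoices` and `payments` tables. It is a line-by-line regex scan in Python for brevity, so it misses calls split across lines; a real rule would more likely be an ESLint rule walking the TypeScript AST. The table names, folder and `find_violations` helper are all made up for illustration; `selectFrom`, `insertInto`, `updateTable` and `deleteFrom` are Kysely's query-builder entry points.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Hypothetical policy: these tables may only be queried from inside ALLOWED_DIR.
RESTRICTED_TABLES = {"invoices", "payments"}
ALLOWED_DIR = PurePosixPath("src/billing")

# Matches Kysely entry points such as db.selectFrom('invoices') or
# .updateTable("payments as p"), capturing the table name.
QUERY_CALL = re.compile(
    r"\b(?:selectFrom|insertInto|updateTable|deleteFrom)\(\s*['\"](\w+)"
)


def find_violations(files: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Return (path, line number, table) for each restricted query outside ALLOWED_DIR."""
    violations = []
    for path, source in files.items():
        if ALLOWED_DIR in PurePosixPath(path).parents:
            continue  # the owning folder is allowed to query its own tables
        for lineno, line in enumerate(source.splitlines(), start=1):
            for match in QUERY_CALL.finditer(line):
                if match.group(1) in RESTRICTED_TABLES:
                    violations.append((path, lineno, match.group(1)))
    return violations
```

Finding the violations is the easy, deterministic part; the comment's point is that trust breaks down in the step after this, when the agent rewrites each flagged site.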
And at first, it will appear to have Just Worked™. You are feeling the AGI. Right up until you start to review the output carefully, because there are now little discrepancies in the newly written queries (like not distinguishing between calls to the primary vs. the replica, missing the point of a certain LIMIT or ORDER BY clause, failing to appropriately rewrite a condition or SELECT, etc.). You run a few more reviewer agent passes over it, but realize your efforts are entirely in vain... because even if the reviewer agent fixes 10 or 20 or 30 of the issues, you can still never fully trust the output.
As someone with experience doing this kind of thing before AI, I went back to doing it the old way: using a codemod to rewrite the code automatically using a series of rules. AI can write the codemod, and AI can help me evaluate the results, but having it apply all of the few hundred changes itself left me unable to trust the output. And I suspect that will continue to be true for some time.
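For comparison, the rule-based approach can be sketched like this, assuming one hypothetical rule that rewrites a single exact Kysely query shape into an equally hypothetical `billing.getInvoice` service call. Anything the rules don't match is left untouched and reported, which is roughly what makes the output reviewable: the same input always produces the same output, and you audit each rule rather than each of the few hundred call sites. Real codemods usually work on the syntax tree (e.g. jscodeshift or ts-morph) rather than regexes; this only shows the shape of the idea.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    replacement: str


# Hypothetical rule: rewrite exactly one query shape, and nothing else.
RULES = [
    Rule(
        name="invoice-by-id",
        pattern=re.compile(
            r"db\.selectFrom\('invoices'\)\.selectAll\(\)"
            r"\.where\('id', '=', (\w+)\)\.executeTakeFirst\(\)"
        ),
        replacement=r"billing.getInvoice(\1)",
    ),
]

# Any remaining direct query on the table is left alone and flagged for a human.
LEFTOVER = re.compile(r"selectFrom\('invoices'")


def apply_rules(source: str) -> Tuple[str, Dict[str, int], List[int]]:
    """Apply every rule in order; return new source, hits per rule, leftover line numbers."""
    counts: Dict[str, int] = {}
    for rule in RULES:
        source, hits = rule.pattern.subn(rule.replacement, source)
        counts[rule.name] = hits
    leftovers = [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if LEFTOVER.search(line)
    ]
    return source, counts, leftovers
```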
This industry needs a "verification layer" that, as far as I know, it does not have yet. Some part of me hopes that someone will reply to this comment with a counterexample, because I could sorely use one.
> back to writing code by hand
But what they are doing is
> doing the __design work__ myself, by hand, before any code gets written.
So... Claude is still generating the code, I guess?
And seriously, I can't understand how they thought their vibe-coded project worked fine, and even bought a domain for it, without ever looking at the source code it generated, FOR 7 MONTHS??
And the goal of the article is to draw attention to their project.
I don’t think it’s that weird to not look at the code if it’s a side project and you follow along incrementally via diffs. It’s definitely a different way of working but it’s not that crazy.
This is a special case of a general fundamental point I'm struggling with.
Let's assume AI has reduced the marginal cost of code to zero. So our supply of code is now infinite.
Meanwhile, other critical factors continue to be finite: time in a day, attention, interest, goodwill, paying customers, money, energy.
So how do you choose what to build?
Like a genie, the tools give us the power to ask for whatever we want. And like a genie, it turns out we often don't really know what we want.
Now it’s different: now I don’t have time to use those apps.
That’s a joke.
But I do believe it answers the question of “what to build?”. If you didn’t have time for it before LLM-assisted coding, you still don’t. You most likely already know what gets used and what doesn't, by heart or from some measurements.
Isn't Golang relatively easier to read than Rust? I was under the impression that Rust is a more complex language syntactically.
> The other change is simpler: I'm doing the design work myself, by hand, before any code gets written. Not a vague doc. Concrete interfaces, message types, ownership rules. The architecture decisions that the AI kept making wrong are now made in writing before the first prompt.
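A minimal sketch of what "concrete interfaces, message types, ownership rules" might look like in an Elm-style update loop, the architecture BubbleTea follows. It is in Python purely for illustration, and every name is hypothetical: views receive typed messages, each view owns only its own state, and only the root `App` holds the views and routes messages to them, so no view can reach into another view's state.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional, Protocol, Union


# Message types: the only way a view learns that something happened.
@dataclass(frozen=True)
class KeyPressed:
    key: str


@dataclass(frozen=True)
class SessionSelected:
    session_id: str


Msg = Union[KeyPressed, SessionSelected]


class View(Protocol):
    def update(self, msg: Msg) -> "View": ...


# Ownership rule: each view owns its own state and never reads another view's.
@dataclass(frozen=True)
class ListView:
    cursor: int = 0

    def update(self, msg: Msg) -> "ListView":
        if isinstance(msg, KeyPressed) and msg.key == "j":
            return replace(self, cursor=self.cursor + 1)
        return self


@dataclass(frozen=True)
class DetailView:
    session_id: Optional[str] = None

    def update(self, msg: Msg) -> "DetailView":
        if isinstance(msg, SessionSelected):
            return replace(self, session_id=msg.session_id)
        return self


@dataclass(frozen=True)
class App:
    # Only the root owns the views; it hands every message to each of them.
    views: Dict[str, View] = field(default_factory=dict)

    def update(self, msg: Msg) -> "App":
        return App({name: view.update(msg) for name, view in self.views.items()})
```

Routing `KeyPressed("j")` through `App` moves only the list cursor; the detail view ignores it. Adding a feature that needs cross-view data then forces a visible decision (a new message type) instead of a quiet reach into another struct.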
This post is a good way to grasp the difference between "vibe-coding" and using the AI to help with design and architectural choices made by a competent programmer (I am not saying you are not one). Lately I feel that Opus 4.7 involves the user a lot more, even when given a prompt to one-shot a particular piece of software.
+1 on Opus 4.7 involving the user a lot more. Rn I'm trying to get to a state where I can codify my design and decision preferences as agent personas and push myself out of the dev loop.
It sounds like the author knows Rust, and might not be as familiar with Go.
A language that you are proficient in is always going to be easier to read than one you aren’t, even if the latter is objectively easier to read in general.
We’ve moved to seeing that specs are useful and that having someone write lots of wrong code doesn’t make the project move faster (lots of times devs get annoyed at meetings and discussions because it hinders the code writing, but often those are there to stop everyone writing more of the wrong thing)
We’ve seen people find out that task management is useful.
Now I’m seeing more talk of doing the design work fully upfront. And we’re heading towards waterfall-style dev.
Then we’ll see someone start naming the process of prototyping, then I’m sure something about incremental features where you have to manage old vs. new requirements. Then talk of how the customer really needs to be involved more.
Genuinely, look at what project and product managers do. They have been guiding projects where the product is code, yet they are not expected to read the code and must use only natural language to achieve this.
When asked for a major new feature, despite hard guidelines and context (which eat half your context window), it quickly ships bloat. The foundations are not very well organized, and this is where you acknowledge it is all about random prediction of the next word-thing.
Overall, I've wasted more time than I expected reviewing the PR and trying to steer it properly. So multi-layer agent vibe coding is no longer the way to go *for me*. Maybe with unlimited tokens and a better prompt; to be investigated...
The rewrite is me sitting down with a blank doc and drawing the boxes before any code exists. Then the CLAUDE.md enforces what I already decided. Whether that actually holds up as the project grows, I genuinely don't know yet.
I also think that after a few iterations of producing code there must, or at least should, be a change in strategy.
I sometimes also wonder if I should add the software engineering textbooks that “tried teaching us to code” but contained frameworks that are better applied alongside principles like SOLID, DRY, etc.
But then again, I do not have the right answer now. Maybe the reformation must come in the models too but as I see it, going back to hand coding is not the solution.
Just like we came up with different paradigms of coding, the different principles of coding, different frameworks in short, we need to and will come up with some frameworks (& maybe some newer models as mentioned above) that can and will make us call AI coding “The Standard”.
What's off the table (I think):
1. Hand coding, or maybe even reading the AI's code line by line. That would be counterproductive; for me, at least, it takes more time to read its code and understand it. But I evaluate its code not just by writing tests but by other means too, depending on the situation, and that's a topic for another time.
2. Vibe coding.
3. Thinking software engineering is automated (it's definitely more essential than ever).
4. Same for software development: that's not going extinct either.
5. Software jobs going extinct. (In fact, if a company is shedding people while claiming it doesn't need so many of them, to me that means either they don't see much of a future for themselves or they're just playing the stock-price and investor-satisfaction game for the short run, but that's a different topic.)
I now add a long list of instructions on how to work with the type system, plus some dos and don'ts. I don't see myself as a vibe coder. I actually read the damn code and instruct the AI to get to my level of taste.
AI coding hurts your ego. People keep forgetting this is just a tool that accelerates what you want to do. If you leave decisions to the AI you'll probably be disappointed or surprised.
Agreed. Both technologies create unhealthy dependencies that many people would prefer to cut out.
>AI coding hurts your ego. People keep forgetting this is just a tool that accelerate what you want to do.
Are drummers propping up their egos for not using drum machines? There's a class of developer who genuinely enjoys the act of programming.
Eventually, like with every hype wave, the dust will settle, and let's see where we stand.
By now all the AI companies have consumed all human knowledge so they either learn to actually think for themselves, or that is it.
Either way, that won't change the ongoing layoffs as management pursues the AI dream.
I think most companies doing layoffs are bloated to begin with, AI is just the scapegoat to do the layoffs.
Yes I agree for sure llms write terrible code when left to their own devices, but so do most engineers. Which is why we have so many tools to help keep a certain level of quality. Duplication checks, tests, linters, other engineers.
I find whenever you make an llm repo without these checks, and more, it will write like an enthusiastic junior engineer, wrong and strong. However a junior engineer would be hard pressed to get 95% coverage on a codebase, the ai is more than willing and does it in a few minutes. We can use things like this to our advantage, how many people have ever seen a repo with 100% test coverage? With ai this is very possible, with people not so much.
LLMs write terrible code, we know this, but when dealing with humans who write terrible code we have many techniques. We should be using those same techniques to keep the LLMs honest, but more importantly, verifiable.
Then you're right back on track.
In a way it's not that different from a human-made project. Plenty of teams have to crunch, ignoring the architecture and incurring tech debt, and then come back and fix it later.
I have to periodically get it to do a bunch of refactoring
I'm building an orchestrator (who isn't). Haven't looked at the code yet, but it appears to work. But man have I spent hours in loops between Claude, Codex and myself all on the highest thinking levels to figure out what interface portability means for the employee, how best to handle "remote" sessions and the appropriate semantics for pipelines/recipes.
I've also been very opinionated about who does what. I'll let the agent write a script to sync with GitHub and reload workers, but I decided to "waste" the 5 minutes to manually do all of the config steps on Render for my server when Claude told me that I couldn't just give it a read-only scope to pull the logs. Bad news: I'm cutting and pasting for my computer overlord. Good news? Claude can't blow away the prod DB if it happens to get in the way of whatever interpretation it makes of the instructions I give it.
A chainsaw requires very different skills than an axe. It has different failure modes. Some experience as a lumberjack probably helps with using either or both.
No difference (at least now) with agents.
For example, I had Claude generate a language server for TLA+ so I could have nice keystrokes in Neovim. For things like this, I really do think there is such thing as "good enough"; a language server doesn't have to be perfect, and the stakes are pretty low, where I think the worst case scenario is that it screws up my code, but that should be relatively easy to catch in Git.
I have been trying to mostly have Claude generate code from specifications: either a Mermaid diagram for simpler stuff or TLA+ for more complicated stuff. I usually supply a lot of surrounding context about how I want these specs to be implemented, and it will usually get me about 90% of the way there, but I've found that I still need to hack against it to get over the hump.
It makes me feel a little valuable; I finally have an excuse to use formal methods for things.
Hey, I don't want to oversimplify, I'm sure it was complicated, but did the author have functional tests for these broken views? As long as there are functional tests passing on the previous commit I'd have thought that Claude could look at the end situation and work out how to get the desired feature without breaking the other stuff.
TUIs aren't an exception, it's still essential to have a way to end-to-end test each view.
You can't test every permutation of app usage. You actually need good architecture so you can trust your tests and trust changes to be local, with minimal side effects.
What has generally worked for me is carrying the old adage "Write the data structures and the code will follow" over to AI. Design your data, consider the design immutable, and let the AI try to fill in the necessary code (well, with some guidance). If it finds the data structures aren't enough, have it prompt you instead of making changes on its own. AI can do a lot of the low-hanging fruit, and often the harder ones as well, as long as it's bound to something.
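A small sketch of that split, with hypothetical names: the human writes the data types and treats them as fixed, and the agent only fills in function bodies against them. Nothing in the code itself stops an agent from editing the types; the "ask instead of changing them" part is a convention you would state in the prompt or CLAUDE.md.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


# Human-designed and treated as fixed. If the agent decides these types are not
# enough, it should stop and ask instead of changing them.
@dataclass(frozen=True)
class Concept:
    name: str
    prerequisites: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Progress:
    mastered: FrozenSet[str] = frozenset()


# Agent-written body: with the data pinned down, the logic is mostly forced.
def next_concepts(concepts: List[Concept], progress: Progress) -> List[str]:
    """Unmastered concepts whose prerequisites are all mastered, in input order."""
    return [
        c.name
        for c in concepts
        if c.name not in progress.mastered and c.prerequisites <= progress.mastered
    ]
```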
Yet, for now, AI at best has been something that relieves me from having to write long strings of boring code: it's not sustainable to keep developing stuff relying on AI alone. It's also great when quality is not an issue; for any serious work AI has not sped me up noticeably. I still need to think through the hard parts, and whatever I gain in generating code I lose in managing the agents. But I can parallelise code generation, try new approaches, and explore, because AI is cheap. AI is also pretty good for going through the codebase and reasoning about dependencies, whether in the context of adding a new feature or fixing a bug: I often let AI create a proof-of-concept change that does it, then I extract the important bits out of that and usually trim the diffs down to 1/3 or less.
AI further helps with non-work, i.e. tasks that you have to do in order to fulfill external demands and requirements, and not strictly create anything solid and new. I can imagine AI creating various reports and summaries and documentation, perhaps mostly to be consumed and condensed by another AI at the receiving end. Sadly, all of this is mostly things not worth doing anyway.
Overall, I cringe under all the hype that's been laid on AI: it's a new tool that's still looking for its box or niche carveout, not a revolution.
Personally, I've taken the time it's freed up to spend more time on mathacademy and reading more theory-oriented books on data structures and algorithms. AI coding systems are at their best when paired with someone with broad knowledge. Knowing what to ask for, and knowing the vocabulary to be specific about what you want built, is going to be a much more valuable job skill going forward.
One example is a small AI-based learning system I have been developing in my free time to help me learn. The MVP stored an entire knowledge graph and progress in markdown files. Being an engineer, I knew this wouldn't scale, so once I proved the concept viable, I moved everything into SQLite with a graph DB. Then I decided to wrap some parts of the functionality in Rust and put everything behind a small Rust layer, with the progress-tracking logic still in Python.
Someone with no knowledge of graph databases or dependency graphs or heuristics would not be able to build this even if they had AI. They simply don't know what they don't know, and AI won't save you there.
That said, I think it's important to also spend time in the dirt. I've recently started picking up Zig as my NO-AI language just to keep those skills sharp.
I'm really curious if we'll seesaw once AI costs go up 10x.
The quality gates are up to you, and if you are smart you will make a lot of them and review them closely
Can someone with more experience with it (or similar tools) chime in and confirm that this isn't just more AI snake oil? :)
Matt Pocock talks about specs and Openspec after 23:00 minute mark and again after 33:00 minute mark here: https://www.youtube.com/watch?v=-QFHIoCo-Ko. He doesn't believe in simply translating specs-to-code. He emphasizes tracer bullets, TDD, setting up quick feedback loops.
But I will say... you have to know Golang. You have to have at least tried to make a BubbleTea app yourself and tried to understand the Elm architecture. You have to look at the code and work incrementally with it.
It makes total sense for OP to switch to Rust and Ratatui if they don't know Golang well. But I don't think it's a better language for it. [Ratatui has brought me great inspiration though!]
Independent of framework, the LLMs get the spatial relationships. I say things like "the upper right panel's content is not wrapping inside and the panel's right edge should extend to the terminal edge" and the LLM will fix it. They can see the resultant text; I'm copy-pasting all the time.
TUI code is finicky; one mis-rendered component mucks everything up. The LLMs will decide on their own to make little, temporary BubbleTea fixtures to help themselves understand when things aren't right.
The only real problem with LLMs and BubbleTea is that on the first prompt, they insist on using BubbleTea v1 instead of BubbleTea v2, released in December 2025. But then you just point it to the V2_UPGRADE.md and it gets back on track. That will improve as training cutoffs expand.
I vibe-coded this TUI for Mom's last night. I actually started with Grok (who started with v1) and then moved into Claude Code after some iteration:
https://gist.github.com/neomantra/1008e7f2ad5119d3dd5716d52e...
I stopped reading after this, because this is the dumbest way to vibe code anything larger than a single-use tool.
Claude is a collaborator, and honestly a decent voice of dissent, but it will never offer that unprompted. "Make this thing" - "OK".
You need to review the code. You need to say "I want this, AND HERE IS THE LONG-TERM VISION. Now offer critique and the trade-offs for various implementations."
Or just realize that in every hand-written project you learn the contours of the problem space as you go along and if the tool is big enough you'll feel the urge to do a green-field rewrite of hand-rolled code after a few years. You get there quicker with the robot's help. This is not a new lesson.
There's a massive difference in good human "writin" and a dozen paragraphs of "it's not x, it's a y".
But unfortunately everyone "reads" English. So at least devs have mysterious computer languages with strings of numbers that most of us look at and immediately get a migraine from trying to comprehend.
keep up the good work and the craft of building things one keystroke at a time.
Software engineering is not that. You absolutely can, and often will, hand off work to humans. It's not inherently that creative in the actual coding part.
Also 1600 lines... didn't any agent reviewing the diffs point that out?
You're also adding a lot to claude.md, I dunno how much that file has grown but a big claude.md file with many instructions, I don't think the ai will be able to remember all those rules.
In my experience, no. These tools suck at refactoring, mostly choosing to add more code instead.
AI may default to mediocre and often somewhat buggy code unless you iterate because that is just what the vast majority of human written code that it has seen looks like. But the fact that he got away with not reviewing the code for so long to me proves the opposite of his conclusion.
1690 lines of code in one file is a walk in the park for SOTA models.
He can just say something like:
"Please review and create a refactoring plan and test suite. I found atrocious architectural decisions like numerous special cases and if statements rather than using abstractions properly. Make a few notes in comments and architecture.md to never do this again."
One could also argue that it was a better decision each time by the AI to just never do a refactor unless prompted because that increases the likelihood of something breaking and you want to do that after you verify the minimum code change actually functionally does what you want.
Also I bet you the headline is a lie. He basically admits it by saying he is writing the core structure of the next version by hand ahead of time, implying that he will generate the rest. So the title is a half-truth at best.
He's already 5k+ LOC into the rust rewrite...
Prompt for what you want. Get your feature working, then cut: reduce SLOC, refactor to remove duplication, update things to match existing patterns. You might do these instinctively, or maybe as-you-go, but that's just style. Having a dedicated pass works just as well.
The same thing goes for my code now that did when I wrote every line by hand: make it work, then make it good, then make it manageable. Manually that meant breaking things down into small blocks of individual diffs inside a PR (or splitting PRs), checking for repetitive code and refactoring, or even stashing what I got to and doing it again with the knowledge of how things went wrong.
Agents can do the same. It's WAY easier mentally and works out better if you treat them the same way and go working -> better -> done.
The very worst things you can do in a codebase are (a) not deeply understand how it works (have it be magic) and (b) be lazy and mess up the structure.
How do you fix a problem which happens at 2:00am and takes your system down if you don't have an excellent understanding of how it works?
We're already bad at (a), because most developers hate writing documentation, so that knowledge is invariably lost over time.
Claude is super good at making it seem like it’s an expert in Kubernetes, but on uncovering certain decisions, it’s clear it’s basically optimizing to make things look like they work.
An example: I wanted to develop a feature to easily fork a managed Postgres database within a k8s cluster. What it did was copy the entirety of the source DB to localhost, then copy it back out to the cluster, rather than just running the job within the cluster.
Now I’m pretty stressed after a 1-hour vibe coding session, having to review, digest and think through the code it wrote. Implementations like that scare me, because if I accidentally missed one and merged it, real people who rely on canine would be affected.
I wouldn’t go as far as to say I’m writing everything by hand, but I now always map out how I would do something before asking ai to approach it
I have found small iterations to have the best results. I'm not giving AI any chance to one shot it. For example, I won't tell it to "create a fleet view" but something more like "extract key binding to a service" so that I can reuse it in another view before adding another view. Basically, talk to the AI as an engineer talking to another engineer at the nitty gritty level that we need to deal with everyday, not a product person wishing for a business selling point to magically happen.
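As a rough sketch of the "extract key binding to a service" step, assuming hypothetical view names: the key-to-action mapping lives in one shared object, and each view, including a new fleet view, resolves keys through it instead of carrying its own copy of the key handling.

```python
from typing import Dict, Optional


class KeyBindings:
    """Hypothetical shared service mapping keys to action names for every view."""

    def __init__(self, defaults: Dict[str, str]) -> None:
        self._defaults = dict(defaults)

    def resolve(self, key: str, overrides: Optional[Dict[str, str]] = None) -> Optional[str]:
        # A view-specific override wins; otherwise fall back to the shared defaults.
        if overrides and key in overrides:
            return overrides[key]
        return self._defaults.get(key)


class SessionListView:
    def __init__(self, keys: KeyBindings) -> None:
        self.keys = keys
        self.cursor = 0

    def handle(self, key: str) -> None:
        action = self.keys.resolve(key)
        if action == "down":
            self.cursor += 1
        elif action == "up":
            self.cursor = max(0, self.cursor - 1)


class FleetView:
    """The new view reuses the same service and only declares what differs."""

    def __init__(self, keys: KeyBindings) -> None:
        self.keys = keys
        self.overrides = {"enter": "inspect"}
        self.selected = 0
        self.last_action: Optional[str] = None

    def handle(self, key: str) -> None:
        action = self.keys.resolve(key, self.overrides)
        if action == "down":
            self.selected += 1
        self.last_action = action
```

Both views are built from one instance, e.g. `KeyBindings({"j": "down", "k": "up", "q": "quit"})`, so changing a binding happens in one place, which is the kind of small, reviewable step the comment describes asking for before adding the new view.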
For example, if I'm new to programming today and I'm not part of any community that necessarily approves agentic coding or disapproves of vibe coding and I heard that C programs run fast as heck and I heard that I can automate jobs 1,2 and 3 with such a program, I generate said program and it works as expected per my limited experience then what's the issue?
Perhaps in a couple of weeks I notice I'm missing 1/4 of my HD space and I figure out probably via an agent that my cool C program is creating bloat through caching or creating hidden dot files, so I agentically/vibe-ally generate a patch. Maybe this encourages me to join a community of other amateurs or a pro-am community where I learn specifics - eg. the exact bug(s) in my code -- as well as metas -- eg. testing.
There will probably be millions and millions of people generating code for their own purposes thanks to LLMs, and the number will grow as the technology develops and becomes more trivial. So I wonder how much value there is in the "how to think about this" discussion vs. the "how to use this" discussion. It almost feels like religious encampments are forming over false (possibly manufactured) lines of division.
With that said, this caught my eye:
> AI gravitates toward single-struct-holds-everything because it satisfies the immediate prompt with minimal ceremony.
This is too general. "AI" is used here as a catch-all, but in fact, it was the specific model under the specific conditions you ran your prompt, including harness, markdowns, PRDs, etc. So it's not fair to say "AI does X!" in this case.
It's also very much up to you. It's very common to have a frontier model plan an architecture before you have another model implement code. If you're just one-shotting an LLM to do everything you get mediocre, more brittle code.
This stuff is still being figured out by a lot of people. But I feel the core of the issue is not using AI well. Scoping, task alignment, validation, are crucial.
I still do, but I used to, too.
And in a couple of months we might be doing things completely differently because of some new model or new framework.
That's really cool.
The framework could be an isolation layer against vibe rot, but I'm not sure it's necessary for the small project I always wanted to do and never did anything with.
For another tool, I will try another approach: start with a deep investigation and spec write-up together with AI, then lay out the core architecture, and then add features.
So instead of just prompting "write a golang project with a http server serving xy, and these top 3 features", I will prompt "create a basic golang scaffold for build and test" -> "create a basic http server with a basic library doing xy" -> "define api spec" -> "write feature x".
There is a kind of skill and depth to vibe coding, though.
Will that improve or get worse? One would argue that LLMs in general are drastically more competent now than they were a couple years ago, they’re also much better at coding. We’re likely just now entering the era where they can code but are still not what you’d fully expect, or at least not what someone with absolutely no coding knowledge could use to code at the same level as someone who does know how to code.
Maybe that changes as the models improve, maybe it doesn’t, only time will tell.
I really do think this whole thing is a wash.
AI was also able to help me create my first subscription payment workflow.
It is like farming without Roundup, less crops, more energy, less toxic chemical risks.
Also 1600 lines... didn't any agent reviewing the diffs point that out?
You're also adding a lot to claude.md, I dunno how much that file has grown but a big claude.md file with many instructions, I don't think the ai will be able to remember all those rules
But again, if you just guide the AI on architecture and review the code, you should be fine. The code that you write and the code that an AI writes are two different things; they will never be the same.
The AI is very helpful for generating code, and that is exactly how you should use it: as a code generator.
Actually I am curious to try something like that myself. Is there an existing orchestration engine (or single agent) that can spawn multiple subagents and keep passing their feedback/output between each other until all of them agree that the assignment overall is complete?
Do they write empty functions and let AI fill them in?
Or do they use some kind of specification language?
Are people designing those languages?
I see this in Claude too, but I also see it in junior engineers. With Claude, I simply ask it to refactor immediately after each feature is done. The human is still responsible for what the AI writes, so if the AI writes code that’s gross, I would never push it, lest it sully my name and my reputation for code quality.
If there's any hope for reliability, auditability, and predictability, it lies in constraining an LLM's grammar while delegating freeform behavior to a more passive substrate.
Looking at the code, paying attention to the structure is part of the skill
The skills required to wield an LLM are not exactly those required to write code, but they are very close.
"Vibecoding" is not a way for idiots to blindly produce software artifacts that anyone would want
The problem with this dev's approach is not AI, it's their use of it. They didn't ensure that the architecture made sense. They didn't look at the code and get a "feel" for it. They didn't do the whole build stuff, step back, refactor, rinse and repeat dance. The need for that hasn't gone away; if anything, it's even more important now. Because you can spit out code 100x faster than you could before, your tech debt compounds 100x faster. The earlier you refactor, the less work it is.
I usually give the agent a solid idea of what I want, often down to the API interfaces. Then every now and then, I'll go through the code and ensure that everything makes sense, and that I'm not just spitting out code that works, but building a codebase that scales.
If you understand good software architecture, architect it. Create a markdown document just as you would if you had a team of engineers working with you and would hand off to them. Be specific.
Let the AI do the implementation of your architecture.
That trial and error process is still happening with a LLM, but much faster, and with instantaneous cross-references to various forms of documentation that I would be looking up myself otherwise. It produces code of a quality that is dependent on the engineer knowing what they want in the first place and prompting for it and refining its output correctly.
It's the exact same process of sculpting code that the majority of the industry was doing "by hand" prior to the release of LLMs, but faster, and the harnesses are only getting better. To "vibe code" is to prompt vaguely and ignore the quality of the output. You're coming to a forum full of professionals and essentially telling us that you're getting really frustrated with your Scratch project.
I don't know if you're trying to lead a charge or whatever but good luck with that. As a senior SWE, it is clear to me that this is the new paradigm until something better than LLMs comes along. My workflows and efficiency have been vastly improved. I will admit that I have never really been a "I made a SMTP server in 3k of Rust" kind of guy, though.
7 months ago was early November. Coding assistants were getting very good back then, but they were still significantly poorer at making good architectural decisions in my experience. They tended to just force features into the existing code base without much thought or care.
Today I've noticed assistants tend to spot architectural smells while working and will ask you whether they should try to address it, but even then they're probably never going to suggest a full refactor of the codebase (which probably is generally the correct heuristic).
My guess is that if you built this today with AI you wouldn't run into so many of these problems. That's not to say you should build blind, but the first thing that stood out to me was that you started building 7 months ago, when coding assistants were only just becoming decent and, undirected, would still generally generate total slop.
> For 7 months I'd been prompting and shipping without ever sitting down and actually reading the code Claude wrote.
But every time I read something like this, I seriously wonder about the mental state of the person that wrote it.
How do you get to this point?
BASIC at that time was heralded as a much simpler and faster way to program. Rings a bell?
But in my main work, reverse engineering, LLMs have been a godsend for years now.
You can basically brute-force binary obfuscation thanks to them. And thanks to eager Chinese LLM providers, basically for free.
But I only ever use LLMs for the boring work, and the rest I do manually, or with scripts of course, but ones made by me. Because I want to learn.
Yes, there are a lot people using LLMs for full RE automation since they're selling exploits for profit. No problem with me.
I see a funny future for huge corporations like Adobe, etc.
Imagine the prompt: "Hey Claude, re-implement Adobe Photoshop with clean-room design." One agent opens a decompiler and outputs complete low-level technical details of how everything is implemented.
Second agent implements new Photoshop based on that.
They will be mad and I like this.
You will own nothing, and you will be happy, corpos.
I feel the same way about coding. It's a source of pride for me, and when I hear people say I should resign myself to being an "ideas guy" while ChatGPT actually creates things, I find the very concept distasteful, regardless of whether or not it can outperform me.
It would have been easy to run a few AI agents to review the code, find these issues, and architect it clearly as well.
Clickbait title.
This. I definitely agree with this statement at this point in AI-assisted development. It gets at the "taste" factor that is still intrinsically human, especially in software engineering. If you can construct and guide the overall architecture of an application or system, AI can conceivably fill in the smaller feature bits, and do so well. But it must have a strong architecture and an opinionated field in which to play.
But here's the thing: you almost never know what the architecture is up front. If you do, you probably aren't the one writing the actual code anymore. Writing the code, with or without an AI, is part of the design process. For most people, it isn't until they've tried several times, fucked it up a bunch, and refactored or rewritten even more that they actually know what the architecture needs to be.
Now I do feel lucky that I started learning coding about four years before the LLM revolution, but these things are really just natural language compilers, aren’t they? We’re just in that period - the 1980s, the greybeards tell me - where companies charged thousands of dollars per compiler instance, right? And now, I myself have never paid for a compiler.
This whole investor bubble will blow up in the face of the rentier-finance capitalists and I’ll be laughing my head off while it happens.
I don't go as fast as I do with other agents, but this works for me, and I enjoy the process.
This is what I was doing right from the beginning. AI just fills out methods and does other low-intelligence work. Both of us are happy. My architectures and code are really mine, easy to read and reason about. AI gets paid and doesn't get a chance to fuck me in the process. At no point did I feel any temptation to leave the "serious" work to AI.
Getting a plan isn't a panacea but is a better way to limit downstream slop than just vibing without one.
The ones who are “AI pilled” and the contagious lepers.
Some states, for example, are meant to be inferred from the shape of the data rather than from explicit state fields, but damn, they like adding a state field.
Yeah, that's why engineers are still very important for now (until models can do this kind of longer-term design and stick to it).
Attempting anything comprehensive with AI is the software development analogue of the Gell-Mann Amnesia effect.
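A minimal illustration of what "state inferred from the data shape" might look like, assuming a hypothetical upload record; the names `UploadWithStatusField`, `Upload`, `url`, and `is_done` are invented for this sketch and don't come from anyone's actual code:

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical: an explicit status field duplicates information the data
# already carries, so the two can contradict each other.
@dataclass
class UploadWithStatusField:
    status: str          # e.g. "pending" or "done"
    url: Optional[str]   # filled in once the upload completes


# Hypothetical: the state is derived from the shape of the data itself.
@dataclass
class Upload:
    url: Optional[str] = None

    @property
    def is_done(self) -> bool:
        # "Done" simply means "has a URL"; there is no second field to drift.
        return self.url is not None


# The explicit-field version happily represents an impossible state:
broken = UploadWithStatusField(status="done", url=None)
print(Upload().is_done, Upload(url="https://example.com/f").is_done)  # prints "False True"
```

The derived version can't express "done but no URL", which is exactly the kind of invariant an extra state field quietly breaks.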
I'm definitely thinking deeply now about how I'm approaching these tools going forward. Yes, when prompted hard enough, GPT-5 is better than I am at spitting out a fairly acceptable class skeleton in one go. But it will happily do things like write decent-looking protobuf schemas and then go ahead and hide anything that takes even the least amount of reasoning behind some binary blob nested deep enough to get past even the most dedicated reviewer.
It's fairly good at a lot of the things I don't find interesting to deal with, but it's also amazingly incompetent when it comes to even the most mundane kind of common sense. It steers so strongly towards textbook examples that it will happily put in three times as much code, handle multiple classes of edge cases that are actually impossible, and even add use cases it was specifically asked NOT to add. And it will defend this with "well, I added this because I can't know if someone is going to use it." Well, if you hadn't added it, the chances would indeed be slimmer.
It's so good at answering questions, explaining what's there, and diving through call paths, and yet it drops the ball the moment it has to actually do something beyond saving me from looking up how to write some really annoying and uninteresting boilerplate.
The worst thing is how good it is at making things LOOK right. It will cover every single edge case you throw at it, but not because of the design: not because it correctly argues why the architecture inherently allows such and such, or because the design and spec establish that A goes to B and never the other way around. As soon as it's time to build something, it will make sure B can go to A, especially, it seems, if allowing that prevents it from doing the right thing, which is WHY those edge cases were trivial in the first place. Instead, it will endlessly hack around them. I've worked with people like that too, so I don't know whether to blame the models or the training data.
But damn, it's a tough spot.
I've had multiple situations where, after wasting hours I should have just spent doing the work myself, the only thing I really wished was for the model to be sentient, able to feel pain, and have a corporeal body so I could drag it outside and beat it to a pulp. (I've never reached that level of frustration with an actual person, so that's something new they bring to the table.)
Time to become a "product engineer" and watch the hyper-agile agents put up digital post-it notes on digital pin-boards, discussing how much each post-it is worth in digital scrum meetings. Meanwhile, the agents keep wasting more and more time so that their owners take less and less of a loss, until eventually a profit is made.
Until the costs become prohibitive and humans become cheaper than the agents that replaced them. Once the agents are replaced by the humans, the next hype bubble awaits around the bend.
/s
Inb4 “you’re gonna be replaced”: god damn it, I hope so. I do not want to spend the rest of my life behind a computer screen…