This experience is familiar to every serious software engineer who has used AI code gen and then reviewed the output:
> But when I reviewed the codebase in detail in late January, the downside was obvious: the codebase was complete spaghetti. I didn’t understand large parts of the Python source extraction pipeline, functions were scattered in random files without a clear shape, and a few files had grown to several thousand lines. It was extremely fragile; it solved the immediate problem but it was never going to cope with my larger vision,
Some people never get to the part where they review the code. They go straight to their LinkedIn or blog and start writing (or having ChatGPT write) posts about how manual coding is dead and they’re done writing code by hand forever.
Some people review the code and declare it unusable garbage, then also go to their social media and post how AI coding is completely useless and they’re not going to use it for anything.
This blog post shows the journey that anyone not in one of those two vocal minorities is going through right now: A realization that AI coding tools can be a large accelerator but you need to learn how to use them correctly in your workflow and you need to remain involved in the code. It’s not as clickbaity as the extreme takes that get posted all the time. It’s a little disappointing to read the part where they said hard work was still required. It is a realistic and balanced take on the state of AI coding, though.
I’ve been driving Claude as my primary coding interface the last three months at my job. Other than a different domain, I feel like I could have written this exact article.
The project I’m on started as a vibe-coded prototype that quickly got promoted to a production service we sell.
I’ve had to build the mental model after the fact, while refactoring and ripping out large chunks of nonsense or dead code.
But the product wouldn’t exist without that quick and dirty prototype, and I can use Claude as a goddamned chainsaw to clean up.
On Friday, I finally added a type checker pre-commit hook and fixed the 90 existing errors (properly, no type ignores) in ~2 hours. I tried full-agentic first, and it failed miserably. Then I went through error by error with Claude: we tightened up some existing types, fixed some clunky abstractions, and got a nice, clean result.
AI-assisted coding is amazing, but IMO for production code there’s no substitute for human review and guidance.
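The "no type ignores" rule above is the kind of thing that can be enforced mechanically rather than by reviewer vigilance. A minimal sketch, assuming a Python codebase; the comment does not say which language, type checker, or hook framework was used, and the helper name `find_suppressions` and the two patterns checked (`# type: ignore`, `# noqa`) are illustrative assumptions.

```python
import re
from pathlib import Path

# Inline suppressions to reject. These two patterns are assumptions for a
# Python codebase; adjust them for your own checker and linter.
SUPPRESSION = re.compile(r"#\s*(type:\s*ignore|noqa)\b")


def find_suppressions(source: str) -> list[int]:
    """Return 1-based line numbers that contain an inline suppression."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if SUPPRESSION.search(line)
    ]


def check_files(paths: list[str]) -> int:
    """Print each offending line; return 1 if any were found, else 0."""
    failed = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for lineno in find_suppressions(text):
            print(f"{path}:{lineno}: inline suppression is not allowed")
            failed = True
    return 1 if failed else 0
```

A hook would pass the staged file names to `check_files` and use the return value as its exit code. The regex is deliberately naive: it also matches the pattern inside string literals, which is usually acceptable for a gate like this.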
Then use ideation to architect. Dive into the details and tell the AI exactly what your choices are: how certain methods should be called, how logging and observability should be set up, what language to use, type checking, coding style (configure ruthless linting and formatting before you write a single line of code), and what testing methodology and framework to use for unit, integration, and e2e tests. For database changes, say that you will handle migrations yourself. Specify as much as possible, so the AI is confined as closely as possible to how you would do it.
Then create a plan file, have the AI manage it like a task list, and implement in parts. Before starting, it needs to present you a plan. In it you will notice that it makes mistakes, misunderstands things that you maybe didn’t clarify before, or just forgets. You add to AGENTS.md or whatever, make changes to the AI’s plan, tell it to update the plan.md and, when satisfied, proceed.
Once it is done, review the code. You will notice there is always something to fix: hardcoded variables, a SQL migration with seed data that should actually not be a migration, just generally crazy stuff.
The worst is that the AI is always very loose on requirements. You will notice all its fields are nullable and records have little to no validation. You report an error when testing and it tries to solve it with a brittle async solution, like LISTEN/NOTIFY or a callback, instead of doing the architecturally correct thing. These are things that at scale are hell to debug, especially if you did not write the code.
If you do this and iterate you will gradually end up with a solid harness and you will need to review less.
Then port it to other projects.
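To make the "all fields nullable, little to no validation" complaint concrete, here is a minimal sketch of the stricter shape a reviewer might push the agent toward. The `Order` record and its fields are hypothetical, invented purely for illustration; the comment does not describe a specific schema.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Order:
    """Hypothetical record: every field is required, none is Optional."""

    order_id: str
    customer_email: str
    total: Decimal

    def __post_init__(self) -> None:
        # Reject bad data at construction time instead of letting it
        # travel through the system and fail somewhere far away.
        if not self.order_id:
            raise ValueError("order_id must not be empty")
        if "@" not in self.customer_email:
            raise ValueError("customer_email must contain '@'")
        if self.total < 0:
            raise ValueError("total must not be negative")
```

The point is where the failure happens: an invalid `Order` can never exist, so downstream code does not need defensive `None` checks.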
For that I usually get it reviewed by LLMs first, before reviewing it myself.
Same model but a clean session, plus different models from different providers. And multiple (at least 2) automated rounds of: review -> triage by the implementing session -> addressing the feedback, with reasons for anything deferred or ignored -> review -> triage by the implementing session -> …
Works wonders.
Committing the initial spec / plan also helps the reviewers compare the actual implementation to what was planned. Didn’t expect it, but it’s worked nicely.
Sounds like a solid way to make crud web apps though.
If you set up restrictive linters and don't explicitly prohibit agents from adding inline allows, most LOC will be allow comments.
Based on this learning, I've decided to prohibit any inline allows. And then agents started doing very questionable things to satisfy clippy.
Recent example:
- Claude set up a test support module so that it could reuse things. Since this was not used in all tests, Rust complained about dead_code. Instead of making it work, Claude decided to remove the test support module and just... blow up each test.
If you enable thinking summaries, you'll always see the agent saying something like: "I need to be pragmatic", which is the right choice 50% of the time.
But it's not all bad news. TIL about Parameters<T>.
This should be done on day one with a company-wide skill or project template that defines hard limits and processes for the Agent.
Strict linters, formatters and code quality checks are essential to de-slopify the code as much as possible.
That doesn't fix bad design though, that's still on humans.
Personally, I think it's just the natural flow when you're starting out. If he keeps going, his opinion is going to change and as he gets to know it better, he'll likely go more and more towards vibecoding again.
It's hard to say why, but you get better at it, even if it's really hard to put into words why.
I think "more and more" is doing some very heavy lifting here. On the surface it reads like "a lot" to many people, I think, which is why this is hard to read without cringing a bit. Read like that it comes off as "It's very addictive and eventually you get lulled into accepting nonsense again, except I haven't realized that's what's happening".
But the truth is that this comment really relies entirely on what "more and more" means here.
One thing I will add: I actually don’t think it’s wrong to start out building a vibe coded spaghetti mess for a project like this… provided you see it as a prototype you’re going to learn from and then throw away. A throwaway prototype is immensely useful because it helps you figure out what you want to build in the first place, before you step down a level and focus on closely guiding the agent to actually build it.
The author’s mistake was that he thought the horrible prototype would evolve into the real thing. Of course it could not. But I suspect that the author’s final results when he did start afresh and build with closer attention to architecture were much better because he has learned more about the requirements for what he wanted to build from that first attempt.
Professional software engineers like many of us have a big blind spot when it comes to AI coding, and that's a fixation on code quality.
It makes sense to focus on code quality. We're not wrong. After all, we've spent our entire careers in the code. Bad code quality slows us down and makes things slow/insecure/unreliable/etc for end users.
However, code quality is becoming less and less relevant in the age of AI coding, and to ignore that is to have our heads stuck in the sand. Just because we don't like it doesn't mean it's not true.
There are two forces contributing to this: (1) more people coding smaller apps, and (2) improvements in coding models and agentic tools.
We are increasingly moving toward a world where people who aren't sophisticated programmers are "building" their own apps with a user base of just one person. In many cases, these apps are simple and effective and come without the bloat that larger software suites have subjected users to for years. The code is simple, and even when it's not, nobody will ever have to maintain it, so it doesn't matter. Some apps will be unreliable, some will get hacked, some will be slow and inefficient, and it won't matter. This trend will continue to grow.
At the same time, technology is improving, and the AI is increasingly good at designing and architecting software. We are in the very earliest months of AI actually being somewhat competent at this. It's unlikely that it will plateau and stop improving. And even when it finally does, if such a point comes, there will still be many years of improvements in tooling, as humanity's ability to make effective use of a technology always lags far behind the invention of the technology itself.
So I'm right there with you in being annoyed by all the hype and exaggerated claims. But the "truth" about AI-assisted coding is changing every year, every quarter, every month. It's only trending in one direction. And it isn't going to stop.
Strongly disagree with this thesis, and in fact I'd go completely the opposite: code quality is more important than ever thanks to AI.
LLM-assisted coding is most successful in codebases with attributes strongly associated with high code quality: predictable patterns, well-named variables, use of a type system, no global mutable state, very low mutability in general, etc.
I'm using AI on a pretty shitty legacy area of a Python codebase right now (like, literally right now, Claude is running while I type this) and it's struggling for the same reason a human would struggle. What are the columns in this DataFrame? Who knows, because the dataframe is getting mutated depending on the function calls! Oh yeah and someone thought they could be "clever" and assemble function names via strings and dynamically call them to save a few lines of code, awesome! An LLM is going to struggle deciphering this disasterpiece, same as anyone.
Meanwhile for newer areas of the code with strict typing and a sensible architecture, Claude will usually just one-shot whatever I ask.
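A small sketch of the contrast described above, written in plain Python rather than pandas so it stays self-contained. The names (`Row`, `load_sales`, `load_returns`) are hypothetical and not taken from the codebase being discussed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Row:
    """Hypothetical row type: the columns are declared, not discovered."""

    sku: str
    quantity: int


def load_sales() -> list[Row]:
    return [Row("A", 3), Row("B", 1)]


def load_returns() -> list[Row]:
    return [Row("A", 1)]


# Hard to follow: the callee only exists as a string built at runtime, so
# a reader, a type checker, or "find references" cannot see the call.
def load_dynamic(kind: str) -> list[Row]:
    return globals()["load_" + kind]()


# Easier to follow: every possible callee is listed in one typed table.
LOADERS: dict[str, Callable[[], list[Row]]] = {
    "sales": load_sales,
    "returns": load_returns,
}


def load_explicit(kind: str) -> list[Row]:
    return LOADERS[kind]()
```

Both versions return the same data. The difference is that the second one can be understood, by a human or a model, without executing the program.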
edit: I see most replies are saying basically the same thing here, which is an indicator.
That's all very true, but what you're missing is that the proportion of codebases that need this is shrinking relative to the total number of codebases. There's an incredible proliferation of very small, bespoke, simple, AI-coded apps that are nonetheless quite useful. Most are being created by people who have never written a line of code in their life, who will do no maintenance, and who will not give two craps how the code looks, any more than the average YouTuber cares about the aperture of their lens or the average forum commenter cares about the style of their prose.
We don't see these apps because we're professional software engineers working on the other stuff. But we're rapidly approaching a world where more and more software is created by non-professionals.
Your example with the dataframes is completely unstructured mutation typical of a dynamic language and its sensibilities.
I know from experience that none of the modern models (even cheap ones) have issues dealing with global or near-global state and mutating it, even navigating mutexes/mutices, conds, and so on.
It actually becomes more and more relevant. AI constantly needs to reread its own code and fit it into its limited context, in order to take it as a reference for writing out new stuff. This means that every single code smell, and every instance of needless code bloat, actually becomes a grievous hazard to further progress. Arguably, you should in fact be quite obsessed about refactoring and cleaning up what the AI has come up with, even more so than if you were coding purely for humans.
Strong disagree. I just watched a team spend weeks trying to make a piece of code work with AI, because the vibe-coded codebase was such spaghetti garbage that even the AI couldn’t tell what needed to be done and was basically playing ineffective whack-a-mole: it would fix the bug you asked about by reintroducing an old bug or introducing a new one, because no one understood what was happening. And humans couldn’t even step in like normal, because no one understood what was going on.
In 1998, I'm sure there were newspaper companies who failed at transitioning online, didn't get any web traffic, had unreliable servers that crashed, etc. This says very little about what life would be like for the newspaper industry in 1999, 2000, 2005, 2010, and beyond.
Spaghetti code is still spaghetti code. Something that should be a small change ends up touching multiple parts of the codebase. Not only does this increase costs, it just compounds the next time you need to change this feature.
I don't see why this would be a reality that anyone wants. Why would you want an agent going in circles, burning money and eventually finding the answer, if simpler code could get it there faster and cheaper?
Maybe one day it'll change. Maybe there will be a new AI technology which shakes up the whole way we do it. But if the architecture of LLMs stays as it is, I don't see why you wouldn't want to make efficient use of the context window.
I said that (a) apps are getting simpler and smaller in scope and so their code quality matters less, and (b) AI is getting better at writing good code.
Now I’m being told code quality doesn’t matter at all.
I completely agree. Just going through the beginner & hobbyist forums, the change from "can you help me with code to do X" to "I used ChatGPT/Claude/Copilot to write code to do X" happened with absolutely startling speed, and it's not slowing down. There was clearly a pent-up demand here that wasn't being met otherwise.
People are using AI to get code written. They have no idea what code quality is and only care that what they built works.
AFAICT, every time technology has allowed non-technical people to do more, it's opened up new opportunities for programmers. I don't expect this to be any different, I just want to know where the opportunities are.
It's the opposite, code quality is becoming more and more relevant. Before now you could only neglect quality for so long before the time to implement any change became so long as to completely stall out a project.
That's still true, the only thing AI has changed is it's let you charge further and further into technical debt before you see the problems. But now instead of the problems being a gradual ramp up it's a cliff, the moment you hit the point where the current crop of models can't operate on it effectively any more you're completely lost.
> We are in the very earliest months of AI actually being somewhat competent at this. It's unlikely that it will plateau and stop improving.
We hit the plateau on model improvement a few years back. We've only continued to see any improvement at all because of the exponential increase of money poured into it.
> It's only trending in one direction. And it isn't going to stop.
Sure it can. When the bubble pops there will be a question: is using an agent cost effective? Even if you think it is at $200/month/user, we'll see how that holds up once the cost skyrockets after OpenAI and Anthropic run out of money to burn and their investors want some returns.
Think about it this way: If your job survived the popularity of offshoring to engineers paid 10% of your salary, why would AI tooling kill it?
What you're missing is that fewer and fewer projects are going to need a ton of technical depth.
I have friends who'd never written a line of code in their lives who now use multiple simple vibe-coded apps at work daily.
> We hit the plateau on model improvement a few years back. We've only continued to see any improvement at all because of the exponential increase of money poured into it.
The genie is out of the bottle. Humanity is not going to stop pouring more and more money into AI.
> Sure it can. When the bubble pops there will be a question: is using an agent cost effective? Even if you think it is at $200/month/user, we'll see how that holds up once the cost skyrockets after OpenAI and Anthropic run out of money to burn and their investors want some returns.
The AI bubble isn't going to pop. This is like saying the internet bubble is going to pop in 1999. Maybe you will be right about short term economic trends, but the underlying technology is here to stay and will only trend in one direction: better, cheaper, faster, more available, more widely adopted, etc.
I'm curious about software that's actively used but that nobody maintains. If it's a personal anecdote, that's fine as well.
> However, code quality is becoming less and less relevant in the age of AI coding, and to ignore that is to have our heads stuck in the sand. Just because we don't like it doesn't mean it's not true.
> [...]
> We are increasingly moving toward a world where people who aren't sophisticated programmers are "building" their own apps with a user base of just one person. In many cases, these apps are simple and effective and come without the bloat that larger software suites have subjected users to for years. The code is simple, and even when it's not, nobody will ever have to maintain it, so it doesn't matter. Some apps will be unreliable, some will get hacked, some will be slow and inefficient, and it won't matter. This trend will continue to grow.
I do agree with the fact that more and more people are going to take advantage of agentic coding to write their own tools/apps to make their lives easier.
And I genuinely see it as a good thing: computers were always supposed to make our lives easier. But I don't see how it can be used as an argument for "code quality is becoming less and less relevant".
If AI is producing 10 times more lines than are necessary to achieve the goal, that's more resources used. With the prices of RAM and SSDs skyrocketing, I don't see it as a positive for regular users. If they need to buy a new computer to run their vibecoded app, are they really reaping the benefits?
But what's more concerning to me is: where do we draw the line?
Let's say it's fine to have a garbage vibecoded app running only on its "creator's" computer, even if it gobbles gigabytes of RAM and is absolutely not secure. Good.
But then, if "code quality is becoming less and less relevant", does this also apply to public/professional apps?
In our modern societies we HAVE to use dozens of pieces of software every day, whether we want to or not, whether we directly interact with them or not.
Are you okay with your power company cutting power because their vibecoded monitoring software mistakenly thought you hadn't paid your bills?
Are you okay with an autonomous car driving over your kid because its vibecoded software didn't see them?
Are you okay with cops coming to your door at 5AM because a vibecoded tool reported you as a terrorist?
Personally, I'm not.
People can produce all the trash they want on their own hardware. But I don't want my life to be ruled by software that was not given the quality controls it should have had.
I mean, I agree, but you could say this at any point in time throughout history. An engineer from the 1960s could scoff at the web, at the explosion in the number of programs, and at the decline in efficiency of the average program.
An artist from the 1700s would scoff at the lack of training and precision of the average artist/designer from today, because the explosion in numbers has certainly translated to a decline in the average quality of art.
A film producer from the 1940s would scoff at the lack of quality of the average YouTuber's videography skills. But we still have millions of YouTubers and they're racking up trillions of views.
Etc.
To me, the chief lesson is that when we democratize technology and put it in the hands of more people, the tradeoff in quality is something that society is ready to accept. Whether this is depressing (bc less quality) or empowering (bc more people) is a matter of perspective.
We're entering a world where FAR more people will be able to casually create and edit the software they want to see. It's going to be a messier world for sure. And that bothers us as engineers. But just because something bothers us doesn't mean it bothers the rest of the world.
> But then, if "code quality is becoming less and less relevant", does this also applies to public/professional apps?
No, I think these will always have a higher bar for reliability and security. But even in our pre-vibe coded era, how many massive brandname companies have had outages and hacks and shitty UIs? Our tolerance for these things is quite high.
Of course the bigger more visible and important applications will be the slowest to adopt risky tech and will have more guardrails up. That's a good thing.
But it's still just a matter of time, especially as the tools improve and get better at writing code that's less wasteful, more secure, etc. And as our skills improve, and we get better at using AI.
What’s really happening is that you’re all of those people in the beginning. Those people are you as you go through the experience. You’re excited after seeing it do the impossible and in later instances you’re critical of the imperfections. It’s like the stages of grief, a sort of Kübler-Ross model for AI.
But that's boring nerd shit and LLMs didn't change who thinks boring nerd shit is boring or cool.
Some people do find it unfun, saying it deprives them of the happy "flow" of banging out code. Reaching "flow" when prompting LLMs arguably requires a somewhat deeper understanding of them as a proper technical tool, as opposed to a complete black box, or worse, a crystal ball.
I use LLMs in my every day work. I’m also a strong critic of LLMs and absolutely loathe the hype cycle around them.
I have done some really cool things with Copilot and Claude, and I keep sharing them only within my working circle because I simply don’t want to interact that much with people who aren’t grounded on the subject.
SWEs spend 20% of the time writing code for exactly the same reason brick-layers spend 20% of their time laying bricks
I kinda like how you can just use it for anything you like. I have bazillion personal projects, I can now get help with, polish up, simplify, or build UI for, and it's nice. Anything from reverse engineering, to data extraction, to playing with FPGAs, is just so much less tedious and I can focus on the fun parts.
The AIs are more than capable of producing a mountain of docs from which to rebuild, sanely. They’re really not that capable, without a lot of human pain, of making a shit codebase good.
I appreciate the balanced takes and also the notion that one can use these AI tools to build software with principled use.
However, what I am still failing to see is concrete evidence that this is all faster and cheaper than just a human learning and doing everything themself or with a small team. The cat is out of the bag, so to speak, but I think it's still correct to question these things. I am putting in a _lot_ of work to reach a principled status quo with these tools, and it is still quite unclear whether it's actually improvement versus just a side quest to wrangle tools that everyone else is abusing.
Previously, takes were necessarily shallower or not as insightful ("worked with caveats for me, ymmv") - there just wasn't enough data - although a few have posted fairly balanced takes (@mitsuhiko for example).
I don't think we've seen the last of hypers and doomers though.
Ironically this itself is one of the hyper/doomer takes.
I recently had to rewrite a part of such a prototype that had 15 years of development on it, which was a massive headache. One of the most useful things I used LLMs for was asking it to compare the rewritten functionality with the old one, and find potential differences. While I was busy refactoring and redesigning the underlying architecture, I then sometimes was pinged by the LLM to investigate a potential difference. It sometimes included false positives, but it did help me spot small details that otherwise would have taken quite a while of debugging.
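The comparison described above was done by asking an LLM to read both versions. A mechanical complement, which the comment does not claim to have used, is a differential check that runs the old and new implementations on the same inputs and reports where they disagree. A minimal sketch; `normalize_old` and `normalize_new` are hypothetical stand-ins for the legacy and rewritten code.

```python
from typing import Callable, Iterable


def find_differences(
    old: Callable[[str], str],
    new: Callable[[str], str],
    inputs: Iterable[str],
) -> list[tuple[str, str, str]]:
    """Return (input, old_output, new_output) for every input where they differ."""
    differences = []
    for value in inputs:
        old_out, new_out = old(value), new(value)
        if old_out != new_out:
            differences.append((value, old_out, new_out))
    return differences


# Hypothetical legacy behaviour: trims whitespace and lowercases.
def normalize_old(name: str) -> str:
    return name.strip().lower()


# Hypothetical rewrite that accidentally drops the trimming step.
def normalize_new(name: str) -> str:
    return name.lower()
```

Like the LLM comparison, this only surfaces candidates: each reported difference still needs a human to decide whether it is a regression or an intended change.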
But the errors that are described (no adherence to architecture, lack of comprehension, random files, etc.) are a matter of not leveling up the sophistication of use further, not a gap in those tools.
As an example: very clearly laying out your architecture principles, guidance, how code should look on disk, theory on imports, etc., and then objectively analyzing any proposed change against those principles, converges toward code that is sane and understandable.
We've been calling it adversarial testing across a number of dimensions: architecture, security, accessibility, among other things. Every PR gets automatically reviewed and scored based on these perspectives. If an adversary doesn't OK the PR, it doesn't get merged.
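The gating rule in that last sentence can be sketched in a few lines. This is an illustration under stated assumptions, not the commenter's actual tooling: the `Review` type, the 0 to 10 score scale, and the `can_merge` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One adversarial reviewer's verdict. Names and scale are hypothetical."""

    perspective: str
    score: int  # assumed scale: 0 (reject) to 10 (no concerns)
    approved: bool


def can_merge(reviews: list[Review], required: set[str]) -> bool:
    """Allow a merge only if every required perspective reviewed and approved."""
    approved = {r.perspective for r in reviews if r.approved}
    return required <= approved
```

Note that a missing review blocks the merge just like a rejection does, so a reviewer that silently fails to run cannot wave a PR through.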
Being completely methodical about development really helps. obra/superpowers, for example, gets close but I think it overindexes on testing and doesn’t go far enough with design document templates, planning, code style guides, code reviews, and more.
Being methodical about it takes more time, but prevents a good bit of the tech debt.
Planning modes help, but they are similarly not methodical enough.
Scary AF
Is there evidence these groups are a minority? I mean, the OP sounds like they are taking the right approach but I suspect it requires both skill/experience and an open mind to take their approach.
Just because an approach has good use-cases doesn't mean those are going to predominate.
If it generates the slop version in a week but it takes me 3 more weeks to clean it up, could I have just done it right the first time myself in 4 weeks instead? How much money have I wasted in tokens?
In both cases, you feel super productive all the time, because you are constantly putting in instructions and getting massive amounts of output, and this feels like constant & fast progress. It's scary how easy it is to waste time on LLMs while not even realizing you are wasting time.
Soooooo....
As one who hasn't taken the plunge yet -- I'm basically retired, but have a couple of projects I might want to use AI for -- "time" is not always fungible with, or a good proxy for, either "effort" or "motivation".
> How much money have I wasted in tokens?
This, of course, may be a legitimate concern.
> If it generates the slop version in a week but it takes me 3 more weeks to clean it up, could I have just done it right the first time myself in 4 weeks instead?
This likewise may be a legitimate concern, but sometimes the motivation for cleaning up a basically working piece of code is easier to find than the motivation for staring at a blank screen and trying to write that first function.
Cleaning up agent slop code by hand is also a miserable experience and makes me hate my job. I do it already at $DAYJOB because my boss thinks “investing” in third worlders for pennies on the dollar and just giving them a Claude subscription will be better than investing in technical excellence and leadership. The ROI on this strategy is questionable at best, at least at my current job. Code review by humans is still the bottleneck, and delivering proper working features has not accelerated because they require much more iteration because of slop.
Would much rather spend the time making my own artisanal tradslop instead if it’s gonna take me the same amount of time anyway - at least it’s more enjoyable.
1. Make it work
2. Make it pretty
3. Make it fast.
If you have something that's pretty and fast but doesn't work, is that useful?
But if you have something that works, but is ugly and slow - you can still use it and fix it while you dogfood it. Maybe figure out other ways to improve it beyond making the code look pretty.
I often see criticism of AI-driven projects that assumes the codebase is crystallized in time, when in fact humans can keep iterating on it with AI until it is better. We don't expect an AI-less project to be perfect in 0.1.0, so why expect that from AI? I know the answer is that the marketing and Twitter/LinkedIn slop makes those claims, but it's more useful to see past the hype and investigate how to use these tools, which are invariably here to stay.
That's a big leap of faith and... kinda contradicts the article as I understood it.
My experience is entirely opposite (and matches my understanding of the article): vibing from the start makes you take orders of magnitude more time to perfect. AI is a multiplier as an assistant, but a divisor as an engineer.
I completely agree that this is the case right now, but I do wonder how long it will remain the case.
There is something at this point kind of surreal in the fact that you know everyday there will be this exact blog post and these exact comments.
Like, it's been literal years and years and y'all are still talking about the thing that's supposed to do other things. What are we even doing anymore? Is this dead internet? It boggles the mind that we are still at this level of discourse, frankly.
Love 'em, hate 'em, I don't care, y'all need to freaking get a grip! Like, for the love of god, read a book, paint a picture! Do something else! This blog is just a journey to snooze town and we all must at some level know that. This feels like a literal brain virus.
If AI needs to rewrite everything from scratch every time you make a design change, that may have some obvious inefficiencies and limitations to it, but if it can also do that in a few hours or a week, is it really that bad compared to months of stalling and excuses from devs trying to understand the work of someone before them who wasn't given enough time to write well-documented, clean code to begin with?
Like, it is undoubtedly worse for hobby projects to rely on the AI output 100%, but I'm actually not so sure for commercial products. It'll be the same type of spaghetti garbage everywhere. There will be patterns in its nonsense that over the years people will start to get accustomed to. You'll have people specialize in cleaning up AI-generated code, and it'll be a more or less consistent process compared to picking up random developer spaghetti.
maybe this is a hot take though
This is my experience. Tests are perhaps the most challenging part of working with AI.
What’s especially awful is any refactor of existing shit code that does not have tests to begin with, where the feature is confusing or is inappropriately and unknowingly used in multiple places elsewhere.
AI will write test cases showing that the logic works at all (fine), but the behavior, especially what would be covered by an integration test, is just not covered at all.
I don’t have a great answer to this yet, especially because this has been most painful to me in a React app, where I don’t know testing best practices. But I’ve been eyeing up behavior driven development paired with spec driven development (AI) as a potential answer here.
Curious if anyone has an approach or framework for generating good tests
The tricky part of unit tests is coming up with creative mocks and ways to simulate various situations based on the input data, w/o touching the actual code.
For integration tests, it's massaging the test data and inputs to hit every edge case of an endpoint.
For e2e tests, it's massaging the data, finding selectors that aren't going to break every time the html is changed, and trying to winnow down to the important things to test - since exhaustive e2e tests need hours to run and are a full-time job to maintain. You want to test all the main flows, but also stuff like handling a back-end system failure - which doesn't get tested in smoke tests or normal user operations.
That's a ton of creativity for AI to handle. You pretty much have to tell it every test and how to build it.
Pull out as many pure functions as possible and exhaustively test the input and output mappings.
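As a sketch of what that looks like in practice: a pure function paired with a table of input and output mappings. The `shipping_cost` function and its pricing rules are hypothetical, chosen only to show the shape of the test.

```python
def shipping_cost(weight_kg: float, express: bool) -> float:
    """Hypothetical pure function: the result depends only on the arguments."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    base = 5.0 if weight_kg <= 1 else 5.0 + 2.0 * (weight_kg - 1)
    return base * 2 if express else base


# Table of (weight_kg, express, expected). No mocks or fixtures are needed
# because the function touches no I/O and no shared state.
CASES = [
    (0.5, False, 5.0),
    (1.0, False, 5.0),
    (3.0, False, 9.0),
    (3.0, True, 18.0),
]


def run_cases() -> list[tuple[float, bool, float, float]]:
    """Return the cases whose actual result differs from the expected one."""
    failures = []
    for weight, express, expected in CASES:
        actual = shipping_cost(weight, express)
        if actual != expected:
            failures.append((weight, express, expected, actual))
    return failures
```

Because each case is a single line, a reviewer can check the table against the spec quickly, and adding a boundary case costs almost nothing.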
And finally, how do you address spec drift?
If some component doesn't benefit from being extensively tested, then it's still the same today. The difference is now it's so easy to generate something, no matter how useless it is. Worst part is, no one cares. Test passes, it doesn't affect production, line coverage increases, managers think the software is more tested, developers just let a prompt do everything. It's all just testing theatre.
I think E2E is more important than ever. AI is pretty good at getting the local behaviour correct, so unit tests are of less value. The same can't be said for the system as a whole. The best part is, AI is actually pretty good at writing E2E tests. Ofc, given that you already know what you want to test.
This is a great article. I’ve been trying to see how layered AI use can bridge this gap but the current models do seem to be lacking in the ambiguous design phase. They are amazing at the local execution phase.
Part of me thinks this is a reflection of software engineering as a whole. Most people are bad at design. Everyone usually gets better with repetition and experience. However, as there is never a right answer just a spectrum of tradeoffs, it seems difficult for the current models to replicate that part of the human process.
In one of the cases, I was searching for a way to extract a bunch of code that 5-6 queries had in common. Whatever this thing was, its parameters would have to include an array/tuple of IDs, and a parameter that would alter the table being selected from, neither of which is allowed in a ClickHouse parameterized view. I could write a normal view for this, but performance would’ve been atrocious given ClickHouse’s ok-but-not-great query optimizer.
I asked AI for alternatives, and to discuss the pros and cons of each. I brought up specific scenarios and asked it how it thought the code would work. I asked it to bring what it knew about SQL’s relational algebra to find an elegant solution.
It finally suggested a template (we’re using Go) to include another sql file, where the parameter is a _named relation_. It can be a CTE or a table, but it doesn’t matter as long as it has the right columns. Aside from poor tooling that doesn’t find things like typos, it’s been a huge win, much better than the duplication. And we have lots of tests that run against the real database to catch those typos.
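For anyone curious what the "named relation as a template parameter" idea looks like, here is a rough sketch. It is in Python with `string.Template` rather than the Go templates described above, and the query, the column names, and the helper `render_per_user_counts` are all invented for illustration. The shared fragment only assumes that whatever relation it is handed has a `user_id` column.

```python
from string import Template
from typing import Iterable

# Shared fragment. "$rel" is a named relation: a table or a CTE, it doesn't
# matter which, as long as it has a user_id column. (Hypothetical query.)
PER_USER_COUNTS = Template(
    "SELECT user_id, count(*) AS n\n"
    "FROM $rel\n"
    "WHERE user_id IN ($ids)\n"
    "GROUP BY user_id"
)


def render_per_user_counts(rel: str, ids: Iterable[int]) -> str:
    """Render the shared fragment against the relation called `rel`."""
    # Only allow plain identifiers, since the name is spliced into the SQL.
    if not rel.isidentifier():
        raise ValueError(f"not a plain relation name: {rel!r}")
    id_list = ", ".join(str(int(i)) for i in ids)
    return PER_USER_COUNTS.substitute(rel=rel, ids=id_list)


# Caller 1: point the fragment at a real table.
from_table = render_per_user_counts("events", [1, 2, 3])

# Caller 2: define a CTE first, then point the fragment at the CTE's name.
from_cte = (
    "WITH recent AS (SELECT user_id FROM events WHERE ts > now() - 3600)\n"
    + render_per_user_counts("recent", [1, 2, 3])
)
```

The typo problem mentioned above shows up here too: nothing checks the rendered SQL until it runs, so tests against the real database are still what catches mistakes.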
Maybe this kind of thing exists out there already (if it does, tell me!) but I probably wouldn’t have found it.
This could likely be extracted much easier now from the new code, but imagine API docs or a mapping of the logical ruleset with interwoven commentary - other devtools could be built easily, bug analysis could be done on the structure of rules independent of code, optimizations could be determined on an architectural level, etc.
LLMs need humans to know what to build. If generating code becomes easy, codifying a flexible context or understanding becomes the goal that amplifies what can be generated without effort.
1) All-knowing oracle which is lightly prompted and develops whole applications from requirements specification to deployable artifacts. Superficial, little to no review of the code before running and committing.
2) An additional tool next to their already established toolset to be used inside or alongside their IDE. Each line gets read and reviewed. The tool needs to defend its choices and manual rework is common for anything from improving documentation to naming things all the way to architectural changes.
Obviously anything in between is viable as well. 1) seems like a crazy dead-end to me if you are looking to build a sustainable service or a fulfilling career.
> In theory, you can try to preserve this context by keeping specs and docs up to date. But there’s a reason we didn’t do this before AI: capturing implicit design decisions exhaustively is incredibly expensive and time-consuming to write down. AI can help draft these docs, but because there’s no way to automatically verify that it accurately captured what matters, a human still has to manually audit the result. And that’s still time-consuming.
I agree that it's time consuming and we don't have a good solution yet, but my guess is that a huge part of the next 3 years of iteration in the craft of Software Engineering is going to be creating tools and practices to make this possible. Especially as AIs get better at the actual writing of the code, the key failure mode for agentic coding is going to be the intent gap between what you asked for and what you wanted.
Oof, this hit very close to home. My workplace recently got, as a special promotion, unlimited access to a coding agent with free access to all the frontier models, for a limited period of time. I find it extremely hard to end my workday when I get into the "one more prompt" mindset, easily clocking 12-hour workdays without noticing.
I now have several projects going in languages that I've never used. I have a side project in Rust, and two Go projects. I have a few decades experience with backend development in Java, Kotlin (last ten years) and occasionally Python. And some limited experience with a few other languages. I know how to structure backend projects, what to look for, what needs testing, etc.
A lot of people would insist you need to review everything the AI generates. And that's very sensible. Except AI now generates code faster than I can review it. Our ability to review is now the bottleneck. And when stuff kind of works (evidenced by manual and automated testing), what's the right point to just say it's good enough? There are no easy answers here. But you do need to think about what an acceptable level of due diligence is. Vibe coding is basically the equivalent of blindly throwing something at the wall and seeing what sticks. Agentic engineering is on the opposite side of the spectrum.
I actually emphasize a lot of quality attributes in my prompts. The importance of good design, high cohesiveness, low coupling, SOLID principles, etc. Just asking for potential refactoring with an eye on that usually yields a few good opportunities. And then all you need to do is say "sounds good, let's do it". I get a little kick out of doing variations on silly prompts like that. "Make it so" is my favorite. Once you have a good plan, it doesn't really matter what you type.
I also ask critical questions about edge cases, testing the non happy path, hardening, concurrency, latency, throughput, etc. If you don't, AIs kind of default to taking short cuts, only focus on the happy path, or hallucinate that it's all fine, etc. But this doesn't necessarily require detailed reviews to find out. You can make the AI review code and produce detailed lists of everything that is wrong or could be improved. If there's something to be found, it will find it if you prompt it right.
There's an art to this. But I suspect that that too is going to be less work. A lot of this stuff boils down to evolving guardrails to do things right that otherwise go wrong. What if AIs start doing these things right by default? I think this is just going to get better and better.
I know not everybody is quite ready for this yet. But I'm working from the point of view that I won't be manually programming much professionally anymore.
So, I now pick stuff I know AIs supposedly do well (like Go) with good solid tool and library ecosystems. I can read it well enough; it's not a hard language and I've seen plenty other languages. But I'm clearly not going to micro manage a Go code base any time soon. The first time I did this, it was an experiment. I wanted to see how far I could push the notion. I actually gave it some thought and then I realized that if I was going to do this manually I would pick what I always pick. But I just wasn't planning to do this manually and it wasn't optimal for the situation. It just wasn't a valid choice anymore.
Then I repeated the experiment again on a bigger thing and I found that I could have a high level discussion about architectural choices well enough that it did not really slow me down much. The opposite actually. I just ask critical questions. I try to make sure to stick with mainstream stuff and not get boxed into unnecessary complexity. A few decades in this industry has given me a nose for that.
My lack of familiarity with the code base is so far not proving to be any issue. Early days, I know. But I'm generating an order of magnitude more code than I'll ever be able to review already and this is only going to escalate from here on. I don't see a reason for me to slow down. To be effective, I need to engineer at a macro level. I simply can't afford to micro manage code bases anymore. That means orchestrating good guard rails, tests, specifications, etc. and making sure those cover everything I care about. Precisely because I don't want to have to open an editor and start fixing things manually.
As for Rust, that was me not thinking about my prompt too hard and it had implemented something half decent by the time I realized so I just went with it. To be clear, this one is just a side project. So, I let it go (out of curiosity) and it seems to be fine as well. Apparently, I can do Rust now too. It's actually not a bad choice objectively and so far so good. The thing is, I can change my mind and redo the whole thing from scratch and it would not be that expensive if I had to.
I didn't have to review the code for understanding what Claude did, I reviewed it for verifying that it did what it had been told.
It's also nuts to me that he had to go back in later to build in tests and validation. The second there is an input able to be processed, you bet I have tests covering it. The second a UI is being rendered, I have Playwright taking screenshots (or gtksnapshot for my linux desktop tools).
I think people who are seeing issues at the integration phase of building complex apps are having that happen because they're not keeping the limited context in mind, and preempting those issues by telling their tools exactly how to bridge those gaps themselves.
I agree with you in theory but in my opinion, it doesn't work so well when you don't even know what exactly you are looking for at the start. Yes, I knew I wanted a formatter, linter, parser, but which language should those be written in, should they be one project or many, how the pieces should fit together, none of that was clear to me.
As I pointed out in the article, in these sort of "greenfield projects" I work a lot better with concrete prototypes and code in front of me I can dissect instead of trying to endlessly play with designs in my head.
> It's also nuts to me that he had to go back in later to build in tests and validation.
I think this is a little misleading. Yes I did do some testing retroactively (i.e. the upstream validation testing) but I was using TDD + verifying outputs immediately, even during the vibe coding phase. The problem as I point out is that this is not enough. Even when I had unit tests written at the same time as the code, they had lots of holes and over time, I kept hitting SQL statements which failed which the testing did not cover.
I'd really recommend separating prototyping work like that out into a pre-design phase. Do the prototypes and figure out the direction for the actual project, but then come back in with a clean repo and design docs built off the prototypes, for Claude to work from. I started out using Claude to refactor my old projects (or even my Codex ones) before I realized it worked better starting fresh.
I think sometimes it silently decides that certain pieces of code or design are absolute constraints, and won't actually remove or change them unless you explicitly tell it to. Usually I run into this towards the end of implementation, when I'll see something I don't expect to and have to tell it to rip it out.
One example recently was an entire messaging queue (NATS JetStream) Docker image definition that was sitting in the deploy files unused, but Claude didn't ever mention or care about it as it worked on those files; it just silently left it sitting there.
Another example was an auth-bypass setting I built in for local testing during prototyping, being not just left alone by Claude but actually propagated into other areas of the application (e.g. API) without asking.
90 percent of the things users want either A) don't exist or B) are impossible to find, install and run without being deeply technical.
These things don't need to scale, they don't need to be well designed. They are for the most part targeted, single user, single purpose, artifacts. They are migration scripts between services, they are quick and dirty tools that make bad UI and workflows less manual and more manageable.
These are the use cases I am seeing from people OUTSIDE the tech sphere adopt AI coding for. It is what "non techies" are using things like open claw for. I have people who in the past would have been told "No, I will not fix your computer" talk to me excitedly about running cron jobs.
Not everything needs to be Snap-on quality, the bulk of end users are going to be happy with Harbor Freight quality because it is better than NO tools at all.
But it does a good job of countering the narrative you often see on LinkedIn, and to some extent on HN as well, where AI is portrayed as all-capable of developing enterprise software. If you spend any time in discussions hyping AI, you will have seen plenty of confident claims that traditional coding is dead and that AI will replace it soon. Posts like this are useful because they show a more grounded reality.
> 90 percent of the things users want either A) don't exist or B) are impossible to find, install and run without being deeply technical. These things don't need to scale, they don't need to be well designed. They are for the most part targeted, single user, single purpose, artifacts.
Yes, that is a particular niche where AI can be applied effectively. But many AI proponents go much further and argue that AI is already capable of delivering complex, production-grade systems. They say, you don't need engineers anymore. They say, you only need product owners who can write down the spec. From what I have seen, that claim does not hold up and this article supports that view.
Many users may not be interested in scalability and maintainability... But for a number of us, including the OP and myself, the real question is whether AI can handle situations where scalability, maintainability and sound design DO actually matter. The OP does a good job of understanding this.
> But when I reviewed the codebase in detail in late January, the downside was obvious: the codebase was complete spaghetti...It was extremely fragile; it solved the immediate problem but it was never going to cope with my larger vision...I decided to throw away everything and start from scratch
This part was interesting to me as it lines up with Fred Brooks's "throw one away" philosophy: "In most projects, the first system built is barely usable. Hence plan to throw one away; you will, anyhow."
As indicated by the experience, AI tools provide a much faster way of getting to that initial throw-away version. That's their bread and butter for where they shine.
Expecting AI tools to go directly to production quality is a fool's errand. This is the right way to use AI - get a quick implementation, see how it works and learn from it but then refactor and be opinionated about the design. It's similar to TDD's Red, Green, Refactor: write a failing test, get the test passing ASAP without worrying about code quality, refactor to make the code better and reliable.
In time, after this hype cycle has died down, we'll come to realize that this is the best way to make use of AI tools over the long run.
> When I had energy, I could write precise, well-scoped prompts and be genuinely productive. But when I was tired, my prompts became vague, the output got worse
This part also echoes my experience - when I know well what I want, I'm able to write more specific specifications and guide along the AI output. When I'm not as clear, the output is worse and I need to spend a lot more time figuring it out or re-prompting.
Everybody remembers that soundbite but nobody remembers that he changed his mind about it later and switched to advocating iterative refinement.
If we are all honest, it seems to be the case - most of the time:
- Refactoring (sometimes starting again; this is rarely starting from scratch, as there will have been some insights and personal design decisions garnered from the previous experience)
- Specificity (heavily influenced by energy, which differs with the time of day and from person to person)
At the end of the day, it takes taste + experience of the user to make anything of notable complexity (architecture) with AI. (For now and the nearest future at least.)
I find reading articles like this gives me a renewed sense of agency as a technologist and in my growing list of passions.
A solid thank you to Lalit Maganti for sharing and the better HN community. I found a lot to steal reuse from the material/banter.
When I ported pikchr (also from the SQLite project) to Go, I first ported lemon, then the grammar, then supporting code.
I always meant to do the same for its SQL parser, but pikchr grammar is orders of magnitude simpler.
The problem comes from how SQLite's upstream parse.y works. Because it doesn't actually generate the parse tree, instead generating the bytecode directly, the interpretation of any node labelled "id" or "nm" is buried inside the source code behind many layers of functions. You can see for yourself by looking at SQLite's parse.y [2]
[1] https://github.com/LalitMaganti/syntaqlite/tree/main/syntaql...
[2] https://sqlite.org/src/file?name=src/parse.y&ci=trunk
Also, nice work: this makes the world just a little nicer!
> Unfortunately, unlike many other languages, SQLite has no formal specification describing how it should be parsed. It doesn’t expose a stable API for its parser either. In fact, quite uniquely, in its implementation it doesn’t even build a parse tree at all! The only reasonable approach left in my opinion is to carefully extract the relevant parts of SQLite’s source code and adapt it to build the parser I wanted
Did they do proper research on the problem in the first place?
To be clear, when I say "formal specification", I'm not just talking about the formal grammar rules but also how those are interpreted in practice. Something closer to the ECMAScript specification (https://ecma-international.org/publications-and-standards/st...).
[1] https://github.com/LalitMaganti/syntaqlite/blob/93638c68f9a0...
It is really good for getting up to speed with frameworks and techniques though, like they mentioned.
I have several Open Source projects and wanted to refactor them for a decade. A week ago I sat down with Google Gemini and completely refactored three of my libraries. It has been an amazing experience.
What’s a game changer for me is the feedback loop. I can quickly validate or invalidate ideas, and land at an API I would enjoy using.
I think my vibe-coding success also has to do with the problem being not that “novel” and prior art exists.
Nevertheless still impressive.
Vision, taste and good judgment are going to be the key skills for software developers from now on.
It also reduces my hesitation to get started with something I don't know the answer to well enough yet. Time 'wasted' on vibe-coding felt less painful than time 'wasted' on heads-down manual coding down a rabbit hole.
- first used copy and paste in and out of Grok
- started using CLI tools e.g. Claude and OpenCode
- moved up to using 3 and sometimes 4 agents at the same time
- considered going to the agents managing agents
- have settled on having LLMs build tools that are both deterministic, usable by humans and the agent, and also faster (b/c there is less "back and forth")
Honestly, it feels a LOT like when Kubernetes came out. e.g. you stopped running containers on a box using Docker Compose plus scripts/configs etc. Instead gave a large part of the operation to an "agent" (in this case k8s) that managed all of the details you didn't need to care about anymore.
I've also realized that while the LLMs can crank out code at a very high rate, someone still needs to make sure everything is running, debug issues etc. You could set up agents to monitor what the agents do but then you still end up with someone needing to keep an eye on everything. If anything, you need MORE people b/c now you can just keep spinning up new components etc.
Also, was in a discussion with one of the best developers I've ever worked with. It came down to the following point:
"Programming is rapidly becoming a hobby. Software engineering is becoming more important than ever."
> I paid for that with a total rewrite.
With so much waste and not a single example of the "brilliant at giving you the right answer to a specific technical question"
> The takeaway for me is simple: AI is an incredible force multiplier
Seems more like a feel multiplier, rather than force.
> 500 tests, many of which I felt I could reuse
Indeed, feeling is the only saving grace for a mountain of random unreviewed tests
In my opinion, "giving me a better understanding for the architecture of the project" is reasonable technical compensation.
> Indeed, feeling is the only saving grace for a mountain of random unreviewed tests
I think I say a line or two above that this was after a review of the codebase so I did review these tests.
> giving me better understanding
Examples of that would also be nice (I don't doubt the personal feel that waste was justified)
> JOURNAL before: ...
> JOURNAL after: ... > was wrong here, learned this
Essentially ~all of the tests were found to be useful but in a more "smoke test" capacity i.e. they provided good "basic" coverage but it was clear that it was also not sufficient.
Which is why in the rewrite: 1) I built a TCL driver that ran the upstream SQLite tests and verified we accepted or rejected the SQL in the same way as SQLite.
2) I wrote a test runner which checked for "idempotence" i.e. run the formatter over all the SQL from all the other types of tests then verify that the AST was identical in the input and output.
3) I also wrote a script which ran the formatter over the PerfettoSQL standard library [1], a real world SQLite-based codebase that I knew and deeply understood so I could go through each file and manually check the output.
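To illustrate the shape of the check in 2), here is a toy sketch in Python. It is not syntaqlite's code: `format_sql` is a stand-in formatter that only uppercases a few keywords and normalizes spacing, and `canonical` is a stand-in for "parse to an AST" that just produces a case-normalized token list. It deliberately ignores string literals and comments, which a real parser would have to handle.

```python
import re

KEYWORDS = {"select", "from", "where", "and", "or"}


def tokenize(sql: str) -> list:
    """Stand-in for a real parser: split into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sql)


def canonical(sql: str) -> list:
    """Toy 'AST': the tokens with keyword case normalized away."""
    return [t.lower() if t.lower() in KEYWORDS else t for t in tokenize(sql)]


def format_sql(sql: str) -> str:
    """Toy formatter: uppercase keywords, single spaces between tokens."""
    return " ".join(t.upper() if t.lower() in KEYWORDS else t for t in tokenize(sql))


def check_roundtrip(sql: str) -> str:
    """Assert that formatting preserves structure and is stable when re-applied."""
    formatted = format_sql(sql)
    # The formatter must not change what the SQL means structurally.
    assert canonical(formatted) == canonical(sql), f"structure changed: {sql!r}"
    # Formatting already-formatted output must be a no-op.
    assert format_sql(formatted) == formatted, f"not stable: {sql!r}"
    return formatted
```

The appeal of this kind of test is that it needs no hand-written expected outputs, so it can be run over every SQL snippet already collected for the other test suites.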
> Examples of that would also be nice (I don't doubt the personal feel that waste was justified)
Some things learned concretely:
1) C was not going to work for the higher level parts of the project, even the formatter was not pleasant to read or write in C, the validator was much worse
2) Doing the SQLite source extraction in the same language meant that I could ship a really cool feature where the syntaqlite CLI could "generate dialect extensions" without people needing to download a separate script, run their own extraction on the SQLite source code, or worse yet, need to fork syntaqlite. This actually makes it technically possible for people in the web playground to dynamically build extensions to SQLite (though I haven't ended up plumbing that feature through yet)
3) Having a DSL [2] for extensions of SQLite (that e.g. PerfettoSQL could use) was the correct way to go rather than using YAML/JSON/XML etc because of how much clarity it provided and how AI took a lot of the annoyance of maintaining a DSL away.
4) I need to invest much more in testing from the start and also more testing where the correctness can be "proved" in some way (e.g. idempotence testing or SQLite upstream testing as described above)
[1] https://github.com/google/perfetto/tree/main/src/trace_proce...
[2] https://docs.syntaqlite.com/v0.2.15/guides/custom-dialects/
BorgCfg had exactly the same situation.
mpvl (borgcfg original author, author of https://cuelang.org/) and others had tried to refine bcl while bcl itself is underspecified.
Eventually, the team built a drop-in replacement of bcl and specced out the language almost entirely.
The biggest lesson to me was that engineering never has any shortcuts.
Nowhere is this more obvious in my current projects than with CRUD interface building. It will go nuts building these elaborate labyrinths and I’m sitting there baffled, bemused, foolishly hoping that THIS time it would recognise that a single SQL query is all that’s needed. It knows how to write complex SQL if you insist, but it never wants to.
But even with those frustrations, damn it is a lot faster than writing it all myself.
Most of my questions are "in one sentence respond: long rambling context and question"
Then three weeks later you're tracing some control flow that makes no sense and nobody knows why it's structured that way. Not you, not the model. I've been treating it like code from a contractor now, review every line same as a junior dev's PR. Gets tedious but the alternative is worse.
I like this a lot. It suggests that AI use may sometimes incentivize people to get better at metacognition rather than worse. (It won't in cases where the output is good enough and you don't care.)
This precisely captures my experience with AI tools. When I understand the domain very deeply, AI feels like magic. I can tell it exactly how I want something implemented and it just appears in 30 seconds. When I don't understand something very well, however, I get easily misled by bogus design choices that I've delegated to the AI. It's so easy for me to spend 4 hours drafting some prototype in an almost dreamlike state of productive bliss, only for it to crash apart when I discover some fundamental bug in the thing I've vibecoded.
Ideally: local; offline.
Or do I have to wrestle it for 250 hours before it coughs up the dough? Last time I tried, the AI systems struggled with some of the most basic C code.
It seemed fine with Python, but then my cat can do that.
> Unfortunately, unlike many other languages,
what
> SQLite has no formal specification describing how it should be parsed.
sqlitebrowser.org is cool but it's not the sort of developer tools I'm talking about. As I clarify in the side notes, I'm looking for a formatter, linter, LSP, not an IDE.
> https://sqlite.org/syntax.html
As I replied to some other comment, I'm very aware that there is a syntax diagram but that really only tells half the story. If you actually look at those diagrams in detail, or you look into the actual parse.y grammar (https://sqlite.org/src/file?name=src/parse.y&ci=trunk), you'll find that they're missing a lot of information which is required for you to actually interpret the SQL into an AST.
When I say "formal specification", I'm not just talking about the formal grammar rules but also how those are interpreted in practice. Something closer to the ECMAScript specification (https://ecma-international.org/publications-and-standards/st...).
Unfortunately, AI seems to be divisive. I hope we will find our way back eventually. I believe the lessons from this era will reverberate for a long time and all sides stand to learn something.
As for me, I can’t help but notice there is a distinct group of developers that does not get it. I know because they are my colleagues. They are good people and not unintelligent, but they are set in their ways. I can imagine management forcing them to use AI, which at the moment is not the case, because they are such laggards. Even I sometimes want to “confront” them about their entire day wasted on something even the free ChatGPT would have handled adequately in a minute or two. It’s sad to see actually.
We are not doing important things and we ourselves are not geniuses. We know that or at least I know that. I worry for the “regular” developer, the one that is of average intellect like me. Lacking some kind of (social) moat I fear many of us will not be able to ride this one out into retirement.
I am a technologist. But I am seriously concerned about the ecological consequences of the training and usage of AI. To me, the true laggards are those who have not yet understood that climate change requires a prudent use of our resources.
I don't mind people having fun or being productive with AI. But I do mind it when AI is presented as the only way of doing things.
The counter here would be, what if AI could be made efficient? Suddenly OK then? Is it truly about the resources?
Walking to the nearest farm with my horse is much, much more sustainable than maintaining a sprawling toxic civilizational level infrastructure so I can go into my car to the supermarket. I get your point, but nearly every aspect of our world is filled to the brim with mind boggling complexity and corresponding resource usage and we get used to it.
Only an AI would bother to create a throwaway account to post such a shallow comment that is mostly fearmongering to push people to use AI.
"AI is an incredible force multiplier for implementation, but it’s a dangerous substitute for design."
Seconded!
Expanding a thought beyond 280 characters and publishing it somewhere other than the X outrage machine is something we should be encouraging.