It's a shame that AI coding tools have become such a polarizing issue among developers. I understand the reasons, but I wish there had been a smoother path to this future. The early LLMs like GPT-3 could sort of code enough for it to look like there was a lot of potential, and so there was a lot of hype to drum up investment and a lot of promises made that weren't really viable with the tech as it was then. This created a large number of AI skeptics (of whom I was one, for a while) and a whole bunch of cynicism and suspicion and resistance amongst a large swathe of developers. But could it have been different? It seems a lot of transformative new tech is fated to evolve this way. Early aircraft were extremely unreliable and dangerous and not yet worthy of the promises being made about them, but eventually with enough evolution and lessons learned we got the Douglas DC-3, and then in the end the 747.
If you're a developer who still doesn't believe that AI tools are useful, I would recommend you go read Mitchell's post, and give Claude Code a trial run like he did. Try and forget about the annoying hype and the vibe-coding influencers and the noise and just treat it like any new tool you might put through its paces. There are many important conversations about AI to be had, it has plenty of downsides, but a proper discussion begins with close engagement with the tools.
Our tooling just had a refresh in less than 3 years and it leaves heads spinning. People are confused, fighting for or against it. Torn even between 2025 and 2026. I know I was.
People are grasping for ways to describe it, from 'agentic coding' to 'vibe coding' to the 'modern AI-assisted stack'.
We don't call architects 'vibe architects' even though they copy-paste four-fifths of your next house and use a library of things in their work!
We don't call builders 'vibe builders' for using earth-moving machines instead of a shovel...
When was the last time you reviewed the machine code produced by a compiler? ...
The real issue this industry is facing is the phenomenal speed of change. But what are we really doing? That's right, programming.
Compilers have produced working output given working input literally 100% of the time in my career. I've never personally found a compiler bug.
Meanwhile AI can't be trusted to give me a recipe for potato soup. That is to say, I would under no circumstances blindly follow the output of an LLM I asked to make soup. While I have, every day of my life, gladly sent all of the compiler output to the CPU without ever checking it.
The compiler metaphor is simply incorrect and people trying to say LLMs compile English into code insult compiler devs and English speakers alike.
> We don't call builders 'vibe builders' for using earth-moving machines instead of a shovel...
> When was the last time you reviewed the machine code produced by a compiler?
Sure, because those are categorically different. You are describing shortcuts of two classes: boilerplate (library of things) and (deterministic/intentional) automation. Vibe coding doesn't use either of those things. The LLM agents involved might use them, but the vibe coder doesn't.
Vibe coding is delegation, which is a completely different class of shortcut or "tool" use. If an architect delegates all their work to interns, directs outcomes based on whims not principles, and doesn't actually know what the interns are delivering, yeah, I think it would be fair to call them a vibe architect.
We didn't have that term before, so we usually just call those people "arrogant pricks" or "terrible bosses". I'm not super familiar but I feel like Steve Jobs was pretty famously that way - thus if he was an engineer, he was a vibe engineer. But don't let this last point detract from the message, which is that you're describing things which are not really even similar to vibe coding.
An architect's copy-pasting is equivalent to a software developer reusing a tried-and-tested code library. Generating or writing new code is fundamentally different and not at all comparable.
> We don't call builders 'vibe builders' for using earth-moving machines instead of a shovel...
We would call them "vibe builders" if their machines threw bricks around randomly and the builders focused all of their time on engineering complex scaffolding around the machines to get the bricks flying roughly in the right direction.
But we don't because their machines, like our compilers and linters, do one job and they do it predictably. Most trades spend obscene amounts of money on tools that produce repeatable results.
> That's a lot of years! They're still called architects.
Because they still architect, they don't subcontract their core duties to architecture students overseas and just sign their name under it.
I find it fitting and amusing that people who are uncritical towards the quality of LLM-generated work seem to make the same sorts of reasoning errors that LLMs do. Something about blind spots?
"Architect" is actually a whole career progression of people with different responsibilities. The bottom rung used to be the draftsmen, people usually without formal education who did the actual drawing. Then you had the juniors, mid-levels, seniors, principals, and partners who each oversaw different aspects. The architects with their name on the building were already issuing high level guidance before the transition instead of doing their own drawings.
When was the last time you reviewed the machine code produced by a compiler?
Last week, to sanity-check some code written by an LLM.

It’s just not analogous to architecture, or cooking, or engineering. Software development is just its own thing. So you can’t use analogy to get yourself anywhere with a hint of rigour.
The problem is, AI is generating code that may be buggy, insecure, and unmaintainable. We have as a community spent decades trying to avoid producing that kind of code. And now we are being told that productivity gains mean we should abandon those goals and accept poor quality, as evidenced by MoltBook’s security problems.
It’s a weird cognitive dissonance and it’s still not clear how this gets resolved.
The AI tooling reverses this: the thinking is outsourced to the machine and the user is little more than a spectator, an observer, a rubber stamp on top.
Anyone in this position seriously needs to think about their value added. How do they plan to justify their position and salary to the capital class? If the machine is doing the work for you, why would anyone pay you as much as they do when they can replace you with someone cheaper, ideally with no one, for maximum profit?
Everyone is now in a competition not only against each other but also against the machine. And any specialized, expert-knowledge moat that you've built over decades of hard work is about to evaporate.
This is the real pressing issue.
And the only way you can justify your value added, your position, your salary, is to be able to undermine the AI, to find flaws in its output and reasoning. After all, if/when it becomes flawless you have no purpose to the capital class!
Any time I’m doing serious optimization or knee-deep in debugging something where the bug emerged at -O2 but not at -O0.
Sometimes just for fun to see what the compiler is doing in its optimization passes.
You severely limit what you can do and what you can learn if you never peek underneath.
It therefore depends on where we place the discovery/availability of the product. If we place it at the time of prototype production (in the early 1960s for CAD), it took a generation (20-30 years), since by the early and mid-1990s, all professionals were already using CAD.
But if we place it at the time when CAD and personal computers became available to the general public (e.g., mid-1980s), it took no more than 5-10 years. I attended a technical school in the 1990s, and we started with hand drawing in the first two years and used CAD systems in the remaining three years of school.
The same can be said for AI. If we place the beginning of AI in the mid-1980s, the wider adoption of AI took more than a generation. If we place it at the time OpenAI developed GPT, it took 5-10 years.
Maybe not, but we don't allow non-architects to vomit out thousands of diagrams that they cannot review and that are never reviewed, which are subsequently used in the construction of the house.
Your analogy to software is fatally and irredeemably flawed, because you are comparing the regulated and certification-heavy production of content, subsequently double-checked by certified professionals, with an unregulated and non-certified production of content that is never checked by any human.
I use Godbolt regularly, as do many C and C++ devs, for the sake of optimization, but I'm not sure that's what you mean. Checking the generated output is sometimes still needed for correctness, probably more so in the past than now due to threads not being a part of the language and the memory model being under-specified (allowing compiler authors to assume a single-threaded execution model), but those were deficiencies in those languages in particular (since remedied). Paradoxically, it's also still sometimes needed to ensure the compiler doesn't optimize certain code away, for the sake of security.
> We don't call builders 'vibe builders' for (…)
> When was the last time (…)
None of those are the same thing. At all. They are still all deterministic approaches. The architect’s library of things doesn’t change every time they use it or present different things depending on how they hold it. It’s useful because it’s predictable. Same for all your other examples.
If we want to have an honest discussion about the pros and cons of LLM-generated code, proponents need to stop being dishonest in their comparisons. They also need to stop plugging their ears and ignoring the other issues around the technology. It is possible for something to be useful but for its advantages not to outweigh its disadvantages.
Maybe, but they do it through the filter of their knowledge, experience and wisdom, not by rolling a large number of dice to execute a design.
LLMs are useful, just less useful than people think. For instance, 'technical debt' production has now become automated at an industrial scale.
i once found a bug in https://developers.google.com/closure/compiler
borrowing a function from an Array that was never invoked broke the compiled code
spent a weekend reading minified code
good times
imagine if a machine operator was saying "move this earth here, no, not this one, 30 centimeters to the left... yes, move this earth here, no not here, please a bit to the right", lol
of course, someone with no experience with the machine would prefer telling it what to do, and he will even achieve something rough, but someone who knows what the handles and pedals do would prefer the handles and pedals
- Pull requests
- Merge requests
- Code review
I feel like I’m taking crazy pills. Are SWEs supposed to move away from code review, one of the core activities of the profession? Code review is as fundamental to SWE as double-entry is to accounting. Yes, we know that functional code can get generated at incredible speeds. Yes, we know that apps and whatnot can be bootstrapped from nothing by “agentic coding”.
We need to read this code, right? How can I deliver code to my company without security and reliability guarantees that, at their core, come from me knowing what I’m delivering line-by-line?
Sometimes you’re gonna want to slowly back away from them and write things yourself. Sometimes you can farm out work to them.
Code review their work as you would anyone else’s, in fact more so.
My rule of thumb has been that it takes one senior engineer for every 4 new grads to mentor them and code review their work. Or, put another way, bringing on a new grad gets you +1 output at the cost of -0.25 of a senior.
Also, there are some tasks you just can’t give new college grads.
Same dynamic seems to be shaping up here. Except the AI juniors are cheap, work 24/7, and (currently) have no hope of growing into seniors.
He doesn't go into details, but there is a bit:
> Issue and PR triage/review. Agents are good at using gh (GitHub CLI), so I manually scripted a quick way to spin up a bunch in parallel to triage issues. I would NOT allow agents to respond, I just wanted reports the next day to try to guide me towards high value or low effort tasks.
> More specifically, I would start each day by taking the results of my prior night's triage agents, filter them manually to find the issues that an agent will almost certainly solve well, and then keep them going in the background (one at a time, not in parallel).
This is a short excerpt, this article is worth reading. Very grounded and balanced.
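As a sketch of the kind of triage scripting described in the excerpt (the field names and label-based scoring here are my own assumptions, not Mitchell's actual setup), one might feed `gh issue list --json number,title,labels` output into a small filter before handing anything to an overnight agent:

```python
import json

def rank_issues(issues):
    """Given issue dicts shaped like the output of
    `gh issue list --json number,title,labels`, return the ones
    worth queuing for a triage agent, best first.
    The scoring below is a made-up heuristic for illustration."""
    def score(issue):
        labels = {lbl["name"] for lbl in issue.get("labels", [])}
        s = 0
        if "good first issue" in labels:
            s += 2  # likely low effort
        if "bug" in labels:
            s += 1  # likely high value
        return s
    ranked = sorted(issues, key=score, reverse=True)
    return [i for i in ranked if score(i) > 0]

# In real use you'd pipe in:
#   gh issue list --state open --json number,title,labels > issues.json
sample = json.loads("""[
  {"number": 1, "title": "crash on empty input", "labels": [{"name": "bug"}]},
  {"number": 2, "title": "typo in docs", "labels": [{"name": "good first issue"}]},
  {"number": 3, "title": "roadmap discussion", "labels": []}
]""")
picks = rank_issues(sample)
```

The point is the division of labour from the excerpt: a script narrows the issue list mechanically, and the human still does the final filtering before any agent is kept running.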
Mitchell talks about this in a roundabout way... in the "Reproduce your own work" section he obviously reviewed that code, as that was the point. In the "End-of-day agents" section he talks about what he found them good for (so far). He previously wrote about how he preferred an interactive style, and this article aligns with that, showing his progress in understanding how coding agents can be useful.
That being said, software quality seems to be decreasing, or maybe it's just because I use a lot of software in a somewhat locked-down state with adblockers and the rest.
Although, that wouldn't explain just how badly they've murdered the once lovely iTunes (now Apple Music) user interface. (And why does CMD-C not pick up anything 15% of the time I use it lately...)
Anyways, digressions aside... the complexity in software development is generally in the organizational side. You have actual users, and then you have people who talk to those users and try to see what they like and don't like in order to distill that into product requirements which then have to be architected, and coordinated (both huge time sinks) across several teams.
Even if you cut out 100% of the development time, you'd still be left with 80% of the timeline.
Over time though... you'll probably see people doing what I do all day, which is move around among many repositories (though I've yet to use the AI much; I got my Cursor license recently and am going to spin up some POCs I want to see soon), enabled by their use of AI to quickly grasp what's happening in a repo and the appropriate places to make changes.
Enabling developers to complete features from tip to tail across deep, many-pronged service architectures could bring project time down drastically and bring project-management and cross-team coordination costs down tremendously.
Similarly, in big companies, the hand is often barely aware of the foot at best. And exploring that space is a serious challenge. Often folk know exactly one step away, and rely on well-established async communication channels which also only know one step further. Principal engineers seem to know large amounts about finite spaces and are often in the dark a few hops away from things like the internal tooling for the systems they're maintaining (and often not particularly great at coming into new spaces and thinking with the same perspective... no, we don't need individual microservices for every 12-requests-a-month admin API group we want to set up).
Once systems can take a feature proposal and lay out concrete plans which each little kingdom can give a thumbs up or thumbs down to for further modifications, you can again reduce exploration, coordination, and architecture time down.
Sadly, User Experience design seems to be an often terribly neglected part of our profession. I love the memes about an engineer building the perfect interface like a water pitcher, only for the person to position it weirdly in order to get a pour out of the fill hole or something. Lemme guess how many users you actually talked to (often zero), and how many layers of distillation occurred before you received a micro-picture feature request that ends up being built, taking input from engineers with no macro understanding of a user's actual needs or day-to-day.
And who are often much more interested in perfecting some little algorithm than thinking about enabling others.
So my money is on money flowing to:
- People who can actually verify system integrity and can fight fires and bugs (though a lot of bug fixing will eventually become prompting?)
- Multi-talented individuals who can, say, interact with users well enough to understand their needs, as well as do a decent job verifying system architecture and security
It's outside of coding where I haven't seen much... I guess people use it to more quickly scaffold up expense reports, or generate mocks. So, lots of white collar stuff. But... it's not like the experience of shopping at the supermarket has changed, or going to the movies, or much of anything else.
If one asks this generator/assistant the same request/thing, within the same initial context, 10 times, would it generate the same result? In different sessions and all that.
Because if not, then it's for once-off things only...
And why does it matter anyway? If the code passes the tests and you like the look of it, it's good. It doesn't need to be existentially complicated.
Claude Code is linked to Anthropic's hosted models so you can't achieve this.
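To make the determinism question above concrete: whether you get the same result 10 times depends largely on sampling temperature. A toy sketch of temperature sampling over next-token scores (this is the general mechanism, not any particular vendor's API):

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw scores. At temperature 0 this is a
    deterministic argmax; above 0 it samples from a softmax, so repeated
    runs with different random states can differ."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    weights = [math.exp(l / temperature) for l in logits]
    total = sum(weights)
    return rng.choices(range(len(logits)),
                       weights=[w / total for w in weights], k=1)[0]

logits = [2.0, 1.0, 0.5]
# ten "sessions" at temperature 0: always the same token (index 0)
greedy = {sample_token(logits, 0, random.Random(i)) for i in range(10)}
```

Even at temperature 0, hosted models are not always bit-identical in practice (batching and floating-point effects), so "same request, same result" is never fully guaranteed with a remote service.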
It is a shame it's become such a polarized topic. Things which actually work fine get immediately bashed by large crowds at the same time things that are really not there get voted to the moon by extremely eager folks. A few years from now I expect I'll be thinking "man, there was some really good stuff I missed out on because the discussions about it were so polarized at the time. I'm glad that has cleared up significantly!"
The workflow automation and better (and model-directed) context management are all obvious in retrospect but a lot of people (like myself) were instead focused on IDE integration and such vs `grep` and the like. Maybe multi-agent with task boards is the next thing, but it feels like that might also start to outrun the ability to sensibly design and test new features for non-greenfield/non-port projects. Who knows yet.
I think it's still very valuable for someone to dig in to the underlying models periodically (insomuch as the APIs even expose the same level of raw stuff anymore) to get a feeling for what's reliable to one-shot vs what's easily correctable by a "ran the tests, saw it was wrong, fixed it" loop. If you don't have a good sense of that, it's easy to get overambitious and end up with something you don't like if you're the sort of person who cares at all about what the code looks like.
Whether or not LLM coding AIs are useful, they certainly qualify as "important," because adopting one is disruptive enough that getting rid of it after adopting it would be disruptive as well. I'm not signing on to that if I need to pay a recurring fee for it and/or need to rely on some company deciding to continue maintaining a cloud server running it in perpetuity.
I'm embracing LLMs but I think I've had to just pick a happy medium and stick with Claude Code with MCPs until somebody figures out a legitimate way to use the Claude subscription with open source tools like OpenCode, then I'll move over to that. Or if a company provides a model that's as good value that can be used with OpenCode.
That's kind of where AI-agent-coding is now too, though... software is more flexible.
That hasn’t been tech for a long time.
Frontend has been changing forever. React and friends have new releases all the time. Node has new package managers and even Deno and Bun. AWS keeps changing things.
OpenAI's Codex?
They do not.
> And clearly super human in some areas.
Sure, if you think calculators or bicycles are "superhuman technology".
Lay off the hype pills.
But that speed makes a pretty significant difference in experience.
If you wait a couple minutes and then give the model a bunch of feedback about what you want done differently, and then have to wait again, it gets annoying fast.
If the feedback loop is much tighter things feel much more engaging. Cursor is also good at this (investigate and plan using slower/pricier models, implement using fast+cheap ones).
I don't know about you but this has been the case for my entire career. Mgmt never gave a shit about beautiful code or tech debt or maintainability or how enlightened I felt writing code.
OPINION | THE REALITY CHECK
In the gleaming offices of Silicon Valley and the boardrooms of the Fortune 500, a new religion has taken hold. Its deity is the Large Language Model, and its disciples—the AI Evangelists—speak in a dialect of "disruption," "optimization," and "seamless integration." But outside the vacuum of the digital world, a dangerous friction is building between AI’s statistical hallucinations and the unyielding laws of physics.
The danger of Artificial Intelligence isn't that it will become our overlord; the danger is that it is fundamentally, confidently, and authoritatively stupid.
The Paradox of the Wind-Powered Car
The divide between AI hype and reality is best illustrated by a recent technical "solution" suggested by a popular AI model: an electric vehicle equipped with wind generators on the front to recharge the battery while driving. To the AI, this was a brilliant synergy. It even claimed the added weight and wind resistance amounted to "zero."
To any human who has ever held a wrench or understood the First Law of Thermodynamics, this is a joke—a perpetual motion fallacy that ignores the reality of drag and energy loss. But to the AI, it was just a series of words that sounded "correct" based on patterns. The machine doesn't know what wind is; it only knows how to predict the next syllable.
The Erosion of the "Human Spark"
The true threat lies in what we are sacrificing to adopt this "shortcut" culture. There is a specific human process—call it The Stare. It is that thirty-minute window where a person looks at a broken machine, a flawed blueprint, or a complex problem and simply observes.
In that half-hour, the human brain runs millions of mental simulations. It feels the tension of the metal, the heat of the circuit, and the logic of the physical universe. It is a "Black Box" of consciousness that develops solutions from absolutely nothing—no forums, no books, and no Google.
However, the new generation of AI-dependent thinkers views this "Stare" as an inefficiency. By outsourcing our thinking to models that cannot feel the consequences of being wrong, we are witnessing a form of evolutionary regression. We are trading hard-earned competence for a "Yes-Man" in a box.
The Gaslighting of the Realist
Perhaps most chilling is the social cost. Those who still rely on their intuition and physical experience are increasingly being marginalized. In a world where the screen is king, the person pointing out that "the Emperor has no clothes" is labeled as erratic, uneducated, or naive.
When a master craftsman or a practical thinker challenges an AI’s "hallucination," they aren't met with logic; they are met with a robotic refusal to acknowledge reality. The "AI Evangelists" have begun to walk, talk, and act like the models they worship—confidently wrong, devoid of nuance, and completely detached from the ground beneath their feet.
The High Cost of Being "Authoritatively Wrong"
We are building a world on a foundation of digital sand. If we continue to trust AI to design our structures and manage our logic, we will eventually hit a wall that no "prompt" can fix.
The human brain runs on 20 watts and can solve a problem by looking at it. The AI runs on megawatts and can’t understand why a wind-powered car won't run forever. If we lose the ability to tell the difference, we aren't just losing our jobs—we're losing our grip on reality itself.
Frankly, I'm so tired of the usual "I don't find myself more productive", "it writes soup". Especially when some of the best software developers (and engineers) find much utility in those tools, there should be some doubt growing in that crowd.
I have come to the conclusion that software developers, those focusing only on the craft of writing code, are the naysayers.
Software engineers immediately recognize the many automation/exploration/etc boosts, recognize the tools limits and work on improving them.
Hell, AI is an insane boost to productivity, even if you don't have it write a single line of code ever.
But people who focus on the craft (the kind of crowd that doesn't even process the concept of throwaway code or budgets or money) will keep lying in their "I don't see the benefits because X" forever, nonsensically confusing any tool use with vibe coding.
I'm also convinced that since this crowd never had any notion of what engineering is (there is very little of it in our industry, sadly; technology and code are the focus and rarely the business, budget, and problems to solve) and confused it with architecture, technology, or best practices, they are genuinely insecure about their jobs, because once their much-valued craft and skills are diminished they pay the price of never having invested in understanding the business, the domain, processes, or soft skills.
As the most senior IC within my org, since the advent of (enforced) LLM adoption my code contribution/output has stalled as my focus has shifted to the reactionary work of sifting through the AI-generated chaff following post-mortems of projects that should never have shipped in the first place. On a good day I end up rejecting several PRs that most certainly would have taken down our critical systems in production due to poor vetting and architectural flaws, and on the worst I'm in full-on firefighting mode to "fix" the same issues already taking down production (already too late).
These are not inherent technical problems in LLMs, these are organizational/processes problems induced by AI pushers promising 10x output without the necessary 10x requirements gathering and validation efforts that come with that. "Everyone with GenAI access is now a 10x SDE" is the expectation, when the reality is much more nuanced.
The result I see today is massive incoming changesets that no one can properly vet given the new shortened delivery timelines and reduced human resourcing given to projects. We get test-suite coverage inflation where "all tests pass" but core business requirements are undermined, and no one is being given the time or resources to properly confirm that the business requirements are actually being met. Shit hits the fan, repeat ad nauseum. The focus within our industry needs to shift to education on the proper application and use of these tools, or we'll inevitably crash into the next AI winter; an increasingly likely future that would have been totally avoidable if everyone drinking the Koolaid stopped to observe what is actually happening.
As you implied, code is cheap and most code is "throwaway" given even modest time horizons, but all new code comes with hidden costs not readily apparent to all the stakeholders attempting to create a new normal with GenAI. As you correctly point out, the biggest problems within our industry aren't strictly technical ones, they're interpersonal, communication and domain expertise problems, and AI use is simply exacerbating those issues. Maybe all the orgs "doing it wrong" (of which there are MANY) simply fail and the ones with actual engineering discipline "make it," but it'll be a reckoning we should not wish for.
I have heard from a number of different industry players and they see the same patterns. Just look at the average linked in post about AI adoption to confirm. Maybe you observe different patterns and the issues aren't as systemic as I fear. I honestly hope so.
Your implication that seniors like myself are "insecure about our jobs" is somewhat ironically correct, but not for the reasons you think.
This is the key one I think. At one extreme you can tell an agent "write a for loop that iterates over the variable `numbers` and computes the sum" and they'll do this successfully, but the scope is so small there's not much point in using an LLM. On the other extreme you can tell an agent "make me an app that's Facebook for dogs" and it'll make so many assumptions about the architecture, code and product that there's no chance it produces anything useful beyond a cool prototype to show mom and dad.
A lot of successful LLM adoption for code is finding this sweet spot. Overly specific instructions don't make you more productive, and with overly broad instructions you end up redoing too much of the work.
It cognitively feels very similar to other classic programming activities, like modularization at any level from architecture to code units/functions, thoughtfully choosing how to lay out and chunk things. It's always been one of the things that make programming pleasurable for me, and some of that feeling returns when slicing up tasks for agents.
I'm starting to think of projects now as a tree structure where the overall architecture of the system is the main trunk and from there you have the sub-modules, and eventually you get to implementations of functions and classes. The goal of the human in working with the coding agent is to have full editorial control of the main trunk and main sub-modules and delegate as much of the smaller branches as possible.
Sometimes you're still working out the higher-level architecture, too, and you can use the agent to prototype the smaller bits and pieces which will inform the decisions you make about how the higher-level stuff should operate.
Or maybe something quite different but where these early era agentic tooling strategies still become either unneeded or even actively detrimental.
What this misses, of course, is that you can just have the agent do this too. Agents are great at making project plans, especially if you give them a template to follow.
The more detailed I am in breaking down chunks, the easier it is for me to verify and the more likely I am going to get output that isn't 30% wrong.
Amusingly, this was my experience in giving Lovable a shot. The onboarding process was literally just setting me up for failure by asking me to describe the detailed app I was attempting to build.
Taking it piece by piece in Claude Code has been significantly more successful.
But not so good at making (robust) new features out of the blue
Maybe there’s something about not having to context switch between natural language and code just makes it _feel_ easier sometimes
Actually that's how I did most of my work last year. I was annoyed by existing tools so I made one that can be used interactively.
It has full context (I usually work on small codebases), and can make an arbitrary number of edits to an arbitrary number of files in a single LLM round trip.
For such "mechanical" changes, you can use the cheapest/fastest model available. This allows you to work interactively and stay in flow.
(In contrast to my previous obsession with the biggest, slowest, most expensive models! You actually want the dumbest one that can do the job.)
I call it "power coding", akin to power armor, or perhaps "coding at the speed of thought". I found that staying actively involved in this way (letting LLM only handle the function level) helped keep my mental model synchronized, whereas if I let it work independently, I'd have to spend more time catching up on what it had done.
I do use both approaches though, just depends on the project, task or mood!
The failure mode I kept hitting wasn’t just "it makes mistakes", it was drift: it can stay locally plausible while slowly walking away from the real constraints of the repo. The output still sounds confident, so you don’t notice until you run into reality (tests, runtime behaviour, perf, ops, UX).
What ended up working for me was treating chat as where I shape the plan (tradeoffs, invariants, failure modes) and treating the agent as something that does narrow, reviewable diffs against that plan. The human job stays very boring: run it, verify it, and decide what’s actually acceptable. That separation is what made it click for me.
Once I got that loop stable, it stopped being a toy and started being a lever. I’ve shipped real features this way across a few projects (a git like tool for heavy media projects, a ticketing/payment flow with real users, a local-first genealogy tool, and a small CMS/publishing pipeline). The common thread is the same: small diffs, fast verification, and continuously tightening the harness so the agent can’t drift unnoticed.
Yeah, I would get patterns where initial prototypes were promising, then we developed something that was 90% close to the design goals, and then as we tried to push in the last 10%, drift would start breaking down, or even just forgetting, the 90%.
So I would get to 90% and basically start a new project with that as the baseline to add to.
These patterns seem to be picking up speed in the general population; makes the human race seem quite easily hackable.
If the human race were not hackable then society would not exist, we'd be the unchanging crocodiles of the last few hundred million years.
Have you ever found yourself speaking a meme? Had a catchy tune repeating in your head? Started spouting nation-state-level propaganda? Found yourself in a crowd trying to burn a witch at the stake?
Hacking the flow of human thought isn't that hard, especially across populations. Hacking any one particular human's thoughts is harder, unless you have a lot of information on them.
With where the models are right now, you still need a human in the loop to make sure you end up with code you (and your organisation) actually understand. The bottleneck has gone from writing code to reading code.
these are some tricks I use now.
1. Write a generic prompt about the project and software versions and keep it in the folder. (I think this is getting pushed as SKILLS.md now)
2. In the prompt, add instructions to add comments on changes; since our main job is to validate and fix any issues, it makes this easier.
3. Find the best model for the specific workflow. For example, these days I find that Gemini Pro is good for HTML UI stuff, while Claude Sonnet is good for Python code. (This is why subagents are getting popular)
But why is the cost never discussed or disclosed in these conversations? I feel like I'm going crazy, there is so much written extolling the virtues of these tools but with no mention of what it costs to run them now. It will surely only get more expensive from here!
And not just the monetary cost of accessing the tools, but the amount of time it takes to actually get good results out. I strongly suspect that even though it feels more productive, in many cases things just take longer than they would if done manually.
I think there are really good uses for LLMs, but I also think that people are likely using them in ways that feel useful, but end up being more costly than not.
There are two usage quota windows to be aware of: 5h and 7d. I use https://github.com/richhickson/claudecodeusage (Mac) to keep track of the status. It shows green/yellow/red and a percentage in the menu bar.
Apparently out of 3-5k people with access to our AI tools, there's fewer than a handful of us REALLY using it. Most are asking questions in the chatbot style.
Anyway, I had to ask my manager, the AI architect, and the Tooling Manager for approval to increase my quota.
I asked everyone in the chain how much equivalent dollars I am allocated, and how much the increase was and no one could tell me.
There is nothing quantifiable here: https://claude.com/pricing
Pro: "Everything in Free, plus: More usage"
Max: "Choose 5x or 20x more usage than Pro"
Wow, 5x or 20x more of "more". That's some masterful communication right there.
Is it? Sure, the chatbot style maxes at $200/month. I consider that ... not unreasonable ... for a professional tool. It doesn't make me happy, but it's not horrific.
The article, however, explicitly pans the chatbot style and is extolling the API style being accessed constantly by agents, and that has no upper bound. Roughly $10-ish per megatoken (million tokens). $10-ish per 1K web searches. etc.
This doesn't sound "minimal" to me. This sounds like every single "task" I kick off is $10. And it can kick those tasks and costs off very quickly in an automated fashion. It doesn't take many of those tasks before I'm paying more than an actual full developer.
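Taking the parent's rough numbers at face value (the $10/task figure and the task rate are the commenter's estimates, not published prices), the scaling is easy to sketch:

```python
# Back-of-envelope cost scaling using the thread's assumed figures.
# All three inputs are assumptions from the comment, not real pricing.
cost_per_task = 10.00   # commenter's rough estimate per agent task, USD
tasks_per_day = 20      # an automated loop can fire these off quickly
work_days = 20          # working days per month

monthly = cost_per_task * tasks_per_day * work_days
print(monthly)  # 4000.0
```

At that rate, the monthly agent bill is indeed in the same ballpark as a developer's salary in some markets, which is the comparison the comment is making.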
After that, it's basically just asking it to flesh out the layers, starting from zero dependencies and arriving at the top of the castle. Even if there are complexities within the pieces, or the implementation is not exactly to my liking, the issues are localised - I can dive in and handle it myself (most of the time, I don't need to).
I feel like this approach works very well for keeping a mental model of how things are connected, because most of my time was spent on that model.
I do run multiple models at once now. On different parts of the code base.
I focus solely on the less boring tasks for myself and outsource all of the slam dunk and then review. Often use another model to validate the previous models work while doing so myself.
I do git reset still quite often but I find more ways to not get to that point by knowing the tools better and better.
Autocompleting our brains! What a crazy time.
Is your company paying for it or you?
What is your process when the agent writes a piece of code, let's say a really complex recursive function, and you aren't confident you could have come up with the same solution yourself? Do you still submit it?
I think we need to stop listening to billionaires. The article is well thought out and well written, but his perspective is entirely biased by never having to think about money at all... all of this stuff is incredibly expensive.
And the post touches on the next type of problem: how to plan far enough ahead to utilise agents while you are away. It is a difficult problem, but IMO we’re going in the direction of having some sort of shared “templated plans”/workflows and budgeted/throttled task execution to achieve that. It is like you want to give it a little world to explore so that it does not stop early, like a little game to play; then you come back in the morning and check how far it went.
To me, part of our job has always been about translating garbage/missing specs into something actionable.
Working with agents doesn't change this, and that's why, until PM/business people are able to come up with actual specs, they'll still need their translators.
Furthermore, it's not because the global spec is garbage that you, as a dev, won't come up with clear specs to solve the technical issues related to the overall feature asked for by stakeholders.
One funny thing I see, though, is in the AI presentations done for non-technical people, the advice: "be as thorough as possible when describing what you expect the agent to solve!". And I'm like: "yeah, that's what devs have been asking for since forever...".
Just because you haven't or you work in a particular way, doesn't mean everyone does things the same way.
Likewise, on your last point, just because someone is using AI in their work doesn't mean they don't have hard skills and know-how. Mitchell, the author of this article, is a great example of that: someone who has proved able to produce great software and who, as individuals who made a dent in the industry go, has definitely had an impactful career.
I just recently added in Codex, since it comes with my $20/mo subscription to GPT and that's lowering my Claude credit usage significantly... until I hit those limits at some point.
2012 + 300 + 5~200... so about $1500-$1600/year.
It is 100% worth it for what I'm building right now, but my fear is that I'll take a break from coding and then I'm paying for something I'm not using with the subscriptions.
I'd prefer to move to a model where I'm paying for compute time as I use it, instead of worrying about tokens/credits.
I'm a huge believer in AI agent use and even I think this is wrong. It's like saying "always have something compiling" or "make sure your Internet is always downloading something".
The most important work happens when an agent is not running, and if you spend most of your time looking for ways to run more agents you're going to streetlight-effect your way into solving the wrong problems https://en.wikipedia.org/wiki/Streetlight_effect
I just don't need the AI writing code for me, don't see the point. Once I know from the ai chat research what my solution is I can code it myself with the benefit I then understand more what I am doing.
And yes I've tried the latest models! Tried agent mode in copilot! Don't need it!
That's one very short step removed from Simon Willison's lethal trifecta.
Versus other threads (here on HN, and especially on places like LinkedIn) where it's "I set up a pipeline and some agents and now I type two sentences and amazing technology comes out in 5 minutes that would have taken 3 devs 6 months to do".
For more on the "harness engineering", see what Armin Ronacher and Mario Zechner are doing with pi: https://lucumr.pocoo.org/2026/1/31/pi/ https://mariozechner.at/posts/2025-11-30-pi-coding-agent/
> I really don't care one way or the other if AI is here to stay, I'm a software craftsman that just wants to build stuff for the love of the game.
I suspect having three commas in one's bank account helps one stay very relaxed about the outcome ;)
I wonder how much all this costs on a monthly basis?
Or if we assume that the OP can only do 4 hours per sitting (mentioned in the other post) and 8 hours of overnight agents, then it would come down to $15.98 * 1.5 * 20 = $479.40 a month (without weekends).
Are people seriously dropping hundreds of dollars a month on these products to get their work done?
This is what's so annoying about it. It's like a child that makes the same errors again and again.
But couldn't it adjust itself with the goal of reducing the error bit by bit? Wouldn't this lead to the ultimate agent who can read your mind? That would be awesome.
Your improvement is someone else's code smell. There's no absolute right or wrong way to write code, and that's coming from someone who definitely thinks there's a right way. But it's my right way.
Anyway, I don't know why you'd expect it to write code the way you like after it's been trained on the whole of the Internet, the RLHF labelers' preferences, and the reward model.
Putting some words in AGENTS.md hardly seems like the most annoying thing.
tip: Add a /fix command that tells it to fix $1 and then update AGENTS.md with the text that'd stop it from making that mistake in the future. Use your nearest LLM to tweak that prompt. It's a good timesaver.
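For Claude Code specifically, a custom slash command like this is just a small markdown file; the path and the `$ARGUMENTS` placeholder below follow Claude Code's slash-command convention (the same idea as the `$1` above), but treat the exact wording as a sketch to adapt:

```markdown
<!-- .claude/commands/fix.md — sketch of the /fix command described above -->
Fix the following issue: $ARGUMENTS

After the fix, append one short rule to AGENTS.md, phrased as an
instruction to a coding agent that would have prevented this class
of mistake. Do not duplicate rules that are already present.
```

Then something like `/fix the date parser drops timezones` both patches the bug and grows the AGENTS.md file over time.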
A mind reading ultimate agent sounds more like a deity, and there are more than enough fables warning one not to create gods because things tend to go bad. Pumping out ASI too quickly will cause massive destabilization and horrific war. Not sure who against really either. Could be us humans against the ASI, could be the rich humans with ASI against us. Anyway about it, it would represent a massive change in the world order.
I also love using it for research for upcoming features. Research + pick a solution + implement. It happens so fast.
This perspective is why I think this article is so refreshing.
Craftsmen approach tools differently. They don't expect tools to work for them out-of-the-box. They customize the tool to their liking and reexamine their workflow in light of the tool. Either that or they have such idiosyncratic workflows they have to build their own tools.
They know their tools are custom to _them_. It would be silly to impose that everyone else use their tools-- they build different things!
I'd already abandoned it for generating code, for all the reasons everyone knows, that don't need to be rehashed.
I was still in the camp of "It's a better google" and can save me time with research.
The issue is, at this point in my career (30+ years), the questions I have are a bit more nuanced and complex. They aren't things like "how do I make a form with React".
I'm working on developing a very high performance peer server that will need to scale up to hundreds of thousands to a million concurrent web socket connections to work as a signaling server for WebRTC connection negotiation.
I wanted to start as simple as possible, so peerjs is attractive. I asked the AI if peerjs peer-server would work with Node.js's cluster module. It enthusiastically told me it would work just fine and was, in fact, designed for that.
I took a look at the source code, and it looked to me like that was dead wrong. The AI kept arguing with me before finally admitting it was completely wrong. A total waste of time.
Same results asking it how to remove Sophos from a Mac.
Same with legal questions about HOA laws, it just totally hallucinates things that don't exist.
My wife and I used to use it to try to settle disagreements (i.e a better google) but amusingly we've both reached a place where we distrust anything it says so much, we're back to sending each other web articles :-)
I'm still pretty excited about the potential use of AI in elementary education, maybe through high school in some cases, but for my personal use, I've been reaching for it less and less.
I used to joke that programming is not a career - it's a disease - since practiced long enough it fundamentally changes the way you think and talk, always thinking multiple steps ahead and the implications of what you, or anyone else, is saying. Asking advice from another seasoned developer you'll get advice that has also been "pre-analyzed", but not from an LLM.
What shifted for us was when we stopped experimenting for novelty and started embedding AI where routine work slowed people down. For example, we built an intake assistant for hospitals: guided questions that organize structured history before a doctor sees the patient. At first it felt promising, but adoption only happened when clinic staff saw that it saved them time and didn’t replace their judgment. That forced us to rethink how we framed the feature. It became about support, not replacement.
The real adoption turning point came when non-technical team members began using the tools without hesitation. That’s when it stopped being AI and just became part of workflow.
This is the honeymoon phase. You're learning the ins and outs of the specific model you're using and becoming more productive. It's magical. Nothing can stop you. Then you might not be improving as fast as you did at the start, but things are getting better every day. Or maybe every week. But it's heaps better than doing it by hand because you have so much mental capacity left.
Then a new release comes out. An arbitrary fraction of your hard-earned intuition is not only useless but actively harmful to getting good results with the new models. Worse, you will never know which part it is without unlearning everything you learned and starting over again.
I've had to learn the quirks of three generations of frontier families now. It's not worth the hassle. I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months. Copy and paste is the universal interface and being able to do surgery on the chat history is still better than whatever tooling is out there.
Unironically learning vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your llm usage.
> I've gone back to managing the context window in Emacs because I can't be bothered to learn how to deal with another model family that will be thrown out in six months.
Can you expand more on what you mean by that? I'm a bit of a noob on llm enabled dev work. Do you mean that you will kick off new sessions and provide a context that you manage yourself instead of relying on a longer running session to keep relevant information?
> Unironically learning vim or Emacs and the standard Unix code tools is still the best thing you can do to level up your llm usage.
I appreciate your insight but I'm failing to understand how exactly knowing these tools increases performance of llms. Is it because you can more precisely direct them via prompts?
All the tooling is there to manage that context for you. It works, to a degree, then stops working. Your intuition is there to decide when it stops working. This intuition gets outdated with each new release of the frontier model and changes in the tooling.
The stateless API with a human deciding what to feed it is much more efficient in both cost and time as long as you're only running a single agent. I've yet to see anyone use multiple agents to generate code successfully (but I have used agent swarms for unstructured knowledge retrieval).
The Unix tools are there for you to progra-manually search and edit the code base and copy/paste into the context that you will send. Outside of Emacs (and possibly vim), with the ability to have dozens of ephemeral buffers open to modify their output, I don't imagine they will be very useful.
Or to quote the SICP lectures: The magic is that there is no magic.
While I also use LLMs in other ways, this is my core workflow. I quickly get frustrated when I can't _quickly_ modify the context.
If you have some mastery over your editor, you can just run commands and post relevant output and make suggested changes to get an agent like experience, at a speed not too different from having the agent call tools. But you retain 100% control over the context, and use a tiny fraction of the tokens OpenCode and other agents systems would use.
It's not the only or best way to use LLMs, but I find it incredibly powerful, and it certainly has its place.
A very nice positive effect I noticed personally is that as opposed to using agents, I actually retain an understanding of the code automatically, I don't have to go in and review the work, I review and adjust on the fly.
The chat and session interfaces obscure this, making it look more stateful than it is. But they mainly just send the whole chat so far back to the LLM to get the next response. That's why the context window grows as a chat/session continues. It's also why the answers tend to get worse with longer context windows – you're giving the LLM a lot more to sift through.
You can manage the context window manually instead. You'll potentially lose some efficiencies from prompt caching, but you can also keep your requests much smaller and more relevant, likely spending fewer tokens.
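Managing the context window manually is mostly just curating a plain list of messages before each request. A minimal sketch, assuming an OpenAI-style `role`/`content` message format (the function name and parameters are illustrative, not any vendor's API):

```python
# Minimal sketch of manual context management for a stateless chat API.
# Every request is independent: you decide exactly what the model sees.

def build_context(system_prompt, pinned, history, max_turns=4):
    """Assemble the messages to send on the next request.

    pinned:  strings (e.g. file contents) always included
    history: list of (user, assistant) exchange tuples; only the most
             recent `max_turns` are kept, so the request stays small.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for text in pinned:
        messages.append({"role": "user", "content": text})
    for user_msg, assistant_msg in history[-max_turns:]:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    return messages

# Each model call then just sends build_context(...); "editing the chat
# history" is editing a plain Python list (or an Emacs buffer).
```

The point of the sketch: nothing about the conversation lives server-side, so dropping stale turns or swapping pinned files is an ordinary list edit rather than a session feature.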
OP is also a founder of Hashicorp, so.. lol.
> This is the honeymoon phase.
No offense but you come across as if you didn’t read the article.
I'll wait for OP to move their workflow to Claude 7.0 and see if they still feel as bullish on AI tools.
People who are learning a new AI tool for the first time don't realize that they are just learning quirks of the tool and the underlying model, not skills that generalize. It's not until you've done it a few times that you realize you've wasted more than 80% of your time on a model that is completely useless and will be sunset in 6 months.
Some arguments are made about retaining focus and single-mindedness while working with AI. I think these points are important. It’s related to the article on cutting out over-eager orchestration and focusing on validation work (https://sibylline.dev/articles/2026-01-27-stop-orchestrating...). There are a few sides to this covered in the article. You should always have a high-value task to switch to when the agent is working (instead of scrolling tiktok, instagram, X, youtube, facebook, hackernews, etc.). In my case I might start to read some books I have on the backburner, like Ghost in the Wires. You should disable agent notifications and take control of when you return to check the model context, to be less ADHD-ridden when programming with agents, and to actually make meaningful progress on the side task, since you only context switch when you are satisfied. The final one is to always have at least one agent, and preferably only one agent, running in the background. The idea is that always having an agent running results in a slow burn of productivity improvements and a process where you can slowly improve the background agent's performance. Generally, always having some agent running is a good way to stay on top of what current model capabilities are.
I also really liked the idea of overnight agents for library research, redevelopment of projects to test out new skills, tests and AGENTS.md modifications.
This reminded me of back when wysiwyg web editors started becoming a thing, and coders started adding those "Created in notepad" stickers to their webpages, to point out they were 'real' web developers. Fun times.
That said, I've given it a go. I used zed, which I think is a pretty great tool. I bought a pro subscription and used the built-in agent with Claude Sonnet 4.x and Opus. I'm a Rails developer in my day job and, like MitchellH and many others, found out fairly quickly that tasks for the LLM need to be quite specific and discrete. The agent is great at renames and minor refactors, but my preferred use of the agent was to get it to write RSpec tests once I'd written something like a controller or service object.
And generally, the LLM agent does a pretty great job of this.
But here's the rub: I found that I was losing the ability to write rspec.
I went to do it manually and found myself trying to remember API calls and approaches required to write some specs. The feeling of skill leaving me was quite sobering and marked my abandonment of LLMs and Zed, and my return to neovim, agent-free.
The thing is, this is a common experience generally. If you don't use it, you lose it. It applies to all things: fitness, language (natural or otherwise), skills of all kinds. Why should it not apply to thinking itself.
Now you may write me and my experience off as that of a lesser mind, and that you won't have such a problem. You've been doing it so long that it's "hard-wired in" by now. Perhaps.
It's in our nature to take the path of least resistance, to seek ease and convenience at every turn. We've certainly given away our privacy and anonymity so that we can pay for things with our phones and send email for "free".
LLMs are the ultimate convenience. A peer or slave mind that we can use to do our thinking and our work for us. Some believe that the LLM represents a local maximum, that the approach can't get much better. I dunno, but as AI improves, we will hand over more and more thinking and work to it. To do otherwise would be to go against our very nature and every other choice we've made so far.
But it's not for me. I'm no MitchellH, and I'm probably better off performing the mundane activities of my work, as well as the creative ones, so as to preserve my hard-won knowledge and skills.
YMMV
I'll leave off with the quote that resonates the most with me as I contemplate AI:-
"I say your civilization, because as soon as we started thinking for you, it really became our civilization, which is, of course, what this is all about." -- Agent Smith "The Matrix"
- When tests didn't work, I had to check what was going on, and the LLMs do cheat a lot with Volkswagen tests, so that made me skeptical even of what is being written by the agents
- When things were broken, spaghetti and awful code tends to be written in such an obnoxious way that it's beyond repair, and it made me wish I had done it from scratch.
Thankfully I only tried using agents for tests and not for the actual code, but it makes me wonder whether "vibe coding" really produces quality work.
And then I raise the PR and other humans review it, and they won't let me merge crap code.
Is it that a lot of you are working with much lighter weight processes and you're not as strict about what gets merged to main?
This gave me a physical flinch. Perhaps this is unfounded, but all this makes me think of is this becoming the norm, millions of people doing this, and us cooking our planet out much faster than predicted.
agents jump ahead to the point of the user and project being out of control and more expensive
I think a lot of us still hesitate to make that jump; or at least I am not sure of a cost-effective agent approach (I guess I could manually review their output, but I could see it going off track quickly)
I guess I'd like to see more of an exact breakdown of what prompts and tools and AI are used to get ideas on if I'd use that for myself more
Something with actual users needs a bit more care
Flowers for Algernon.
Or at least the first half. I don't wanna see what it looks like when AI capabilities start going in reverse.
But I want to know.
That depends on your budget. To work within my pro plan's codex limits, I attach the codebase as a single file to various chat windows (GPT 5.2 Thinking - Heavy) and ask it to find bugs/plan a feature/etc. Then I copy the dense tasklist from chat to codex for implementation. This reduces the tokens that codex burns.
Also don't sleep on GPT 5.2 Pro. That model is a beast for planning.
I thought the "don't let agents finishing interrupt you" workflow was an interesting point. I've set up chime hooks to do basically the opposite, and this gives me pause to wonder if I'm underestimating the cost of context switching. I'll give it a go.
The human-agent relationship described in the article made me wonder: are natural, or experienced, managers having more success with AI as subordinates than people without managerial skill? Are AI agents enormously different than arbitrary contractors half a world away where the only communication is daily text exchanges?
You mentioned "harness engineering". How do you approach building "actual programmed tools" (like screenshot scripts) specifically for an LLM's consumption rather than a human's? Are there specific output formats or constraints you’ve found most effective?
That way the blast radius is vastly reduced.
"Please let us sit down and have a reasonable conversation! I was a skeptic, too, but if all skeptics did what I did, they would come to Jesus as well! Oh, and pay the monthly Anthropic tithe!"
This I have found to be important too.
not ashamed to say that i am between steps 2 and 3 in my personal workflow.
>Adopting a tool feels like work, and I do not want to put in the effort
all the different approaches floating around online feel ephemeral to me. adopting them, just like the different tools for the op, seems like a chore. the fomo mongering from the community does not help here either, but in the end it is a matter of personal discovery to stick with what works for you.
Solution-looking-for-a-problem mentality is a curse.
This is the main reason to use AI agents, though: multitasking. If I'm working on some Terraform changes and I fire off an agent loop, I know it's going to take a while for it to produce something working. In the meantime I'm waiting for it to come back and pretend it's finished (really I'll have to fix it), so I start another agent on something else. I flip back and forth between the finished runs as they notify me. At the end of the day I have 5 things finished rather than two.
The "agent" doesn't have to be anything special either. Anything you can run in a VM or container (vscode w/copilot chat, any cli tool, etc) so you can enable YOLO mode.
But yes, Hashimoto is a high-profile CEO/CTO who may well have an indirect, or near-future, interest in talking up AI. Articles extolling the productivity gains of Claude on HN do generally tend to be from older, managerial types (make of that what you will).
- What about non opensource work that's not on Github?
- Costs! I would think "an agent always running" would add up quickly
- In open source work, how does it amplify others? Are you seeing AI slop as PRs? Can you tell the difference?
I mean, what is the point of change if not to improve? I don't mean "I felt I was more efficient." Feelings aren't measurements. Numbers!
It makes me profoundly sad to think of the huge number of AI agents running endlessly to produce vibe-coded slop. The environmental impact must be massive.
Keep in mind that these are estimates, but you could attempt to extrapolate from here. Programming prompts probably take more because I assume the average context is a good bit higher than the average ChatGPT question, plus additional agents.
All in, I'm not sure if the energy usage long term is going to be overblown by media or if it'll be accurate. I'm personally not sure yet.
OPINION | THE REALITY CHECK

In the gleaming offices of Silicon Valley and the boardrooms of the Fortune 500, a new religion has taken hold. Its deity is the Large Language Model, and its disciples—the AI Evangelists—speak in a dialect of "disruption," "optimization," and "seamless integration." But outside the vacuum of the digital world, a dangerous friction is building between AI’s statistical hallucinations and the unyielding laws of physics.
The danger of Artificial Intelligence isn't that it will become our overlord; the danger is that it is fundamentally, confidently, and authoritatively stupid.
The Paradox of the Wind-Powered Car

The divide between AI hype and reality is best illustrated by a recent technical "solution" suggested by a popular AI model: an electric vehicle equipped with wind generators on the front to recharge the battery while driving. To the AI, this was a brilliant synergy. It even claimed the added weight and wind resistance amounted to "zero."
To any human who has ever held a wrench or understood the First Law of Thermodynamics, this is a joke—a perpetual motion fallacy that ignores the reality of drag and energy loss. But to the AI, it was just a series of words that sounded "correct" based on patterns. The machine doesn't know what wind is; it only knows how to predict the next syllable.
The Erosion of the "Human Spark"

The true threat lies in what we are sacrificing to adopt this "shortcut" culture. There is a specific human process—call it The Stare. It is that thirty-minute window where a person looks at a broken machine, a flawed blueprint, or a complex problem and simply observes.
In that half-hour, the human brain runs millions of mental simulations. It feels the tension of the metal, the heat of the circuit, and the logic of the physical universe. It is a "Black Box" of consciousness that develops solutions from absolutely nothing—no forums, no books, and no Google.
However, the new generation of AI-dependent thinkers views this "Stare" as an inefficiency. By outsourcing our thinking to models that cannot feel the consequences of being wrong, we are witnessing a form of evolutionary regression. We are trading hard-earned competence for a "Yes-Man" in a box.
The Gaslighting of the Realist

Perhaps most chilling is the social cost. Those who still rely on their intuition and physical experience are increasingly being marginalized. In a world where the screen is king, the person pointing out that "the Emperor has no clothes" is labeled as erratic, uneducated, or naive.
When a master craftsman or a practical thinker challenges an AI’s "hallucination," they aren't met with logic; they are met with a robotic refusal to acknowledge reality. The "AI Evangelists" have begun to walk, talk, and act like the models they worship—confidently wrong, devoid of nuance, and completely detached from the ground beneath their feet.
The High Cost of Being "Authoritatively Wrong"

We are building a world on a foundation of digital sand. If we continue to trust AI to design our structures and manage our logic, we will eventually hit a wall that no "prompt" can fix.
The human brain runs on 20 watts and can solve a problem by looking at it. The AI runs on megawatts and can’t understand why a wind-powered car won't run forever. If we lose the ability to tell the difference, we aren't just losing our jobs—we're losing our grip on reality itself.
Tons of respect for Mitchell. I think you are doing him a disservice with these kinds of comments.
If by company success, then Zuckerberg and Musk are better than all of us.
If by millions made, as he likes to joke/brag about... Fabrice Bellard is an utter failure.
If by install base, the geniuses that made MS Teams are among the best.
None of this is to take away from the successes of the man, but this kind of statement is rather silly.
LOL, been there, done that. It is much less frustrating and demoralizing than babysitting your kind of stupid colleague though. (Thankfully, I don't have any of those anymore. But at previous big companies? Oh man, if only their commits were ONLY as bad as a bad AI commit.)
"Don't be snarky."
"Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative."
I think this is something people ignore, and is significant. The only way to get good at coding with LLMs is actually trying to do it. Even if it's inefficient or slower at first. It's just another skill to develop [0].
And it's not really about using all the plugins and features available. In fact, many plugins and features are counter-productive. Just learn how to prompt and steer the LLM better.
[0]: https://ricardoanderegg.com/posts/getting-better-coding-llms...
Which is why I like this article. It's realistic in terms of describing the value proposition of LLM-based coding assist tools (aka, AI agents).
The fact that it's underwhelming compared to the hype we see every day is a very, very good sign that it's practical.