I don't think many of you have legitimately tried Claude Code, or maybe you're holding it wrong.
I'm getting 10x the work done. I'm operating at all layers of the stack with a speed and rapidity I've never had before.
And before anyone accuses me of being some "vibe coder", I've built five nines active-active money rails that move billions of dollars a day at 50kqps+, amongst lots of other hard hitting platform engineering work. Serious senior engineering for over a decade.
This isn't just a "cool technology". We've exited the punch card phase. And that is hard or impossible to come back from.
If you're not seeing these same successes, I legitimately think you're using it wrong.
I honestly don't like subscription services, hyperscaler concentration of power, or the fact I can't run Opus locally. But it doesn't matter - the tool exists in the shape it does, and I have to consume it in the way that it's presented. I hope for a different offering that is more democratic and open, but right now the market hasn't provided that.
It's as if you got access to fiber or broadband and were asked to go back to ISDN/dial up.
I just don’t see how I could export 10x the work and have it properly validated by peers at this point in time. I may be able to generate code 10-20x faster, but there are nuances that only a human can reason about in my particular sector.
When I do code, it's almost always something novel that I don't know how I'm going to implement until I code a few pieces and see how they fit together. If it's a fairly routine feature based on an existing pattern, I assign it to one of the other devs.
In my experience, the people who 10X their output with Claude Code fit one of two categories:
1. They're not really taking the time to understand the code they're submitting. They might do a skim over the output and see that it looks reasonable and passes tests, but they aren't taking time to understand the code as if they were pair programming. Only when it breaks and the LLM can't patch it up quickly do they go in and fully understand the code.
2. They moved very slowly before Claude Code. I've had some coworkers who would take 2-3 days to get a simple PR out because, to be frank, their work days weren't full of a lot of work. Every time they'd run into a question they'd stop and then bumble around for a few hours until they could talk to the ticket creator about it. They'd get tired of working on a task by 2PM and then save the rest of the work for tomorrow. They'd get an idea and decide to rewrite the PR the next day, and on and on with distractions. When they start using Claude Code the LLM doesn't have the same holdups, so every point where they were getting stuck or tired before is replaced by an LLM powering through to some solution. Their cognitive load is reduced so they're no longer freezing up during the day. They aren't really becoming 10X engineers like they think, but really just catching up to a normal pace.
Another commenter mentioned that Docker, git, etc. were all tools that greatly enhanced productivity and coding agents are just another tool that does that. I would agree, but argue that it's more impactful than all of those tools combined.
[1] https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-...
I struggle to believe that a ton of seemingly intelligent software engineers are too dumb to figure out how to use Claude Code to get reliable results. It seems much more likely to me that it can do well at isolated tasks or new projects but fails when pointed at large, complex codebases because it just... is a token predictor lol.
But yeah, spinning up a greenfield project in an extensively solved area (ledgers) is going to be something an AI shines at.
It isn't like we don't use this stuff also, I ask Cursor to do things 20x a day and it does something I don't like 50% of the time. Even things like pasting an error message it struggles with. How do I reconcile my actual daily experience with hype messages I see online?
Many software devs work in teams on large projects where LLMs have a more nuanced value. I myself mostly work on a large project inside a large organization. Spitting out lines of code is practically never a bottleneck for me. Running a suite of agents to generate out a ton of code for my coworkers to review doesn't really solve a problem that I have. I still use Claude in other ways and find it useful, but I'm certainly not 10x more productive with it.
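One way to put numbers on the "writing code isn't my bottleneck" point is Amdahl's law: speeding up one part of a job only helps in proportion to how much of the job that part is. The sketch below is a rough illustration, not a measurement. The function name `overall_speedup` and the fractions used are hypothetical and are not taken from anyone's actual workload.

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's law: only the coding share of the work gets faster."""
    if not 0.0 <= coding_fraction <= 1.0:
        raise ValueError("coding_fraction must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    # Time left over = the untouched share plus the accelerated coding share.
    remaining = (1.0 - coding_fraction) + coding_fraction / coding_speedup
    return 1.0 / remaining


# Hypothetical inputs for illustration only, not data from the thread.
for fraction in (0.2, 0.5, 0.9):
    print(
        f"coding = {fraction:.0%} of the job, 10x faster coding -> "
        f"{overall_speedup(fraction, 10.0):.2f}x overall"
    )
```

Under these assumed fractions, 10x faster coding works out to roughly 1.22x, 1.82x, and 5.26x overall. If typing code is a small slice of the day, even a large gain on that slice barely moves the total, which may explain part of the gap between the reports in this thread.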
I couldn't disagree with this more. It's impressive at building demos, but asking it to build the foundation for a long-term project has been disastrous in my experience.
When you have an established project and you're asking it to color between the lines it can do that well (most of the time), but when you give it a blank canvas and a lot of autonomy it will likely end up generating crap code at a staggering pace. It becomes a constant fight against entropy where every mess you don't clean up immediately gets picked up as "the way things should be done" the next time.
Before someone asks, this is my experience with both Claude Code (Sonnet/Opus 4.6) and Codex (GPT 5.4).
So it's not that they're too stupid. There are various motivations for this: clinging on to familiarity, resistance to what feels like yet another tool, anti-AI koolaid, earnestly underwhelmed but don't understand how much better it can be, reacting to what they perceive to be incessant cheerleading, etc.
It's kind of like anti-JavaScript posts on HN 10+ years ago. These people weren't too stupid to understand how you could steelman Node.js, they just weren't curious enough to ask, and maybe it turned out they hadn't even used JavaScript since "DHTML" was a term except to do $(".box").toggle().
I wish there were more curiosity on HN.
Hypothetically, you have a simple slice out of bounds error because a function is getting an empty string so it does something like: `""[5]`.
Opus will add a bunch of length & nil checks to "fix" this, but the actual issue is the string should never be empty. The nil checks are just papering over a deeper issue, like you probably need a schema level check for minimum string length.
At that point do I just tell it something like "no, delete all that, the string should never be empty" and let it figure that out, or do I basically need to pseudocode "add a check for empty strings to this file on line 145", or do I just YOLO and accept that the issue is gone now so it is no longer my problem?
My bigger point is: how does an LLM know that this seemingly small problem is indicative of some larger failure? Let's say this string is a `user.username`, which means users can set their name to empty, which means an entire migration is probably necessary. All the AI is going to do is smoosh the error messages and kick the can.
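To make the difference concrete, here is a minimal sketch of the two kinds of fix, written in Python rather than whatever language the original bug was in. Everything in it is hypothetical: the `User` class, the `MIN_USERNAME_LENGTH` rule, and both helper functions are invented for illustration and stand in for whatever schema or validation layer a real project would use.

```python
from dataclasses import dataclass

MIN_USERNAME_LENGTH = 1  # hypothetical rule: usernames may never be empty


def first_letter_patched(username: str) -> str:
    """Symptom-level fix: guards the index but silently accepts bad data."""
    if not username:
        return "?"
    return username[0]


@dataclass(frozen=True)
class User:
    username: str

    def __post_init__(self) -> None:
        # Boundary check: reject invalid data where it enters the system.
        if len(self.username) < MIN_USERNAME_LENGTH:
            raise ValueError("username must not be empty")


def first_letter(user: User) -> str:
    # No guard needed: a User cannot be constructed with an empty username.
    return user.username[0]


print(first_letter_patched(""))      # the empty name is papered over
print(first_letter(User("alice")))   # valid data flows through unchanged
```

The patched helper keeps the crash away but lets empty usernames keep spreading, while the boundary check fails loudly at the point where the bad value is created. Neither version addresses rows that are already empty in storage, which is the migration question raised above.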
Seemingly is doing the heavy lifting here. If you read enough comment threads on HN, it will become obvious why they aren’t getting results.
They're not dumb, but I'm not surprised they're struggling.
A developer's mindset has to change when adding AI into the mix, and many developers either can’t or won’t do that. Developers whose commits look something like "Fixed some bugs" probably aren’t going to take the time to write a decent prompt either.
Whenever there's a technology shift, there are always people who can't or won't adapt. And let's be honest, there are folks whose agenda (consciously or not) is to keep the status quo and "prove" that AI is a bad thing.
No wonder we're seeing wildly different stories about the effectiveness of coding agents.
One is those who tried AI-based coding a year or two ago, came to the conclusion that it was nearly useless (IMHO completely correct at the time), and have not tried it since then to see that the situation has changed. To which the solution is, try it again. It changed a lot.
The others are those who have incorporated into their personal identity that they hate AI and will never use it. I have seen people do things like fire AI at a task they have good reasons to believe it will fail at, and when it does, project that out to all tasks without letting themselves consciously realize that picking a bad task on purpose stacks the deck.
To those people my solution is to encourage them to hold on to their skepticism. I try to hold on to it as well despite the incredible cognitive temptation not to. It is very useful. But at the same time... yeah, there was a step change in the past year or so. It has gotten a lot more useful...
... but a lot of that utility is in ways that don't obviate senior-level coding skills. It likes to write scripting code without strong types. Since the last time I wrote that, I have in fact used it in a situation where there were enough strong types that it spontaneously originated some, but it still tends to write scripting code out of that context no matter what language it is working in. It is good at very straight-line solutions to code but I rarely see it suggest using databases, or event sourcing, or a message bus, or any of a lot of other things... it has a lot of Not Invented Here syndrome where it instead bashes out some minimal solution that passes the unit tests with flying colors but can't be deployed at scale. No matter how much documentation a project has it often ends up duplicating code just because the context window is only so large and it doesn't necessarily know where the duplicated code might be. There are all sorts of ways it still needs help to produce good output.
I also wonder how many people are failing to prompt it enough. Some of my prompts are basically "take this and do that and write a function to log the error", but a lot of my prompts are a screen or two of relevant context of the project, what it is we are trying to do, why the obvious solution doesn't work, here's some other code to look at, here's the relevant bugs and some Wiki documentation on the planning of the project, we should use {event sourcing/immutable trees/stored procedures/whatever}, interact with me for questions before starting anything. This is not a complete explanation of what they are doing anymore, but there's still a lot of ways in which what an LLM can really do is style transfer... it is just taking "take this and do that and write a function to log the error" and style-transforming that into source code. If you want it to do something interesting it really helps to give it enough information in the first place for the "style transfer" to get a hold of and do something with. Don't feel silly "explaining it to a computer", you're giving the function enough data to operate on.
But not having one foot in the code myself is not something I am comfortable with. It starts feeling like management and not development. I really feel the abdication very strongly and it makes me unable and unwilling to put a hard stamp on quality. I have seen too much hallucination and too many half-missed requirements to put that much trust in AI.
It's the same with code reviews of hard tickets. You can scroll past and just approve, but do you really understand what your colleague has built? Are you really in the driver's seat? It feels to me like YOLOing with major consequences.
I don't buy, at all, that people doing 20x output have any idea what they are coding. They are just pressing the YOLO button and no one, not the engineer, not the AI, and not management, is in the driver's seat. It is a very scary time.
I'm just curious, why do you "have to"? Don't get me wrong, I'm making the same choice myself too, realizing a bunch of global drawbacks because of my local/personal preference, but I won't claim I have to, it's a choice I'm making because I'm lazy.
I could pay API prices for the same models, but aside from paying much more for the same result that doesn't seem helpful
I could pay a 4-5 figure sum for hardware to run a far inferior open model
I could pay a six figure sum for hardware to run an open model that's only a couple months behind in capability (or a 4-5 figure sum to run the same model at a snail's pace)
I could pay API costs to semi-trustworthy inference provider to run one of those open models
None of those seem like great alternatives. If I want cutting-edge coding performance then a subscription is the most reasonable option
Note that this applies mostly to coding. For many other tasks local models or paid inference on open models is very reasonable. But for coding that last bit of performance matters
I'm given a tool that lets me 10x "provide value".
My personal preferences and tastes literally do not matter.
I spend a lot of time reviewing any code that comes out of Claude Code. Even using Opus 4.6 with max effort there is almost always something that needs to be changed, often dramatically.
I can see how people go down the path of thinking "Wow, this code compiles and passes my tests! Ship it!" and start handing trust over to Opus, but I've already seen what this turns into 6 months down the road: Projects get mired down in so much complexity and LLM spaghetti that the codebase becomes fragile. Everyone is sidetracked restructuring messy code from the past, then fighting bugs that appear in the change.
I can believe some of the more recent studies showing LLMs can accelerate work by circa 20% (1.2X) because that's on the same order of magnitude that I and others are seeing with careful use.
When someone comes out and claims 10X more output, I simply cannot believe they're doing careful engineering work instead of just shipping the output after a cursory glance.
I can use the agent to scaffold a lot of test/demo frameworks around the pieces I'm working on pretty cleanly and have the agent fill in. I still spend a lot of time validating the tests and the code being completed though.
The errors I tend to get from the agent are roughly similar to what I might see from a developer/team that works remotely... you still need to verify. The difference is the turn around seems to be minutes over days. You're also able to observe over simply review... When I see a bad path, I can usually abort/cancel, revert back to the last commit and try again with more planning.
We should also keep in mind there’s always been an insane shortage of high quality devs. So I’m not surprised by what we're seeing.
But this notion that an elite dev is seeing a 10x productivity gain is absolute nonsense. LLMs hold experts back in most contexts.
In all seriousness though, writing code, or even sitting down and properly architecting things, has never been a bottleneck for me. It has either been artificial deadlines preventing me from writing proper unit tests, or the requirement for code review from people on my team who don't even work on the same codebase as I do on a daily basis. I have often stated, and stand by, the assertion that I develop at the speed of my own understanding. I think that is a virtue worth carrying forward, one that will stand the test of time and bring about the best organisational outcomes. It's just a matter of finding the right place that values this approach.
Edit for context: My team is an ops team that needed a couple developers; I was picked to implement some internal tooling. The deadlines I was given for the initial development are tied directly to my performance evaluation. My boss has only ever been a manager for almost two years. He has only ever had development headcount for less than a year. He has never been on a development team himself. The man does not take breaks and micromanages at every opportunity he gets. He is paranoid for his job, thinking he is going to be imminently replaced by our (cheaper) EU counterparts. His management style and verbal admonitions reflect this; he frequently projects these insecurities onto others, using unnecessarily accusatory speech. I am not the only developer on my team who has had such interactions with him. I have screenshots of conversations with him that I felt necessary to present to a therapist. This degree of time pressure is entirely unprecedented in my 20 year career. Yes, this is a dysfunctional environment.
I have never experienced this, and it sounds remarkably dysfunctional to me.
I've tried everything I can to cope and am not sure I will be willing to return to that team once I am past my medical leave.
After I solved entrepreneurship I decided to retire and I now spend my days reading HN, posting on topics about AI.
"I gotta be present." Me: Reenacting the Malcolm Reynolds too many responses meme.
You sound like a pro wrestler. I'd like to know what "hard-hitting" engineering work is. Hydraulic hammers?
It's also like... difficult to honestly and accurately measure. You have to account for whether you're just getting lucky because your underlying dependencies (servers, etc.) aren't crashing as much as advertised, or whether it's actually five nines. Or whether you've run it for a month, gotten <30s of measured downtime, and declared victory, vs. run it for three years with copious software updates.
I always assume most people claiming five nines are just not measuring it correctly, or have not exercised the full set of things that will go wrong over a long enough period of time (dc failures, network partitions, config errors, bad network switches that drop only UDP traffic on certain ports, erroneous ACL changes, bad software updates, etc etc)
Maybe they did it all correctly though, in which case, yeah, seems hard-hitting to me.
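For a sense of scale, the downtime budget behind "five nines" is simple arithmetic: 99.999% availability leaves 0.001% of the period for outages. The sketch below is illustrative only. The helper `downtime_budget_seconds` is a made-up name, and the 30-day month and 365-day year are simplifying assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_seconds(nines: int, period_seconds: float) -> float:
    """Allowed downtime for an availability target with `nines` nines."""
    unavailability = 10.0 ** -nines  # five nines -> 0.00001
    return period_seconds * unavailability


# Assumed period lengths: 30-day month, 365-day year.
month = downtime_budget_seconds(5, 30 * SECONDS_PER_DAY)
year = downtime_budget_seconds(5, 365 * SECONDS_PER_DAY)
three_years = downtime_budget_seconds(5, 3 * 365 * SECONDS_PER_DAY)

print(f"30 days: {month:.2f} s")
print(f"1 year:  {year:.2f} s ({year / 60:.2f} min)")
print(f"3 years: {three_years:.2f} s ({three_years / 60:.2f} min)")
```

That comes to about 26 seconds per 30 days and a little over 5 minutes per year, so a single quiet month under 30 seconds says very little. One botched deploy or network partition over three years can spend the whole budget of roughly 16 minutes.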
Need some help selling these notepad apps, do you have a prompt for that?
I'm surprised nobody thought of it before me, but basically the LLMs are trained on the internet and I just had it spit back out everything.
It's running in parallel so I can validate it, and of course I'm using LLMs to do that.
Once it's ready I will put it on the market, but get this, my internet will be cheaper than the current internet. I'll probably just make it one cheaper, like if the current internet costs, for example, 7, I'll make my internet cost 6.
I mostly believe you. I have seen hints of what you are talking about.
But oftentimes I feel like I’m on the right track but I’m actually just spinning my wheels and the AI is just happily going along with it.
Or I’m getting too deep on something and I’m caught up in the loop, becoming ungrounded from the reality of the code and the specific problem.
If I notice that and am not too tired, I can reel it back in and re-ground things. Take a step back and make sure we are on a reasonable path.
But I’m realizing it can be surprisingly difficult to catch that loop early sometimes. At least for me.
I’ve also done some pretty awesome shit with it that either would have never happened or taken far longer without AI — easily 5x-10x in many cases. It’s all quite fascinating.
Much to learn. This idea is forming for me that developing good “AI discipline” is incredibly important.
P.s. sometimes I also get this weird feeling of “AI exhaustion”. Where the thought of sending another prompt feels quite painful. The last week I’ve felt that a lot.
P.p.s. And then of course this doesn’t even touch on maintaining code quality over time. The “after” part when the LLM implements something. There are lots of good patterns and approaches for handling this, but it’s a distinct phase of the process with lots of complexities and nuances. And it’s oh-so-tempting to skip or postpone. More so if the AI output is larger, which is exactly when you need it most.
> If you're not seeing these same successes, I legitimately think you're using it wrong.
I'm not sure how you could say that, considering I'm not using it at all. I don't want to, and I don't plan to. If that becomes an issue, I'm exiting this industry because I simply don't fucking care any longer. I am fine living the rest of my life and dying happy and sore being an automotive technician.
The challenge now is how to plan architectures and codebases to get really big and really scale, without AI slop creating hidden tech debt.
Foundations of the code must be very solid, and the architecture from the start has to be right. But even redoing the architecture becomes so much faster now...
What is “using it right”? You made claims, but explained nothing about your process. Anything not reproducible is either luck or a lie.
Yet