
0 points · echelon · 2mo ago · 0 comments

> I get it. LLMs are cool technology.

I don't think many of you have legitimately tried Claude Code, or maybe you're holding it wrong.

I'm getting 10x the work done. I'm operating at all layers of the stack with a speed and rapidity I've never had before.

And before anyone accuses me of being some "vibe coder", I've built five nines active-active money rails that move billions of dollars a day at 50kqps+, amongst lots of other hard hitting platform engineering work. Serious senior engineering for over a decade.

This isn't just a "cool technology". We've exited the punch card phase. And that is hard or impossible to come back from.

If you're not seeing these same successes, I legitimately think you're using it wrong.

I honestly don't like subscription services, hyperscaler concentration of power, or the fact I can't run Opus locally. But it doesn't matter - the tool exists in the shape it does, and I have to consume it in the way that it's presented. I hope for a different offering that is more democratic and open, but right now the market hasn't provided that.

It's as if you got access to fiber or broadband and were asked to go back to ISDN/dial up.


47 comments · 14 top-level

nerptastic · 2mo ago · 10 in thread

Man I really thought this was satire. It’s phenomenal that you can gain 10x benefits at all layers of the stack, you must have a very small development team or work alone.

I just don’t see how I could export 10x the work and have it properly validated by peers at this point in time. I may be able to generate code 10-20x faster, but there are nuances that only a human can reason about in my particular sector.

suzzer99 · 2mo ago

Senior engineer with 25 years of experience here. I wish I spent enough time actually coding that 10x-ing my coding productivity would matter much to my job. Most of my day is spent wrangling requirements, looking after junior devs, stamping out confusion brush fires before they get out of control, and generally just trying to steer the app away from a trainwreck down the line.

When I do code, it's almost always something novel that I don't know how I'm going to implement until I code a few pieces and see how they fit together. If it's a fairly routine feature based on an existing pattern, I assign it to one of the other devs.

shuntress · 2mo ago

This is basically the thing I keep coming back to with the agentic tools. It is the wrangling requirements, stamping out confusion, and steering away from a trainwreck down the line that are the actual challenging parts of the job and we can't automate those yet. Once you do actually know the code change you want to make though it is pretty nice to change it 10x faster than before.
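The arithmetic behind this point can be sketched with Amdahl's law: if only the coding share of the job gets faster, the overall speedup is capped by everything else. This is a minimal sketch; the function name and the coding fractions (20%, 50%, 90%) are illustrative assumptions, not figures from anyone in the thread.

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's law: speedup of the whole job when only the coding part gets faster."""
    if not 0.0 <= coding_fraction <= 1.0:
        raise ValueError("coding_fraction must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    # Time left over = untouched work + the sped-up coding work.
    remaining_time = (1.0 - coding_fraction) + coding_fraction / coding_speedup
    return 1.0 / remaining_time


# Hypothetical shares of the work week spent actually writing code.
for fraction in (0.2, 0.5, 0.9):
    speedup = overall_speedup(fraction, 10)
    print(f"{fraction:.0%} coding, 10x faster coding -> {speedup:.2f}x overall")
# 20% coding, 10x faster coding -> 1.22x overall
# 50% coding, 10x faster coding -> 1.82x overall
# 90% coding, 10x faster coding -> 5.26x overall
```

Under these assumed fractions, someone who codes a fifth of the time gets about 1.2x overall from a 10x coding speedup, while someone who codes nearly all day gets much closer to the headline number.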

hsuduebc2 · 2mo ago

I noticed that too. At start. It vaguely reminded me of the famous Navy SEAL copypasta.

seanw444 · 2mo ago

What the fuck did you just fucking say about me, you little bitch?


Aurornis · 2mo ago

> I just don’t see how I could export 10x the work and have it properly validated by peers at this point in time.

In my experience, the people who 10X their output with Claude Code fit one of two categories:

1. They're not really taking the time to understand the code they're submitting. They might do a skim over the output and see that it looks reasonable and passes tests, but they aren't taking time to understand the code as if they were pair programming. Only when it breaks and the LLM can't patch it up quickly do they go in and fully understand the code.

2. They moved very slowly before Claude Code. I've had some coworkers who would take 2-3 days to get a simple PR out because, to be frank, their work days weren't full of a lot of work. Every time they'd run into a question they'd stop and then bumble around for a few hours until they could talk to the ticket creator about it. They'd get tired of working on a task by 2PM and then save the rest of the work for tomorrow. They'd get an idea and decide to rewrite the PR the next day, and on and on with distractions. When they start using Claude Code the LLM doesn't have the same holdups, so now every time where they were getting stuck or tired before is replaced by an LLM powering through to some solution. Their cognitive load is reduced so they're no longer freezing up during the day. They aren't really becoming 10X engineers like they think, but really just catching up to normal pace.

rockostrich · 2mo ago

I don't know if we're all 10x'ing but our entire org is shipping PRs using an in-house framework akin to Stripe's Minions [1] and many of those PRs are generated from Slack. We definitely have work to do on the latter part of the SDLC to have more confidence in these changes but we can still rely on the existing observability layer to make sure things are working as expected.

Another commenter mentioned that Docker, git, etc. were all tools that greatly enhanced productivity and coding agents are just another tool that does that. I would agree, but argue that it's more impactful than all of those tools combined.

[1] https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-...

shuntress · 2mo ago

Regarding point #2, while it is of course entirely possible that they are slackers it is more likely that they lack the knowledge you are leveraging in order to declare that the PRs are "simple"

deely3 · 2mo ago

It's simpler actually: author trying to make a business developing AI product.

AlexeyBelov · 2mo ago

Yes, and he's shilling in almost every thread. This is tiring.

nothinkjustai · 2mo ago

It is satire! They have been doing this bit for a while and people keep falling for it lol

ericmcer · 2mo ago · 9 in thread

I mean at this point can we just conclude that there are a group of engineers who claim to have incredible success with it and a group that claim it is unreliable and cannot be trusted to do complex tasks.

I struggle to believe that a ton of seemingly intelligent software engineers are too dumb to figure out how to use Claude code to get reliable results, it seems much more likely to me that it can do well at isolated tasks or new projects but fails when pointed at large complex code bases because it just... is a token predictor lol.

But yeah spinning up a green fields project in an extensively solved area (ledgers) is going to be something an AI shines at.

It isn't like we don't use this stuff also, I ask Cursor to do things 20x a day and it does something I don't like 50% of the time. Even things like pasting an error message it struggles with. How do I reconcile my actual daily experience with hype messages I see online?

rurp · 2mo ago

Right, I keep seeing people talking past each other in this same way. I don't doubt folks when they say they coded up some greenfield project 10x faster with Claude, it's clearly great at many of those tasks! But then so many of them claim that their experience should translate to every developer in every scenario, to the point of saying they must be using it wrong if they aren't having the same experience.

Many software devs work in teams on large projects where LLMs have a more nuanced value. I myself mostly work on a large project inside a large organization. Spitting out lines of code is practically never a bottleneck for me. Running a suite of agents to generate out a ton of code for my coworkers to review doesn't really solve a problem that I have. I still use Claude in other ways and find it useful, but I'm certainly not 10x more productive with it.

dns_snek · 2mo ago

> But yeah spinning up a green fields project in an extensively solved area (ledgers) is going to be something an AI shines at.

I couldn't disagree with this more. It's impressive at building demos, but asking it to build the foundation for a long-term project has been disastrous in my experience.

When you have an established project and you're asking it to color between the lines it can do that well (most of the time), but when you give it a blank canvas and a lot of autonomy it will likely end up generating crap code at a staggering pace. It becomes a constant fight against entropy where every mess you don't clean up immediately gets picked up as "the way things should be done" the next time.

Before someone asks, this is my experience with both Claude Code (Sonnet/Opus 4.6) and Codex (GPT 5.4).

hombre_fatal · 2mo ago

I suspect many people here have tried it, but they expected it to one-shot any prompt, and when it didn't, it confirmed what they wanted to be true and they responded with "hah, see?" and then washed their hands of it.

So it's not that they're too stupid. There are various motivations for this: clinging on to familiarity, resistance to what feels like yet another tool, anti-AI koolaid, earnestly underwhelmed but don't understand how much better it can be, reacting to what they perceive to be incessant cheerleading, etc.

It's kind of like anti-Javascript posts on HN 10+ years ago. These people weren't too stupid to understand how you could steelman Node.js, they just weren't curious enough to ask, and maybe it turned out they hadn't even used Javascript since "DHTML" was a term except to do $(".box").toggle().

I wish there were more curiosity on HN.

ericmcer · 2mo ago

So what do I do differently then?

Hypothetically, you have a simple slice out of bounds error because a function is getting an empty string so it does something like: `""[5]`.

Opus will add a bunch of length & nil checks to "fix" this, but the actual issue is the string should never be empty. The nil checks are just papering over a deeper issue, like you probably need a schema level check for minimum string length.

At that point do you just tell it like "no delete all that, the string should never be empty" and let it figure that out, or do I basically need to pseudo code "add a check for empty strings to this file on line 145", or do I just YOLO and know the issue is gone now so it is no longer my problem?

My bigger point is how does an LLM know that this seemingly small problem is indicative of some larger failure, like let's say this string is a `user.username` which means users can set their name to empty which means an entire migration is probably necessary. All the AI is going to do is smoosh the error messages and kick the can.
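The difference between the two fixes can be sketched like this. It is a minimal illustration in Python, not the commenter's actual code: `validate_username`, `initial_patched`, `initial`, and the minimum-length rule are all hypothetical names and assumptions, and index 0 stands in for the `""[5]` access.

```python
MIN_USERNAME_LENGTH = 1  # assumed rule: a username must never be empty


class InvalidUsername(ValueError):
    """Raised when a username violates the schema-level rule."""


def validate_username(raw: str) -> str:
    """Schema-level check: reject bad data where it enters the system."""
    if len(raw) < MIN_USERNAME_LENGTH:
        raise InvalidUsername("username must not be empty")
    return raw


def initial_patched(username: str) -> str:
    """The papering-over fix: guard at the crash site and return a placeholder."""
    if not username:
        return "?"  # the crash is gone, but the empty username still exists upstream
    return username[0].upper()


def initial(username: str) -> str:
    """Relies on the invariant that validate_username already enforced."""
    return username[0].upper()


print(initial_patched(""))  # prints "?", the bad record goes unnoticed
try:
    validate_username("")
except InvalidUsername as exc:
    print(f"rejected at the boundary: {exc}")
print(initial(validate_username("alice")))  # prints "A"
```

The guard in `initial_patched` silences the symptom, while the boundary check surfaces the real question of how an empty username got stored in the first place, which is where a migration might come in.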


rattlesnakedave · 2mo ago

“I struggle to believe that a ton of seemingly intelligent software engineers are too dumb to figure out how to use Claude code to get reliable results”

Seemingly is doing the heavy lifting here. If you read enough comment threads on HN, it will become obvious why they aren’t getting results.

alwillis · 2mo ago

> I struggle to believe that a ton of seemingly intelligent software engineers are too dumb to figure out how to use Claude code to get reliable results.

They're not dumb, but I'm not surprised they're struggling.

A developer's mindset has to change when adding AI into the mix, and many developers either can’t or won’t do that. Developers whose commits look something like "Fixed some bugs" probably aren’t going to take the time to write a decent prompt either.

Whenever there's a technology shift, there are always people who can't or won't adapt. And let's be honest, there are folks whose agenda (consciously or not) is to keep the status quo and "prove" that AI is a bad thing.

No wonder we're seeing wildly different stories about the effectiveness of coding agents.

dandellion · 2mo ago

Here's my 100 file custom scaffolding AI prompt that I've been working on for the last four months, and can reliably one-shot most math olympic problems and even a rust to do list.


jerf · 2mo ago

I see two basic cases for the people who are claiming it is useless at this point.

One is that they tried AI-based coding a year or two ago, came to the IMHO completely correct at that time conclusion that it was nearly useless, and have not tried it since then to see that the situation has changed. To which the solution is, try it again. It changed a lot.

The other are those who have incorporated into their personal identity that they hate AI and will never use it. I have seen people do things like fire AI at a task they have good reasons to believe it will fail at, and when it does, project that out to all tasks without letting themselves consciously realize that picking a bad task on purpose skews the deck.

To those people my solution is to encourage them to hold on to their skepticism. I try to hold on to it as well despite the incredible cognitive temptation not to. It is very useful. But at the same time... yeah, there was a step change in the past year or so. It has gotten a lot more useful...

... but a lot of that utility is in ways that don't obviate skilled senior coding skills. It likes to write scripting code without strong types. Since the last time I wrote that, I have in fact used it in a situation where there were enough strong types that it spontaneously originated some, but it still tends to write scripting code out of that context no matter what language it is working in. It is good at very straight-line solutions to code but I rarely see it suggest using databases, or event sourcing, or a message bus, or any of a lot of other things... it has a lot of Not Invented Here syndrome where it instead bashes out some minimal solution that passes the unit tests with flying colors but can't be deployed at scale. No matter how much documentation a project has it often ends up duplicating code just because the context window is only so large and it doesn't necessarily know where the duplicated code might be. There's all sorts of ways it still needs help to produce good output.

I also wonder how many people are failing to prompt it enough. Some of my prompts are basically "take this and do that and write a function to log the error", but a lot of my prompts are a screen or two of relevant context of the project, what it is we are trying to do, why the obvious solution doesn't work, here's some other code to look at, here's the relevant bugs and some Wiki documentation on the planning of the project, we should use {event sourcing/immutable trees/stored procedures/whatever}, interact with me for questions before starting anything. This is not a complete explanation of what they are doing anymore, but there's still a lot of ways in which what an LLM can really do is style transfer... it is just taking "take this and do that and write a function to log the error" and style-transforming that into source code. If you want it to do something interesting it really helps to give it enough information in the first place for the "style transfer" to get a hold of and do something with. Don't feel silly "explaining it to a computer", you're giving the function enough data to operate on.

sutib · 2mo ago

I can see huge utility with AI as a guide and helper.

But not being one leg in the code myself is not something I am comfortable with. It starts feeling like management and not development. I really feel the abdication very strongly and it makes me unable and unwilling to put a hard stamp on quality. I have seen too much hallucination or half missed requirements to put that much trust in AI.

It's the same with code reviews of hard tickets. You can scroll past and just approve, but do you really understand what your colleague has built? Are you really in the driver's seat? It feels to me like YOLOing with major consequences.

I don't buy, at all, that people doing 20x output have any idea what they are coding. They are just pressing the yolo button and no one, not the engineer, not the AI and not management, is in the driver's seat. It is a very scary time.

embedding-shape · 2mo ago · 4 in thread

> and I have to consume it in the way that it's presented

I'm just curious, why do you "have to"? Don't get me wrong, I'm making the same choice myself too, realizing a bunch of global drawbacks because of my local/personal preference, but I won't claim I have to, it's a choice I'm making because I'm lazy.

wongarsu · 2mo ago

What are the reasonable options besides a Claude Code subscription (or an equivalent from Codex or Copilot)?

I could pay API prices for the same models, but aside from paying much more for the same result that doesn't seem helpful

I could pay a 4-5 figure sum for hardware to run a far inferior open model

I could pay a six figure sum for hardware to run an open model that's only a couple months behind in capability (or a 4-5 figure sum to run the same model at a snail's pace)

I could pay API costs to a semi-trustworthy inference provider to run one of those open models

None of those seem like great alternatives. If I want cutting-edge coding performance then a subscription is the most reasonable option

Note that this applies mostly to coding. For many other tasks local models or paid inference on open models is very reasonable. But for coding that last bit of performance matters

prabal97 · 2mo ago

I use my OAI subscription on my Claude Code. I get the benefit of the Claude Code interface with the intelligence of OAI models.

https://prabal.ca/posts/claude-code-chatgpt-subscription/

echelon (OP) · 2mo ago

My job title is "provide value".

I'm given a tool that lets me 10x "provide value".

My personal preferences and tastes literally do not matter.

embedding-shape · 2mo ago

As a professional you have a choice in how you produce whatever it is you produce. Sure, you can go for the simplest, most expensive and "easiest" way of doing things, or you can do other things, depending on your perspective and requirements. None of this is set in stone, some people make choices based on personal preferences, and that matters as much to them as your choices matter to you.

Aurornis · 2mo ago · 3 in thread

I use Claude Code a lot, but I don't understand these "I'm doing 10X the work" comments.

I spend a lot of time reviewing any code that comes out of Claude Code. Even using Opus 4.6 with max effort there is almost always something that needs to be changed, often dramatically.

I can see how people go down the path of thinking "Wow, this code compiles and passes my tests! Ship it!" and start handing trust over to Opus, but I've already seen what this turns into 6 months down the road: Projects get mired down in so much complexity and LLM spaghetti that the codebase becomes fragile. Everyone is sidetracked restructuring messy code from the past, then fighting bugs that appear in the change.

I can believe some of the more recent studies showing LLMs can accelerate work by circa 20% (1.2X) because that's on the same order of magnitude that I and others are seeing with careful use.

When someone comes out and claims 10X more output, I simply cannot believe they're doing careful engineering work instead of just shipping the output after a cursory glance.

tracker1 · 2mo ago

I find that it's relative to the amount of planning time you spend... I feel like I've gotten around 5x the output while using Claude Code w/ Opus over what I will get done myself... That said, I'm probably spending about 3x as much time planning as I would when just straight coding for/by myself. And that's generally the difference.

I can use the agent to scaffold a lot of test/demo frameworks around the pieces I'm working on pretty cleanly and have the agent fill in. I still spend a lot of time validating the tests and the code being completed though.

The errors I tend to get from the agent are roughly similar to what I might see from a developer/team that works remotely... you still need to verify. The difference is the turn around seems to be minutes over days. You're also able to observe over simply review... When I see a bad path, I can usually abort/cancel, revert back to the last commit and try again with more planning.

Pay08 · 2mo ago

That's part of why I don't get AI for directly writing code at all. If I am going to be reviewing anything that comes out of it (and I will) then I might as well just write it myself. It's easier and faster, although it does also make it easier to fall victim to blind spots.

eijkene · 2mo ago

I think there’s a subset of engineers who were never all that good (we have no idea how many there are) who benefit most from llm’s.

We should also keep in mind there’s always been an insane shortage of high quality devs. So I’m not surprised with what we're seeing.

But this notion that an elite dev is seeing 10x productivity gain is absolute nonsense. LLM’s hold experts back in most contexts.

xantronix · 2mo ago · 2 in thread

Mind if I use this as a copypasta for the future? This checks off every point people bring on LinkedIn and elsewhere.

In all seriousness though, writing code, or even sitting down and properly architecting things, have never been bottlenecks for me. It has either been artificial deadlines preventing me from writing proper unit tests, or the requirement for code review from people on my team who don't even work on the same codebase as I do on a daily basis. I have often stated and stand by the assertion that I develop at the speed of my own understanding, and I think that is a good virtue to carry forth that I think will stand the test of time and bring about the best organisational outcomes. It's just a matter of finding the right place that values this approach.

Edit for context: My team is an ops team that needed a couple developers; I was picked to implement some internal tooling. The deadlines I was given for the initial development are tied directly to my performance evaluation. My boss has only ever been a manager for almost two years. He has only ever had development headcount for less than a year. He has never been on a development team himself. The man does not take breaks and micromanages at every opportunity he gets. He is paranoid for his job, thinking he is going to be imminently replaced by our (cheaper) EU counterparts. His management style and verbal admonitions reflect this; he frequently projects these insecurities onto others, using unnecessarily accusatory speech. I am not the only developer on my team who has had such interactions with him. I have screenshots of conversations with him that I felt necessary to present to a therapist. This degree of time pressure is entirely unprecedented in my 20 year career. Yes, this is a dysfunctional environment.

mikebenfield · 2mo ago

> artificial deadlines preventing me from writing proper unit tests, or the requirement for code review from people on my team who don't even work on the same codebase as I do on a daily basis

I have never experienced this, and it sounds remarkably dysfunctional to me.

xantronix · 2mo ago

Believe me, it is very dysfunctional. As I've mentioned to your first replyer, my boss has only had developers for less than a year. This is an operations team I was assigned to in order to provide them some much needed tooling. The pressure my boss has perceived from above has led to my own significant burnout. The guy does not take days off and has always been logged into Slack on the odd hours I would need to pull up some HR form or another. I am currently off work for several months dealing with the fallout from all that.

I've tried everything I can to cope and am not sure I will be willing to return to that team once I am past my medical leave.

dandellion · 2mo ago · 2 in thread

You must be using it wrong, because I'm getting 100x the work done and currently at 1.5 million MRR with this SAAS I vibe coded over the weekend.

After I solved entrepreneurship I decided to retire and I now spend my days reading HN, posting on topics about AI.

darth_aardvark · 2mo ago

You're still manually posting? All of my HN posting, trolling, shitposting and spamming is taken care of by a fleet of bots I vibecoded in the last 5 minutes.

slowmovintarget · 2mo ago

You jest, but I know people who've done this.

"I gotta be present." Me: Reenacting the Malcolm Reynolds too many responses meme.

britzkopf · 2mo ago · 2 in thread

> And before anyone accuses me of being some "vibe coder", I've built five nines active-active money rails that move billions of dollars a day at 50kqps+, amongst lots of other hard hitting platform engineering work. Serious senior engineering for over a decade

You sound like a pro wrestler. I'd like to know what "hard-hitting" engineering work is. Hydraulic hammers?

dmoy · 2mo ago

I mean five nines is legitimately difficult to accomplish for a lot of problem spaces.

It's also like.... difficult to honestly and accurately measure. And account for whether or not you're getting lucky based on your underlying dependencies (servers, etc) not crashing as much as advertised, or if it's actually five nines. Or whether you've run it for a month and gotten <30s of measured downtime and declared victory, vs run it for three years with copious software updates.

I always assume most people claiming five nines are just not measuring it correctly, or have not exercised the full set of things that will go wrong over a long enough period of time (dc failures, network partitions, config errors, bad network switches that drop only UDP traffic on certain ports, erroneous ACL changes, bad software updates, etc etc)

Maybe they did it all correct though, in which case, yea, seems hard hitting to me.
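For a sense of scale, the downtime budget that five nines implies can be worked out directly. This is a rough sketch under simple assumptions: the helper name is made up, availability is treated as plain wall-clock uptime, and a year is taken as 365 days.

```python
def downtime_budget_seconds(nines: int, window_days: float) -> float:
    """Allowed downtime, in seconds, for an availability of `nines` nines over a window."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** -nines  # five nines -> 0.00001
    return window_days * 24 * 60 * 60 * unavailability


# Windows chosen to match the comparison above: one month vs. three years.
for label, days in (("30-day month", 30), ("year", 365), ("three years", 3 * 365)):
    budget = downtime_budget_seconds(5, days)
    print(f"five nines over a {label}: {budget:.2f} s ({budget / 60:.2f} min)")
# five nines over a 30-day month: 25.92 s (0.43 min)
# five nines over a year: 315.36 s (5.26 min)
# five nines over a three years: 946.08 s (15.77 min)
```

A single month under 30 seconds of measured downtime sits right at the edge of the budget, so one slow failover or bad deploy over a longer window is enough to miss the target.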

sutib · 2mo ago

5 nines is at best a temporary achievement, given enough time.

ipaddr · 2mo ago · 1 in thread

I'm getting 1,000x improvement building notepad applications with 6 9s. No one is faster.

Need some help selling these notepad apps, do you have a prompt for that?

RaftPeople · 2mo ago

I don't want to sound like I'm trying to one-up you, but I've basically vibe coded the entire internet.

I'm surprised nobody thought of it before me but basically the LLM's are trained on the internet and I just had it spit back out everything.

It's running in parallel so I can validate it, which of course I'm using LLM's to do that.

Once it's ready I will put it on the market, but get this, my internet will be cheaper than the current internet. I'll probably just make it one cheaper, like if the current internet costs, for example, 7, I'll make my internet cost 6.

dwaltrip · 2mo ago

It’d be cool to see your process in depth. You should record some of your sessions :)

I mostly believe you. I have seen hints of what you are talking about.

But often times I feel like I’m on the right track but I’m actually just spinning my wheels and the AI is just happily going along with it.

Or I’m getting too deep on something and I’m caught up in the loop, becoming ungrounded from the reality of the code and the specific problem.

If I notice that and am not too tired, I can reel it back in and re-ground things. Take a step back and make sure we are on a reasonable path.

But I’m realizing it can be surprisingly difficult to catch that loop early sometimes. At least for me.

I’ve also done some pretty awesome shit with it that either would have never happened or taken far longer without AI — easily 5x-10x in many cases. It’s all quite fascinating.

Much to learn. This idea is forming for me that developing good “AI discipline” is incredibly important.

P.s. sometimes I also get this weird feeling of “AI exhaustion”. Where the thought of sending another prompt feels quite painful. The last week I’ve felt that a lot.

P.p.s. And then of course this doesn’t even touch on maintaining code quality over time. The “after” part when the LLM implements something. There are lots of good patterns and approaches for handling this, but it’s a distinct phase of the process with lots of complexities and nuances. And it’s oh-so-tempting to skip or postpone. More so if the AI output is larger, exactly when you need it most.

xantronix · 2mo ago

I reread your comment and I think you might be sincere. To address this point:

> If you're not seeing these same successes, I legitimately think you're using it wrong.

I'm not sure how you could say that, considering I'm not using it at all. I don't want to, and I don't plan to. If that becomes an issue, I'm exiting this industry because I simply don't fucking care any longer. I am fine living the rest of my life and dying happy and sore being an automotive technician.

epistasis · 2mo ago

I'm still reviewing all the code that's created, and asking for modifications, and basically using LLMs as a 2000 wpm typist, and seeing similar productivity gains. Especially in new frameworks! Everything is test driven development, super clean and super fast.

The challenge now is how to plan architectures and codebases to get really big and really scale, without AI slop creating hidden tech debt.

Foundations of the code must be very solid, and the architecture from the start has to be right. But even redoing the architecture becomes so much faster now...

skydhash · 2mo ago

> If you're not seeing these same successes, I legitimately think you're using it wrong.

What is “using it right”? You wrote claims, but explain nothing about your process. Anything not reproducible is either luck or lie.

blurbleblurble · 2mo ago

> fact I can't run Opus locally

Yet

surgical_fire · 2mo ago

I read this as satire. I still think it is.
