0 points · adamtaylor_13 1mo ago · 0 comments

I am an engineer. I hire other engineers. I run a company that ships usable software for small businesses.

We do this every day. I'm sorry to say, we are indeed shipping in days what used to take weeks.

34 comments · 8 top-level

MeetingsBrowser 1mo ago · 19 in thread

As a software engineer who also hires other software engineers, I’m curious about the disconnect in our experiences.

I do systems programming. Before AI, feature development roughly went: design, implement, test, review, with some back edges and a lot of time spent in test and review.

AI has made the implementation part much faster, at the cost of even more time spent testing and reviewing, though still an improvement overall.

We do not see the weeks-to-days improvement, though. The bottleneck before was testing and reviewing, and they are even bigger bottlenecks now.

What kind of work do you do, and what kind of workflow were you using before and after AI to benefit so much?

satvikpendem 1mo ago

> I do systems programming.

I'll stop you right there. AI is not good at systems programming; it's good at CRUD web development, which is where most people are seeing the gains.

oytis 1mo ago

I think antirez mentioned somewhere he considered it particularly good at systems programming.

satvikpendem 1mo ago

Depends what it's used for. Generally I've seen that, due to the paucity of C or Rust training data compared with JavaScript and TypeScript, LLMs aren't as good at the former as at the latter.

dboreham 1mo ago

This is a myth in my experience. LLMs are good at all the kinds of programming I've tried using them on, including many cases that are very far from "CRUD web development".

Traubenfuchs 1mo ago

>95% of software development is crud.

id 1mo ago

It's really not, though. As soon as systems have to scale, regulatory requirements come in, etc., it becomes more complex.

AI has solved simple CRUD, yes, but CRUD was easy before.

kakacik 1mo ago

Anytime you hear such wild claims, imagine a typical code sweatshop (not just CRUD apps but templated e-shops, business pages, etc.), not a system that will evolve for another 10-20 years beyond its initial implementation and is a backend cornerstone of some part of some corporation. And that is assuming the claim is actually true; there is tons of PR happening here, plus another gigaton of uncritical fanboyism, as with any hot topic.

Now there may be an additional corner case or 20 where it's still valid, but they are not your typical software engineering work.

I share your experience: even a 100x code delivery improvement would barely move the needle on project delivery at our place. Better, more automated integration and end-to-end functional tests that reflect real-world usage and data flows would make a much bigger difference, and there's no reason to think LLMs couldn't deliver this in the near future.
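
The "100x barely moves the needle" point is essentially Amdahl's law. Below is a minimal sketch of that arithmetic; the 20% coding share is a made-up number for illustration, not a figure anyone in this thread reported, and `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the coding share gets faster."""
    if not 0.0 <= coding_fraction <= 1.0:
        raise ValueError("coding_fraction must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    # Time left after the speedup, as a fraction of the original delivery time:
    # the untouched share (testing, review, ...) plus the shrunken coding share.
    remaining = (1.0 - coding_fraction) + coding_fraction / coding_speedup
    return 1.0 / remaining


# ASSUMPTION: coding is 20% of delivery time; everything else is unchanged.
for speedup in (2, 10, 100):
    total = overall_speedup(0.2, speedup)
    print(f"{speedup}x faster coding -> {total:.2f}x faster delivery")
```

Under that assumed split, delivery can never get more than 1.25x faster no matter how fast the coding step becomes, which is why speeding up testing and review matters more in a workflow like this one.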

stavros 1mo ago

Not the OP, but it might be that AI isn't as good at systems programming as it is at other domains, or it might be that you're using it differently than I am. I don't know which one it is (maybe AI just isn't good at writing the language you work with).

For things like web frontends/backends, though, it works beautifully. I ship things in days that would take me weeks to write by hand, and I'm very fast at writing things by hand. The AI also ships many fewer bugs than our average senior programmer, though maybe not fewer bugs than our staff programmers.

rustystump 1mo ago

In my experience AI has had far, far more bugs than most of what I'd call senior engineers, but far fewer than juniors.

The boost is for what are glorified CRUD apps, where it 1000x's the tedious work. However, the choices it makes along the way quickly blow up without cleanup. Seniors know how to keep their workstation clean, or they should.

stavros 1mo ago

It sounds like we have opposite experiences.

skeptic_ai 1mo ago

I had never touched Kubernetes, and in 1 week I have a few nodes running and I understand a lot of it. Not perfect but not bad.

oytis 1mo ago

I have recently learned Kubernetes without AI and one week is more than enough to understand most of it.

newphone733 1mo ago

This is definitely not true. But I doubt GP understands "most" of Kubernetes either. They probably have a good working knowledge of the important, commonly used features.

thrawa8387336 1mo ago

That was the usual experience pre-AI.

logicchains 1mo ago

>AI has made the implementation part much faster, at the cost of even more time spent testing and reviewing,

Maybe they're using AI for testing and reviewing more than you are, not just for coding?

MeetingsBrowser 1mo ago

The "AI implementation" step in my workflow includes separate agents dedicated to testing and reviewing changes. The self-feedback loop catches a lot of errors and mistakes, but it rarely produces working code in one go.

In my experience, the generated code handles the happy path, but isn't great about edge cases or writing clean code, even with explicit instruction in the initial prompt.

We usually end up doing multiple iterations on what Claude/Codex output, pointing out issues, asking for changes, etc.

logicchains 1mo ago

>AI has made the implementation part much faster, at the cost of even more time spent testing and reviewing,

Maybe they're using AI for testing and reviewing more than you are?

adamtaylor_13 OP 1mo ago

We design and build software systems that our clients' businesses run on. So it's not the product; it's the system that allows them to run their business. Typically, it's less "QuickBooks" and more "Let QuickBooks talk to 10 different systems" and then custom functionality built on that.

It's glue, custom business workflows, and basic web CRUD stuff. We build almost everything on Rails unless there's a critical reason not to (e.g., maintaining an existing system versus building from scratch).

With very few exceptions, our team composition is one senior engineer paired with a business. So we get to avoid a large amount of SDLC busywork, namely inter-team communication. This leaves more time for client<->engineer communication, which has a host of additional benefits. We also build with a "North Star" methodology, which keeps everyone, including the client, laser-focused on the work at hand.

To answer your final question about how we're benefiting so much from AI, I think it's primarily that we're leaning into it for implementation, testing, and review alike. I know it's a sin to let AI review AI, but... it works. I'm actively skeptical of it myself, but our error and rework rates don't lie.

And we've got clients in various stages of development and/or long-term support. It's not like we're just hammering a bunch of stuff out and then bouncing. Most of these are multi-year tightly-integrated projects with our clients and we don't see a lack of trust or frustration that you'd expect to see if you were shipping slop. Our Honeybadger errors typically stay at zero, our performance metrics are acceptable across the board, and most importantly our clients love the work we're doing.

I can't think of any other way to measure the quality of what we're doing. And by those metrics, AI has made us better, not worse.

I should write a blog post to outline more of this in detail.

b0rtb0rt 1mo ago

I work on cutting-edge C++ systems programming and we are using Codex for everything now. It’s pretty impressive, honestly, what it can do.

aprilthird2021 1mo ago · 4 in thread

Give an example.

I have an example in my line of work: a full service rewrite in a new language. It would have taken forever without AI. AI makes it easier and faster. The new service has better throughput and uses fewer machines. Having a complete test harness that lets us ensure we match all the functionality of the previous service is key. AND we are keeping the old service on standby because we know we don't know what might be wrong with the new one.

What's your example?

adamtaylor_13 OP 1mo ago

From another comment above:

> Our projects are closed source due to our clients owning the code, but I can offer anecdote. We have a client whose business operates on 2-3 very niche SaaS applications in the veterinary/animal medicine space. In a span of about 6 months, we completely ripped out 2 of those 3 and are working on replacing the 3rd one right now. We've done this with a single senior engineer working with the client between 20-40 hours per week with no major regressions. The business has been able to continue working as usual with no disruptions throughout this process.

> Obviously it's hard to measure this objectively, but I can't imagine having done this pre-AI with zero downtime and having replaced those SaaS applications in that timeframe.

aprilthird2021 1mo ago

Yeah, that validates my experience. It's best / mostly preferable for ground-up rewrites and greenfield work.

I worry we haven't had to maintain vibecoded applications much and have no idea how difficult they will be to debug (or not).

pron 1mo ago

If you carefully review the code then you're not doing what Armstrong was talking about. If you're not reviewing the code, then you don't really know what it is that the AI built. Of course it passes tests; that's not the problem. The problem is that the code is complicated and obtuse, even if it doesn't seem that way on the surface, and after some rounds of evolution, the agents are no longer able to evolve or maintain the code.

The difference between "it's working now" and "it will continue working in two years" is exactly the problem with AI-generated code, because the tests can't tell you that, and you don't know which one you have if you don't look really carefully.

aprilthird2021 1mo ago

I was pretty clear that we did not review all the code, and we have kept the original service on standby exactly because we are aware code is complicated and could have obscure failure modes while passing our whole test suite.

maccard 1mo ago · 2 in thread

Can you link to a changelog that shows the 5-10x feature increases? I keep hearing this, but I don’t see anything I use ever actually shipping like this, or people backing this up with any sort of proof.

adamtaylor_13 OP 1mo ago

Our projects are closed source due to our clients owning the code, but I can offer anecdote. We have a client whose business operates on 2-3 very niche SaaS applications in the veterinary/animal medicine space. In a span of about 6 months, we completely ripped out 2 of those 3 and are working on replacing the 3rd one right now. We've done this with a single senior engineer working with the client between 20-40 hours per week with no major regressions. The business has been able to continue working as usual with no disruptions throughout this process.

Obviously it's hard to measure this objectively, but I can't imagine having done this pre-AI with zero downtime and having replaced those SaaS applications in that timeframe.

toraway 1mo ago

That reminds me of a chart I saw posted in HN comments recently that someone created, tracking bullet points in Claude Code release notes per day, which was cited as "proof of a step change" in AI development over the last year. It showed like a dozen or so on average, then jumped to like over 50 one month and stayed around that number.

(Not the exact same chart but similar idea, I guess it's sort of a meme: https://imgur.com/a/YrNGYOR)

So I looked at the most recent CC release notes on GitHub and the majority look like this:

  Fixed /clear not resetting the terminal tab title after a conversation
  Fixed session title chip from /rename disappearing while a permission or other dialog is active
  Fixed agent panel below the prompt being hidden when subagents are running (regression in 2.1.122)
  Fixed external-editor handoff (Ctrl+G) blanking the conversation history above the prompt
  Fixed /context dumping its rendered ASCII visualization grid into the conversation, wasting ~1.6k tokens per call
  Fixed OAuth refresh race after wake-from-sleep that could log out all running sessions
  Fixed 1-hour prompt cache TTL being silently downgraded to 5 minutes
  Fixed cache-miss warning appearing spuriously after /clear or compaction when changing /effort or /model

I'd be extremely interested to know what percentage of these were just fixing last week's Claude Code-written PR that no human ever set eyes on.

But hey, all that churn looks great on charts being circulated on social media as free advertising for their flagship product (and consequently the company's valuation) so never mind, LGTM!
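
For what it's worth, here is a rough sketch of how one might tally such release notes. It cannot answer who wrote the code being fixed; it only counts what the entries say about themselves, using the eight lines quoted above. The `summarize` helper and the "regression in X.Y.Z" pattern are assumptions made for illustration.

```python
import re

# The eight entries quoted in the comment above.
RELEASE_NOTES = """\
Fixed /clear not resetting the terminal tab title after a conversation
Fixed session title chip from /rename disappearing while a permission or other dialog is active
Fixed agent panel below the prompt being hidden when subagents are running (regression in 2.1.122)
Fixed external-editor handoff (Ctrl+G) blanking the conversation history above the prompt
Fixed /context dumping its rendered ASCII visualization grid into the conversation, wasting ~1.6k tokens per call
Fixed OAuth refresh race after wake-from-sleep that could log out all running sessions
Fixed 1-hour prompt cache TTL being silently downgraded to 5 minutes
Fixed cache-miss warning appearing spuriously after /clear or compaction when changing /effort or /model
"""

# ASSUMPTION: regressions are labelled as "regression in <version>".
REGRESSION = re.compile(r"\bregression in \d+(?:\.\d+)*", re.IGNORECASE)


def summarize(notes: str) -> dict:
    """Count entries, entries that are fixes, and fixes labelled as regressions."""
    entries = [line.strip() for line in notes.splitlines() if line.strip()]
    fixes = [e for e in entries if e.lower().startswith("fixed")]
    regressions = [e for e in fixes if REGRESSION.search(e)]
    return {
        "entries": len(entries),
        "fixes": len(fixes),
        "self_described_regressions": len(regressions),
    }


print(summarize(RELEASE_NOTES))
```

On this sample every entry is a fix and exactly one labels itself a regression, so a bullet count per day mostly measures churn rather than new features.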

grayhatter 1mo ago · 1 in thread

> I am an engineer. I hire other engineers. I run a company that ships usable software for small businesses.

> We do this every day. I'm sorry to say, we are indeed shipping in days what used to take weeks.

I've been searching for months for evidence of this kinda thing. Do you have receipts you can share? Or is it more of the same "just trust me bro"?

adamtaylor_13 OP 1mo ago

I should put together a blog post to share more, but unfortunately it is more "trust me bro" at this stage. You can see a few other comments where I replied: we do have subjective evidence that seems to suggest to me that we're moving much faster than we could've moved in the past.

Of course, it's not just shipping, it's shipping stably in a way that doesn't disrupt the day-to-day operations of the businesses we're working for. One client that comes to mind has 2-3 niche SaaS applications that they used independently for various workloads. We completely replaced 2 of those without any disruptions to their business in about 6 months (no, we did not replace it feature-for-feature; we just built what they needed.)

pron 1mo ago

The only way you could possibly know that is if you're reviewing the code, which means you're not "managing fleets of agents". If you're not reviewing the code (and you wouldn't be if you're managing fleets of agents), then you have no way to tell what you're shipping.

globular-toast 1mo ago

Does what you ship involve hundreds of lines of HTML/CSS by any chance? Do you care about accessibility?

willio58 1mo ago

What you are shipping is not the same as what Coinbase is shipping. These are vastly different things. Making a shiny app with AI is great; I'm doing it as I type this. But I am under no delusion that what I make can sustain a multi-million-dollar, or in the case of Coinbase even billion-dollar, business. That's plain silly.

mdavid626 1mo ago

Shipping garbage.
