undefined | Better HN

0 pointsyodsanklai1y ago0 comments

You're not supposed to trust the tool, you're supposed to review and rework the code before submitting for external review.

I use AI for rather complex tasks. It's impressive. It can make a bunch of non-trivial changes to several files, and have the code compile without warnings. But I need to iterate a few times so that the code looks like what I want.

That being said, I also lose time pretty regularly. There's a learning curve, and the tool would be much more useful if it was faster. It takes a few minutes to make changes, and there may be several iterations.

0 comments

ryandrake1y ago

> You're not supposed to trust the tool, you're supposed to review and rework the code before submitting for external review.

It sounds like the guys in this article should not have trusted AI to go fully open loop on their customer support system. That should be well understood by all "customers" of AI. You can't trust it to do anything correctly without human feedback/review and human quality control.

schmichael1y ago

> You're not supposed to trust the tool

This is just an incredible statement. I can't think of another development tool we'd say this about. I'm not saying you're wrong, or that it's wrong to have tools we can't just, just... wow... what a sea change.

ModernMech1y ago

Imagine! Imagine if 0.05% of the time gcc just injected random code into your binaries. Imagine, you swing a hammer and 1% of the time it just phases into the wall. Tools are supposed to be reliable.

arvinsim1y ago

There are no existing AI tools that guarantee correct code 100% of the time.

If there is such a tool, programmers will be on path of immediate reskilling or lose their jobs very quickly.

ryandrake1y ago

Imagine if your compiler just randomly and non-deterministically compiled valid code to incorrect binaries, and the tool's developer couldn't really tell you why it happens, how often it was expected to happen, how severe the problem was expected to be, and told you to just not trust your compiler to create correct machine code.

Imagine if your calculator app randomly and non-deterministically performed arithmetic incorrectly, and you similarly couldn't get correctness expectations from the developer.

Imagine if any of your communication tools randomly and non-deterministically translated your messages into gibberish...

I think we'd all throw away such tools, but we are expected to accept it if it's an "AI tool?"

andrei_says_1y ago

Imagine that you yourself never use these tools directly but your employees do. And the sellers of said tools swear that the tools are amazing and correct and will save you millions.

They keep telling you that any employee who highlights problems with the tools are just trying to save their job.

Your investors tell you that the toolmakers are already saving money for your competitors.

Now, do you want that second house and white lotus vacation or not?

Making good tools is difficult. Bending perception (“is reality”) is easier and enterprise sales, just like good propaganda, work. The gold rush will leave a lot of bodies behind but the shovelmakers will make a killing.

ModernMech1y ago

I feel like there's a lot of motivated reasoning going on, yeah.

arvinsim1y ago

If you think of AI like a compiler, yes we should throw away such tools because we expect correctness and deterministic outcomes

If you think of AI like a programmer, no we shouldn't throw away such tools because we accept them as imperfect and we still need to review.

bigstrat20031y ago

> If you think of AI like a programmer, no we shouldn't throw away such tools because we accept them as imperfect and we still need to review.

This is a common argument but I don't think it holds up. A human learns. If one of my teammates or I make a mistake, when we realize it we learn not to make that mistake in the future. These AI tools don't do that. You could use a model for a year, and it'll be just as unreliable as it is today. The fact that they can't learn makes them a nonstarter compared to humans.

learningstud1y ago

Edsgar Dijkstra!

ToValueFunfetti1y ago

If the only calculators that existed failed at 5% of the calculations, or if the only communication tools miscommunicated 5% of the time, we would still use both all the time. They would be far less than 95% as useful as perfect versions, but drastically better then not having the tools at all.

gitremote1y ago

Absolutely not. We'd just do the calculations by hand, which is better than running the 95%-correct calculator and then doing the calculations by hand anyway to verify its output.

1 more reply

tevon1y ago

Stackoverflow is like this, you read an answer but are not fully sure if its right or if it fits your needs.

Of course there is a review system for a reason, but we frequently use "untrusted" tools in development.

That one guy in a github issue that said "this worked for me"

shipp021y ago

In Mechanical Engineering, this is 100% a thing with fluid dynamics simulation. You need to know if the output is BS based on a number of factors that I don't understand.

theonething1y ago

> I can't think of another development tool we'd say this about.

Because no other dev tool actually generates unique code like AI does. So you treat it like the other components of your team that generates code, the other developers. Do you trust other developers to write good code without mistakes without getting it reviewed by others. Of course not.

seabird1y ago

Yes, actually, I do! I trust my teammates with tens of thousands of hours of experience in programming, embedded hardware, our problem spaces, etc. to write from a fully formed worldview, and for their code to work as intended (as far as anybody can tell before it enters preliminary testing by users) by the time the rest of the team reviews it. Most code review is uneventful. Have some pride in your work and you'll be amazed at what's possible.

theonething1y ago

so your saying that yes you do "trust other developers to write good code without mistakes without getting it reviewed by others."

And then you say "by the time the rest of the team reviews it. Most code review is uneventful."

So you trust your team to develop without the need for code review but yet, your team does code review.

So what is the purpose of these code reviews? Is it the case that you actually don't think they are necessary, but perhaps management insists on them? You actually answer this question yourself:

> Most code review is uneventful.

Keyword here is "most" as opposed to "all" So based your team's applied practices and your own words, code review is for the purpose of catching mistakes and other needed corrections.

But it seems to me if you trust your team not to make mistakes, code review is superfluous.

As an aside, it seems your team culture doesn't make room for juniors because if your team had juniors I think it would be even more foolish to trust them not to make mistakes. Maybe a junior free culture works for your company, but that's not the case for every company.

My main point is code review is not superfluous no matter the skill level; junior, senior, or AI simply because everyone and every AI makes mistakes. So I don't trust those three classes of code emitters to not ever make mistakes or bad choices (i.e. be perfect) and therefore I think code review is useful.

Have some honesty and humility and you'll amazed at what's possible.

1 more reply

anonymars1y ago

I trust my colleagues to write code that compiles, at the very least

ModernMech1y ago

Oh at the very least I trust them to not take code that compiles and immediately assess that it's broken.

chrisweekly1y ago

But of course everyone absolutely NEEDS to use AI for codereviews! How else could the huge volume of AI-generated code be managed?

forgetfreeman1y ago

"Do you trust other developers to write good code without mistakes without getting it reviewed by others."

Literally yes. Test coverage and QA to catch bugs sure but needing everything manually reviewed by someone else sounds like working in a sweatshop full of intern-level code bootcamp graduates, or if you prefer an absolute dumpster fire of incompetence.

ryandrake1y ago

I would accept mistakes and inconsistency from a human, especially one not very experienced or skilled. But I expect perfection and consistency from a machine. When I command my computer to do something, I expect it to do it correctly, the same way every time, to convert a particular input to an exact particular output, every time. I don't expect it to guess, or randomly insert garbage, or behave non-deterministically. Those things are called defects(bugs) and I'd want them to be fixed.

3 more replies

theonething1y ago

Ok, here I thought requiring PR review and approval before merging was standard industry best practice. I guess all the places I've worked have been doing it wrong?

1 more reply

gtirloni1y ago

1) Once you get it to output something you like, do you check all the lines it changed? Is there a threshold after which you just... hope?

2) No matter what the learning curve, you're using a statistical tool that outputs in probabilities. If that's fine for your workflow/company, go for it. It's just not what a lot of developers are okay with.

Of course it's a spectrum with the AI deniers in one corner and the vibe coders in the other. I personally won't be relying 100% on a tool and letting my own critical thinking atrophy, which seems to be happening, considering recent studies posted here.

nkoren1y ago

I've been doing AI-assisted coding for several months now, and have found a good balance that works for me. I'm working in Typescript and React, neither of which I know particularly well (although I know ES6 very well). In most cases, AI is excellent at tasks which involve writing quasi-custom boilerplate (eg. tests which require a lot of mocking), and at answering questions of how I should do _X_ in TS/React. For the latter, those are undoubtedly questions I could eventually find the answers on Stack Overflow and deduce how to apply those answers to my specific context -- but it's orders of magnitude faster to get the AI to do that for me.

Where the AI fails is in doing anything which requires having a model of the world. I'm writing a simulator which involves agents moving through an environment. A small change in agent behaviour may take many steps of the simulator to produce consequential effects, and thinking through how that happens -- or the reverse: reasoning about the possible upstream causes of some emergent macroscopic behaviour -- requires a mental model of the simulation process, and AI absolutely does _not_ have that. It doesn't know that it doesn't have that, and will therefore hallucinate wildly as it grasps at an answer. Sometimes those hallucinations will even hit the mark. But on the whole, if a mental model is required to arrive at the answer, AI wastes more time than it saves.

jimbokun1y ago

> AI is excellent at tasks which involve writing quasi-custom boilerplate (eg. tests which require a lot of mocking)

I wonder if anyone has compared how well the AI auto-generating approach works compared to meta programming approaches (like Lisp macros) meant to address the same kind of issues with repetitive code.

kazinator1y ago

The generation of volumes of boiler plate takes effort; nobody likes to do it.

The problem is, that phase is not the full life cycle of the boiler plate.

You have to live with it afterward.

pjerem1y ago

> 1) Once you get it to output something you like, do you check all the lines it changed? Is there a threshold after which you just... hope?

Not op but yes. It sometimes takes a lot of time but I read everything. It still faster than nothing. Also, I ask very precise changes to the AI so it doesn’t generate huge diffs anyway.

Also for new code, TDD works wonders with AI : let it write the unit tests (you still have to be mindful of what you want to implement) and ask it to implement the code that run the tests. Since you talk the probabilistic output, the tool is incredibly good at iterating over things (running and checking tests) and also, unit tests are, in themselves, a pretty perfect prompt.

iforgotpassword1y ago

> It sometimes takes a lot of time but I read everything. It still faster than nothing.

Opposite experience for me. It reliably fails at more involved tasks so that I don't even try anymore. Smaller tasks that are around a hundred lines maybe take me longer to review that I can just do it myself, even though it's mundane and boring.

The only time I found it useful is if I'm unfamiliar with a language or framework, where I'd have to spend a lot of time looking up how to do stuff, understand class structures etc. Then I just ask the AI and have to slowly step through everything anyways, but at least there's all the classes and methods that are relevant to my goal and I get to learn along the way.

riffraff1y ago

How do you have it write tests before the code? It seems writing a prompt for the LLM to generate the tests would take the same time as writing the tests themselves.

Unless you're thinking of repetitive code I can't imagine the process (I'm not arguing, I'm just curious of what you're flow looks like).

yodsanklaiOP1y ago

> Is there a threshold after which you just... hope?

Generally, all the code I write is reviewed by humans, so commits need to be small and easily reviewable. I can't submit something I don't understand myself or I may piss off my colleagues, or it may never get reviewed.

Now if it was a personal project or something with low value, I would probably be more lenient but I think if you use a statically typed language, the type system + unit tests can capture a lot of issues so it may be ok to have local blocks that you don't look in details.

ModernMech1y ago

Yeah for me, I use AI with Rust and a suite of 1000 tests in my codebase. I also use CoPilot VS code plugin mostly, which as far as I can tell heavily weights toward local code around it and often it just writing code based on my other code. I've found AI to be a good macro debugger too, as macro debugging tools are severely lacking in most ecosystems.

But when I see people using these AI tools to write JavaScript of Python code wholesale from scratch, that's a huge question mark for me. Because how?? How are you sure that this thing works? How are you sure when you update it won't break? Indeed the answer seems to be "We don't know why it works, we can't tell you under which conditions it will break, we can't give you any performance guarantees because we didn't test or design for those, we can't give you any security guarantees because we don't know what security is and why that's important."

People forgot we're out here trying to do software engineering, not software generation. Eternal September is upon us.

senordevnyc1y ago

1) Yes, I review every line it changed.

2) I find the tool analogy helpful but it has limits. Yes, it’s a stochastic tool, but in that sense it’s more like another mind, not a tool. And this mind is neither junior nor senior, but rather a savant.

bigstrat20031y ago

> You're not supposed to trust the tool, you're supposed to review and rework the code before submitting for external review.

Then it's not a useful tool, and I will decline to waste time on it.

jorvi1y ago

> But I need to iterate a few times so that the code looks like what I want.

The LLM too. You can get a pretty big improvement by telling the LLM to "iterate 4 times on whichever code I want you to generate, but only show me the final iteration, and then continue as expected".

I personally just inject the request for 4 iterations into the system prompt.

mrheosuper1y ago

If i dont trust my tool, i would never use it, or use something else better

e3bc54b21y ago

> You're not supposed to trust the tool, you're supposed to review and rework the code before submitting for external review.

The vibe coding guy said to forget the code exists and give in to vibes, letting the AI 'take care' of things. Review and rework sounds more like 'work' and less like 'vibe'.

j / k navigate · click thread line to collapse

0 comments

ryandrake1y ago

> You're not supposed to trust the tool, you're supposed to review and rework the code before submitting for external review.

schmichael1y ago

> You're not supposed to trust the tool

ModernMech1y ago

arvinsim1y ago

There are no existing AI tools that guarantee correct code 100% of the time.

If there is such a tool, programmers will be on path of immediate reskilling or lose their jobs very quickly.

ryandrake1y ago

Imagine if your calculator app randomly and non-deterministically performed arithmetic incorrectly, and you similarly couldn't get correctness expectations from the developer.

Imagine if any of your communication tools randomly and non-deterministically translated your messages into gibberish...

I think we'd all throw away such tools, but we are expected to accept it if it's an "AI tool?"

andrei_says_1y ago

Imagine that you yourself never use these tools directly but your employees do. And the sellers of said tools swear that the tools are amazing and correct and will save you millions.

They keep telling you that any employee who highlights problems with the tools are just trying to save their job.

Your investors tell you that the toolmakers are already saving money for your competitors.

Now, do you want that second house and white lotus vacation or not?

ModernMech1y ago

I feel like there's a lot of motivated reasoning going on, yeah.

arvinsim1y ago

If you think of AI like a compiler, yes we should throw away such tools because we expect correctness and deterministic outcomes

If you think of AI like a programmer, no we shouldn't throw away such tools because we accept them as imperfect and we still need to review.

bigstrat20031y ago

> If you think of AI like a programmer, no we shouldn't throw away such tools because we accept them as imperfect and we still need to review.

learningstud1y ago

Edsgar Dijkstra!

ToValueFunfetti1y ago

gitremote1y ago

Absolutely not. We'd just do the calculations by hand, which is better than running the 95%-correct calculator and then doing the calculations by hand anyway to verify its output.

1 more reply

tevon1y ago

Stackoverflow is like this, you read an answer but are not fully sure if its right or if it fits your needs.

Of course there is a review system for a reason, but we frequently use "untrusted" tools in development.

That one guy in a github issue that said "this worked for me"

shipp021y ago

In Mechanical Engineering, this is 100% a thing with fluid dynamics simulation. You need to know if the output is BS based on a number of factors that I don't understand.

theonething1y ago

> I can't think of another development tool we'd say this about.

seabird1y ago

theonething1y ago

so your saying that yes you do "trust other developers to write good code without mistakes without getting it reviewed by others."

And then you say "by the time the rest of the team reviews it. Most code review is uneventful."

So you trust your team to develop without the need for code review but yet, your team does code review.

So what is the purpose of these code reviews? Is it the case that you actually don't think they are necessary, but perhaps management insists on them? You actually answer this question yourself:

> Most code review is uneventful.

Keyword here is "most" as opposed to "all" So based your team's applied practices and your own words, code review is for the purpose of catching mistakes and other needed corrections.

But it seems to me if you trust your team not to make mistakes, code review is superfluous.

Have some honesty and humility and you'll amazed at what's possible.

1 more reply

anonymars1y ago

I trust my colleagues to write code that compiles, at the very least

ModernMech1y ago

Oh at the very least I trust them to not take code that compiles and immediately assess that it's broken.

chrisweekly1y ago

But of course everyone absolutely NEEDS to use AI for codereviews! How else could the huge volume of AI-generated code be managed?

forgetfreeman1y ago

"Do you trust other developers to write good code without mistakes without getting it reviewed by others."

ryandrake1y ago

3 more replies

theonething1y ago

Ok, here I thought requiring PR review and approval before merging was standard industry best practice. I guess all the places I've worked have been doing it wrong?

1 more reply

gtirloni1y ago

1) Once you get it to output something you like, do you check all the lines it changed? Is there a threshold after which you just... hope?

nkoren1y ago

jimbokun1y ago

> AI is excellent at tasks which involve writing quasi-custom boilerplate (eg. tests which require a lot of mocking)

kazinator1y ago

The generation of volumes of boiler plate takes effort; nobody likes to do it.

The problem is, that phase is not the full life cycle of the boiler plate.

You have to live with it afterward.

pjerem1y ago

> 1) Once you get it to output something you like, do you check all the lines it changed? Is there a threshold after which you just... hope?

Not op but yes. It sometimes takes a lot of time but I read everything. It still faster than nothing. Also, I ask very precise changes to the AI so it doesn’t generate huge diffs anyway.

iforgotpassword1y ago

> It sometimes takes a lot of time but I read everything. It still faster than nothing.

riffraff1y ago

How do you have it write tests before the code? It seems writing a prompt for the LLM to generate the tests would take the same time as writing the tests themselves.

Unless you're thinking of repetitive code I can't imagine the process (I'm not arguing, I'm just curious of what you're flow looks like).

yodsanklaiOP1y ago

> Is there a threshold after which you just... hope?

ModernMech1y ago

People forgot we're out here trying to do software engineering, not software generation. Eternal September is upon us.

senordevnyc1y ago

1) Yes, I review every line it changed.

bigstrat20031y ago

> You're not supposed to trust the tool, you're supposed to review and rework the code before submitting for external review.

Then it's not a useful tool, and I will decline to waste time on it.

jorvi1y ago

> But I need to iterate a few times so that the code looks like what I want.

I personally just inject the request for 4 iterations into the system prompt.

mrheosuper1y ago

If i dont trust my tool, i would never use it, or use something else better

e3bc54b21y ago

> You're not supposed to trust the tool, you're supposed to review and rework the code before submitting for external review.

The vibe coding guy said to forget the code exists and give in to vibes, letting the AI 'take care' of things. Review and rework sounds more like 'work' and less like 'vibe'.

j / k navigate · click thread line to collapse