undefined | Better HN

0 pointsslopinthebag4mo ago0 comments

I can't tell if this is sarcasm, but if not, you cant rely on the thing that produced invalid output to validate it's own output. That is fundementally insufficient, despite it potentially catching some errors.

0 comments

20 comments · 7 top-level

_2d304mo ago· 9 in thread

Damn. Guess I'll stop QAing my own work from now.

lioeters4mo ago

This but unironically. Of course review your own work. But QA is best done by people other than those who develop the product. Having another set of eyes to check your work is as old as science.

ryan_n4mo ago

That is often how software development has been done the past several decades yea...

Not to say that you don't review your own work, but it's good practice for others (or at least one other person) to review it/QA it as well.

slopinthebagOP4mo ago

You're making a false equivalence between a human being with agency and intelligence, and a machine.

rhubarbtree4mo ago

Are humans not machines?

2 more replies

alistairSH4mo ago

Yes. That’s not a best practice. That’s why PRs and peer reviews and test automation suite exist.

akdev1l4mo ago

I think it is common for one to write their own tests tho

1 more reply

Spivak4mo ago

I mean there is some wisdom to that, most teams separate dev and qa and writers aren't their own editors precisely because it's hard for the author of a thing to spot their own mistakes.

When you merge them into one it's usually a cost saving measure accepting that quality control will take a hit.

habinero4mo ago

Yeah, someone should invent code review.

meheleventyone4mo ago

Uh, yeah, thh hi is has been considered bad practice for decades.

latchkey4mo ago· 2 in thread

> you cant rely on the thing that produced invalid output to validate it's own output

I've been coding an app with the help of AI. At first it created some pretty awful unit tests and then over time, as more tests were created, it got better and better at creating tests. What I noticed was that AI would use the context from the tests to create valid output. When I'd find bugs it created, and have AI fix the bugs (with more tests), it would then do it the right way. So it actually was validating the invalid output because it could rely on other behaviors in the tests to find its own issues.

The project is now at the point that I've pretty much stopped writing the tests myself. I'm sure it isn't perfect, but it feels pretty comprehensive at 693 tests. Feel free to look at the code yourself [0].

[0] https://github.com/OrangeJuiceExtension/OrangeJuice/actions/...

slopinthebagOP4mo ago

I'm not saying you can't do it, I'm just saying it's not sufficient on its own. I run my code through an LLM and it occasionally catches stuff I missed.

latchkey4mo ago

Thanks for the clarification. That's the difference though, I don't need it to catch stuff I missed, I catch stuff it misses and I tell it to add it, which it dutifully does.

iagooar4mo ago· 2 in thread

What if "the thing" is a human and another human validating the output. Is that its own output (= that of a human) or not? Doesn't this apply to LLMs - you do not review the code within the same session that you used to generate the code?

slopinthebagOP4mo ago

I think a human and an LLM are fundamentally different things, so no. Otherwise you could make the argument that only something extra-terrestrial could validate our work, since LLM's like all machines are also our outputs.

koolba4mo ago

The problem now is that it’s a human using Claude to write the code and another using Claude to review it.

huslage4mo ago

I have had other LLMs QA the work of Claude Code and they find bugs. It's a good cycle, but the bugs almost never get fixed in one-shot without causing chaos in the codebase or vast swaths of rewritten code for no reason.

charcircuit4mo ago

Products don't have to be perfect. If they can be less buggy than before AI. You can't call that anything but a win.

CamperBob24mo ago

I can't tell if that is sarcasm. Of course you can use the same model to write tests. That's a different problem altogether, with a different series of prompts altogether!

When it comes to code review, though, it can be a good idea to pit multiple models against each other. I've relied on that trick from day 1.

Nition4mo ago

That's why you get Codex to do it. /s

j / k navigate · click thread line to collapse

0 comments

20 comments · 7 top-level

_2d304mo ago· 9 in thread

Damn. Guess I'll stop QAing my own work from now.

lioeters4mo ago

This but unironically. Of course review your own work. But QA is best done by people other than those who develop the product. Having another set of eyes to check your work is as old as science.

ryan_n4mo ago

That is often how software development has been done the past several decades yea...

Not to say that you don't review your own work, but it's good practice for others (or at least one other person) to review it/QA it as well.

slopinthebagOP4mo ago

You're making a false equivalence between a human being with agency and intelligence, and a machine.

rhubarbtree4mo ago

Are humans not machines?

2 more replies

alistairSH4mo ago

Yes. That’s not a best practice. That’s why PRs and peer reviews and test automation suite exist.

akdev1l4mo ago

I think it is common for one to write their own tests tho

1 more reply

Spivak4mo ago

I mean there is some wisdom to that, most teams separate dev and qa and writers aren't their own editors precisely because it's hard for the author of a thing to spot their own mistakes.

When you merge them into one it's usually a cost saving measure accepting that quality control will take a hit.

habinero4mo ago

Yeah, someone should invent code review.

meheleventyone4mo ago

Uh, yeah, thh hi is has been considered bad practice for decades.

latchkey4mo ago· 2 in thread

> you cant rely on the thing that produced invalid output to validate it's own output

[0] https://github.com/OrangeJuiceExtension/OrangeJuice/actions/...

slopinthebagOP4mo ago

I'm not saying you can't do it, I'm just saying it's not sufficient on its own. I run my code through an LLM and it occasionally catches stuff I missed.

latchkey4mo ago

Thanks for the clarification. That's the difference though, I don't need it to catch stuff I missed, I catch stuff it misses and I tell it to add it, which it dutifully does.

iagooar4mo ago· 2 in thread

slopinthebagOP4mo ago

koolba4mo ago

The problem now is that it’s a human using Claude to write the code and another using Claude to review it.

huslage4mo ago

charcircuit4mo ago

Products don't have to be perfect. If they can be less buggy than before AI. You can't call that anything but a win.

CamperBob24mo ago

I can't tell if that is sarcasm. Of course you can use the same model to write tests. That's a different problem altogether, with a different series of prompts altogether!

When it comes to code review, though, it can be a good idea to pit multiple models against each other. I've relied on that trick from day 1.

Nition4mo ago

That's why you get Codex to do it. /s

j / k navigate · click thread line to collapse