
ToValueFunfetti · 18d ago · 0 points

I just don't understand where your position is coming from here. You can't distinguish between a machine that says "here, look at these 170 results, 10% of them are highly serious problems that you should address, you should have some people look into that" and one that shrugs and says "I dunno, maybe just double check everything"? I assume you've come to this conclusion based on some reasoning, but you're not sharing it in this response AFAICT.


2 comments · 1 top-level

troupo · 18d ago

> You can't distinguish between a machine that says "here, look at these 170 results, 10% of them are highly serious problems that you should address

The machine doesn't say that. It says "Here are 170 completely correct and verified results".

You have to check and verify all of those results yourself, and on any given day it can be anywhere from 0% to 100% incorrect.

> I assume you've come to this conclusion based on some reasoning, but you're not sharing it in this response AFAICT.

The reasoning comes from actually working with AI tools. And the reasoning can be seen in the actual comment this thread started from: https://news.ycombinator.com/item?id=48434824

ToValueFunfetti (OP) · 18d ago

Per the earlier hypothetical, we're assuming it has a 10% correctness rate. You had said:

> In a regulated industry 90% false positive rate is indistinguishable from 100% failure rate

So defending that position on the basis of it not actually being a 90% failure rate would mean you shouldn't have taken it in the first place. The fact that the LLM lies about its failure rate is nearly irrelevant; the machine could output the literal string "The following is 90% likely to be a false positive: " followed by the LLM output.

The reasoning in the comment that started the thread is "it's annoying to redo human review". Your position, as I understand it, is that there is zero or negative business value in a tool that spits out a list of potential issues of which 10% are real issues with your business. This is what I fail to understand. This would be an incredibly useful first step towards any audit and would save loads of money. Why not?
