Better HN

gitremote 1y ago

Absolutely not. We'd just do the calculations by hand, which is better than running the 95%-correct calculator and then doing the calculations by hand anyway to verify its output.


ToValueFunfetti 1y ago

Suppose you work in a field where getting calculations right is critical. Your engineers make mistakes less than .01% of the time, but they do a lot of calculations and each mistake could cost $millions or lives. Double- and triple-checking help a lot, but they're costly. Here's a machine that verifies 95% of calculations, but you'd still have to do 5% of the work. Shall I throw it away?

Unreliable tools have a good deal of utility. That's an example of them helping reduce the problem space, but they can also be useful in situations where having a 95% confidence guess now matters more than a 99.99% confidence one in ten minutes: firing mortars in active combat, say.

There are situations where validation is easier than computation; canonically this is factoring, but even multiplication is much simpler than division. It could very easily save you time to multiply each of the calculator's quotients by the divisor and compare against the dividend, performing both a multiplication and a division only for the 5% that are wrong.
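A minimal sketch of that multiply-back check, assuming exact integer division and a hypothetical helper name `check_quotients` (neither comes from the thread). It flags only the quotients that fail the cheap check, so the expensive division is redone for those alone:

```python
def check_quotients(problems, claimed):
    """Return the indices whose claimed quotient fails the multiply-back check.

    problems: list of (dividend, divisor) pairs with exact integer quotients.
    claimed:  quotients reported by the unreliable calculator.
    """
    suspects = []
    for i, ((dividend, divisor), quotient) in enumerate(zip(problems, claimed)):
        # Validation is one multiplication, cheaper than redoing the division.
        if quotient * divisor != dividend:
            suspects.append(i)
    return suspects


# Illustrative data: the third claimed quotient is wrong (144 / 12 is 12, not 13).
problems = [(84, 7), (91, 13), (144, 12), (225, 15)]
claimed = [12, 7, 13, 15]

suspects = check_quotients(problems, claimed)
# Only the flagged entries need the full division redone.
corrected = list(claimed)
for i in suspects:
    dividend, divisor = problems[i]
    corrected[i] = dividend // divisor
```

This only works because the check itself is reliable; it says nothing about cases where no cheap validator exists.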

edit: I submit this comment and click to go to the front page and right at the top is Unsure Calculator (no relevance). Sorry, I had to mention this

diputsmonro 1y ago

> Here's a machine that verifies 95% of calculations, but you'd still have to do 5% of the work.

The problem is that you don't know which 5% are wrong. The AI is confidently wrong all the time. So the only way to be sure is to double check everything, and at some point it's easier to just do it the right way.

Sure, some things don't need to be perfect. But how much do you really want to risk? This company thought a little bit of potential misinformation was acceptable, and so it caused a completely self-inflicted PR scandal, pissed off their customer base, and lost them a lot of confidence and revenue. Was that 5% error worth it?

Stories like this are going to keep coming the more we rely on AI to do things humans should be doing.

Someday you'll be affected by the fallout of some system failing because you happen to wind up in the 5% failure gap that some manager thought was acceptable (if that manager even ran a calculation and didn't just blindly trust whatever some other AI system told them). I just hope it's something as trivial as an IDE and not something in your car, your bank, or your hospital. But certainly LLMs will be irresponsibly shoved into all three within the next few years, if they're not there already.

ToValueFunfetti 1y ago

> The problem is that you don't know which 5% are wrong

This is not a problem in my unreliable calculator use-cases; are you disputing that or dropping the analogy?

Because I'd love to drop the analogy. You mention IDEs. I routinely use IntelliJ's tab completion, despite it being wrong far more than 5% of the time. I have to manually verify every suggestion. Sometimes I use it and then edit the final term of a nested object access. Sometimes I use the completion by mistake, clean up with backspace instead of undo, and wind up submitting a PR that adds an unused dependency. I consider it indispensable to my flow anyway. Maybe others turn this off?

You mention hospitals. Hospitals run loads of expensive tests every day with a greater than 5% false positive and false negative rate. Sometimes these results mean a benign patient undergoes invasive further testing. Sometimes a patient with cancer gets told they're fine and sent home. Hospitals continue to run these tests, presumably because having a 20x increase in specificity is helpful to doctors, even if it's unreliable. Or maybe they're just trying to get more money out of us?

Since we're talking LLMs again, it's worth noting that 95% is an underestimate of my hit rate. 4o writes code that works more reliably than my coworker does, and it writes more readable code 100% of the time. My coworker is net positive for the team. His 2% mistake rate is not enough to counter the advantage of having someone there to do the work.

An LLM with a 100% hit rate would be phenomenal. It would save my company my entire salary. A 99% one is way worse; they still have to pay me to use it. But I find a use for the 99% LLM more-or-less every day.

gitremote (OP) 1y ago

> This is not a problem in my unreliable calculator use-cases; are you disputing that or dropping the analogy?

If you use an unreliable calculator to sum a list of numbers, you then need to use a reliable method to sum the numbers to validate that the unreliable calculator's sum is correct or incorrect.


Tainnor 1y ago

> Unreliable tools have a good deal of utility.

This is generally true when you can quantify the unreliability. E.g. random prime number tests with a specific error rate can be combined so that the error rates multiply and become negligible.
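As one illustration of a quantifiable error rate, here is a sketch of the Miller-Rabin test (my choice of example; the comment does not name a specific test). For a composite input, each independent round wrongly passes with probability at most 1/4, so `rounds` rounds give a bound of at most 4^-rounds. The function names and the default of 20 rounds are assumptions for illustration:

```python
import random


def is_probable_prime(n, rounds=20):
    """Miller-Rabin: never rejects a prime; wrongly accepts a composite
    with probability at most 4 ** -rounds."""
    if n < 2:
        return False
    # Cheap trial division also guarantees n >= 17 for the random draw below.
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)  # random witness candidate
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is definitely composite
    return True


def error_bound(rounds):
    """Upper bound on the chance that a composite passes every round."""
    return 0.25 ** rounds
```

With 20 rounds the bound is below one in a trillion, which is the sense in which repeated unreliable checks can become negligible in error.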

I'm not aware that we can quantify the uncertainty coming out of LLM tools reliably.

mrheosuper 1y ago

> you'd still have to do 5% of the work

No, you still have to do 100% of the work.

ToValueFunfetti 1y ago

You simply do not. You do the math yourself to calculate 2n for n in [1, 2, 3, 4] and get [2, 5, 6, 8]. You plug it into your (75% accurate) unreliable calculator and get [3, 4, 6, 8]. The two lists disagree only on the first two entries, so you now know that you only need to recheck those two (50% of the entries).
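The comparison in that example can be sketched as below. The helper name `indices_to_recheck` is hypothetical, and the approach assumes the two sources rarely agree on the same wrong value, which is my assumption rather than something the comment states:

```python
def indices_to_recheck(manual, calculator):
    """Return positions where the hand result and the calculator disagree."""
    return [i for i, (m, c) in enumerate(zip(manual, calculator)) if m != c]


manual = [2, 5, 6, 8]       # hand results for 2n, n in [1, 2, 3, 4]
calculator = [3, 4, 6, 8]   # output of the 75%-accurate calculator

recheck = indices_to_recheck(manual, calculator)
fraction = len(recheck) / len(manual)
```

Here `recheck` holds the first two positions and `fraction` is 0.5, matching the 50% in the comment. Entries where both sources agree are accepted without further work, which is exactly where the independence assumption matters.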

throwway120385 1y ago

I resent becoming QA/QC for the machine instead of doing the same or better thinking myself.


jimbokun 1y ago

> Here's a machine that verifies 95% of calculations

Which 95% did it get right?
