Can we really do this reliably? LLMs are non-deterministic, right? So how do we validate their output in a deterministic way?
We can validate things like the shape of the data being returned, but how do we validate correctness without an independent human in the loop to verify it?
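To be concrete about the part that *is* deterministic: a shape check might look something like the sketch below (using Pydantic v2; the `ExtractionResult` model and its fields are just placeholders I made up for illustration). It proves the output parses into the expected structure, but says nothing about whether the values are actually correct.

```python
from pydantic import BaseModel, ValidationError


# Hypothetical schema for whatever structured output we asked the LLM for.
class ExtractionResult(BaseModel):
    title: str
    year: int
    authors: list[str]


def validate_shape(raw_json: str) -> ExtractionResult | None:
    """Deterministic check: does the LLM output parse into the expected schema?

    Returns the parsed object on success, or None if the JSON is malformed
    or the fields/types don't match. This catches structural failures only,
    not factual errors in the field values.
    """
    try:
        return ExtractionResult.model_validate_json(raw_json)
    except ValidationError:
        return None
```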