Better HN

TeMPOraL, 2y ago

They may not be a surface that directly encodes the "truth" value, but unless we assume that the data LLMs are trained on is entirely uncorrelated with truth, there should be a surface that comes close enough.

I don't think the assumption that LLM training data is random with respect to truth value is reasonable - people don't write random text for no reason at all. Even if the current training corpus were too noisy for the "truth surface" to become clear - e.g. because it's full of shitposting and people exchanging their misconceptions about things - a better-curated corpus should do the trick.

Also, I don't see how this idea would invalidate the last couple centuries of Western philosophy. The "truth surface", should it exist, would not be following some innate truth property of statements - it would only be reflecting the fact that the statements used in training were positively correlated with truth.

EDIT: And yes, this would be a huge thing - but not for some fundamental philosophical reason, but rather because it would be an effective way to pull truths and correlations from the aggregated beliefs of a large number of people. It's what humans do when they synthesize information, but at a much larger scale, one we can't match mostly because we don't live long enough.
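A toy model of the "aggregated beliefs" point, swapping in the classic Condorcet jury theorem setup as an assumption: each of n sources independently states the truth with probability p, and we take the majority view. The function name `majority_correct` is hypothetical. It shows why the correlation assumption above matters: when p > 0.5 the majority converges on the truth as n grows, and when p < 0.5 it converges just as reliably on the falsehood. Real training text is far from independent, so this is only a sketch of the intuition.

```python
from math import comb


def majority_correct(p: float, n: int) -> float:
    """Probability that a majority of n independent sources is right,
    assuming each source is right with probability p (hypothetical toy model).
    n must be odd so there are no ties."""
    if n % 2 == 0:
        raise ValueError("use an odd number of sources to avoid ties")
    # Sum the binomial probabilities of strictly more than n/2 sources being right.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


for n in (1, 3, 11, 101):
    print(n, round(majority_correct(0.6, n), 3), round(majority_correct(0.4, n), 3))
```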

Borealid, 2y ago

I think this is a misunderstanding of what would be necessary for an LLM to only output truth.

Let's imagine there does exist a function for evaluating truth - it takes in a statement and produces whether that statement is "true" (whatever "true" means). Let's also say it does that perfectly.

We train the LLM. We keep training it, and training it, and training it, and we eventually get a set of weights where our eval runs only make it produce statements where the truth-function says they are truthful.

We deploy the LLM. It's given an input that wasn't part of the evaluation set. We have no guarantee at all that the output will be true. The weights we chose for the LLM during the training process are a serendipitous accident: we observed that they produced truthy output in the scenarios we tested. Scenarios we didn't test _probably_ produce truthy output, but in all likelihood some will not, and we have no mathematical guarantee.

This remains the case even if you have a perfect truth function, and remains true if you use deterministic inference (always the most likely token). Your comment goes even further than that and asserts that a mostly-accurate function is good enough.
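A toy illustration of this point, assuming integers stand in for "statements" and primality stands in for the perfect truth function (both `truth` and `model` are hypothetical names): a "model" that learned the shortcut "2 and odd numbers above 2 are prime" agrees with the truth function on every case in a small eval set, yet is wrong on inputs it was never checked against.

```python
def truth(n: int) -> bool:
    """Stand-in for a perfect truth function: is n prime?"""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n**0.5) + 1))


def model(n: int) -> bool:
    """Hypothetical trained model that learned a shortcut instead of primality."""
    return n == 2 or (n > 2 and n % 2 == 1)


# The weights "pass" every evaluation case we happened to choose...
eval_set = [2, 3, 4, 5, 7, 8, 11, 13]
assert all(model(n) == truth(n) for n in eval_set)

# ...but deployment inputs outside the eval set expose the shortcut.
failures = [n for n in range(2, 50) if model(n) != truth(n)]
print(failures)
```

Nothing about the eval results distinguishes this shortcut from genuine primality testing; only inputs outside the eval set reveal the difference.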

TeMPOraL (OP), 2y ago

Science itself has the same problem. There's literally no reason to be certain that the Sun will rise tomorrow, or that physics will make sense tomorrow, or that the universe will not become filled with whipped cream tomorrow. There is no fundamental reason for such inductions to hold - but we've empirically observed that they do, and the more they do, the safer we feel in assuming they'll continue to hold.

This assumption is built into science as its fundamental axiom. And all the theories and models we develop also have "no mathematical guarantee" - we just keep using them to predict the outcomes of some tests (designed or otherwise) and compare those predictions with the actual outcomes. As long as predictions and outcomes keep matching (within tolerance), we remain confident in those theories.

The same will be the case with LLMs. If we train one and then test it by feeding it data from outside the training set, for which we know the truth value, and the AI determines that truth value correctly - and then keep repeating this many, many times, and the AI passes the test most of the time - then we can slowly gain certainty that it has, in fact, learned a lot, and isn't just guessing.
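The "slowly gain certainty" step can be made quantitative, assuming the test cases are drawn independently from the same distribution the model will face in deployment (which is exactly the assumption Borealid's objection calls into question). Two standard bounds, sketched with hypothetical function names: the exact zero-failure bound, which solves (1 - p)^n = 1 - confidence for the error rate p and comes out to roughly 3/n at 95% confidence (the "rule of three"), and Hoeffding's inequality, which also tolerates some failures.

```python
import math


def zero_failure_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Upper bound on the error rate after n passed tests and 0 failures.
    Solves (1 - p) ** n = 1 - confidence for p."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def hoeffding_upper_bound(failures: int, n: int, confidence: float = 0.95) -> float:
    """Upper bound on the error rate from Hoeffding's inequality:
    observed failure rate plus sqrt(ln(1 / delta) / (2n)), where delta = 1 - confidence."""
    p_hat = failures / n
    slack = math.sqrt(math.log(1.0 / (1.0 - confidence)) / (2.0 * n))
    return min(1.0, p_hat + slack)


# Assume a 1% observed failure rate for the Hoeffding column.
for n in (10, 100, 1000, 10000):
    print(n, round(zero_failure_upper_bound(n), 4), round(hoeffding_upper_bound(n // 100, n), 4))
```

Each tenfold increase in passed tests shrinks the zero-failure bound roughly tenfold, while the slack term in the Hoeffding bound shrinks only by a factor of √10. Neither bound says anything about inputs drawn from a different distribution than the tests.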