
0 points · zrm · 1y ago · 0 comments

LLMs are essentially predicting what token could plausibly come next. They sort the possible next tokens by probability, often throw out the ones with very low probabilities (p-value too low) and then use a weighted random number generator to choose one from the rest.

Sometimes that means you exclude a good next token, or include a bad one, so a bad one gets chosen. And then once it has, the thing is going to pick whatever is most likely to come after that, which will be some malarkey because it has already emitted a nonsense token and is now using that as context. But whatever it is, it will still sound plausible because it's still choosing from the most likely things to follow what has already been emitted.
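The sampling step described above can be sketched roughly as follows. This is a minimal illustration, not how any particular model or library implements it: the function name `sample_next_token`, the `min_prob` cutoff and the toy probabilities are all assumptions made up for the example. Real systems usually apply related truncation schemes such as top-k or top-p (nucleus) sampling rather than a fixed cutoff.

```python
import random

def sample_next_token(probs, min_prob=0.05, rng=random):
    """Pick one token from a {token: probability} mapping.

    Hypothetical helper: tokens below min_prob are dropped, then one of the
    survivors is drawn with probability proportional to its weight.
    """
    # Sort candidates from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # Drop the low-probability tail; fall back to the single top token
    # so the candidate list is never empty.
    kept = [(tok, p) for tok, p in ranked if p >= min_prob] or ranked[:1]
    tokens = [tok for tok, _ in kept]
    weights = [p for _, p in kept]
    # Weighted random choice among the survivors.
    return rng.choices(tokens, weights=weights, k=1)[0]

# Toy distribution (invented numbers) for the token after "The capital of France is".
toy_probs = {"Paris": 0.6, "Lyon": 0.3, "Nice": 0.09, "banana": 0.01}
print(sample_next_token(toy_probs))
```

With these toy numbers "banana" can never be chosen, but "Lyon" still can, and whatever is chosen becomes part of the context for the next draw.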


3 comments · 1 top-level

seanhunter · 1y ago · 2 in thread

A p-value isn’t just any old probability; it has a specific meaning related to hypothesis testing[1]. A p-value is the conditional probability of seeing a result at least as extreme as some observation under the null hypothesis.

Yes, LLMs generate tokens using a stochastic process, so it is probabilistic. Everyone knows that, but in the normal process of generating text, LLMs aren’t doing a hypothesis test, so by definition p-values are completely irrelevant to how LLMs hallucinate.

[1] https://math.libretexts.org/Courses/Queens_College/Introduct...

seanhunter · 1y ago

This is such a common thing to misunderstand that I'm going to respond to my own message to give an explanation, because many of the links I've found make sense once you know the lingo etc but might not before then.

Say you go into a bar and just by chance there is a football[1] match on television between Denmark and France. You see a bunch of fans of each country and you think "Hey, the Danes look taller than the French". You want to find out whether this is true in general, so to test this hypothesis you persuade them during a lull in the match to line up and get measured. As luck would have it there are exactly n people from each country.

H_0 (the null hypothesis) is that the two population means are the same. That is, that Danish people have the same average height as French people.

H_1 (the alternative hypothesis) is that Danish people are taller on average than French people (i.e. the Danish population mean is larger).

So you take the average height and see that the Danes in this bar are say 5cm taller on average than the French people in this bar.

The p-value is how likely it would be, if the actual average heights of the underlying populations (all Danes and all French people) were the same, to draw a random sample of n people from each population in which the Danes average at least 5cm taller than the French.

How you typically use a p-value is to compare it with a threshold chosen in advance, called the significance level (0.05 is a common choice): if the p-value is smaller, you "reject the null hypothesis" at that significance level. So in this case, if the p-value were small enough, you would conclude that the population means are unlikely to be the same.

Actually calculating the p-value is going to depend a bit on the distribution etc but that's what a p-value is. As you can see it's not just a probability.
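To make that concrete, here is a minimal sketch of one way to estimate such a p-value, using a permutation test. That choice is an assumption for illustration (a t-test would be another common option), and the heights below are invented numbers, not real measurements. The function name `permutation_p_value` is likewise hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(danes, french, n_resamples=10_000, seed=0):
    """Estimate the one-sided p-value for H_1: Danes are taller on average.

    Under H_0 the nationality labels carry no information about height, so we
    shuffle the pooled heights and count how often a random split gives a
    difference in means at least as large as the one observed.
    """
    rng = random.Random(seed)
    observed = mean(danes) - mean(french)
    pooled = list(danes) + list(french)
    n = len(danes)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n]) - mean(pooled[n:])
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (n_resamples + 1)

# Invented heights in cm; the Danish sample averages 5cm taller.
danes = [183, 179, 186, 181, 184, 188, 180, 185]
french = [176, 181, 174, 179, 177, 182, 175, 182]
p = permutation_p_value(danes, french)
print(f"estimated p-value: {p:.4f}")
```

Note that nothing here resembles choosing a token: the number that comes out answers "how surprising is this sample if H_0 were true", which is a different question from "how likely is this token to come next".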

[1] soccer if you're from the US

zrm (OP) · 1y ago

I was vaguely aware of this, and I don't know whether to say I was just being careless or that it was the point, because the original comment was kind of a mess and I was defending it in jest.
