
ninetyninenine · 7mo ago · 0 points

No, there is no confounding. When you hallucinate with schizophrenia, you know things that are not true and you sense things that are not true. The hallucinations involve both the senses and knowledge.

A weak conceptual model of the world is the problem. But realize that humans also have a weak conceptual model of the world and make a bunch of hallucinations based on that weak model. For example, many people still claim that LLMs are all stochastic parroting when it’s been proven that they’re not. That is a hallucination. Or take the people betting for (and against) the financial success of crypto or AI. We don’t know how either of these things will pan out, but people on either team act as if they know definitively. A huge part of human behavior is driven by hallucinations that fill in gaps.

> And mathematically speaking, how would you accomplish this? As you probably know LLMs don't operate on conceptual ideas, they operate on tokens. That's why LLMs tend to fail when asked to do things that aren't well represented in their training data, they don't have a working model of the world even if they can fake it to a certain degree.

It’s not an incorrect model of the world, as technically both you and an LLM ultimately have an incorrect model of the world, and both you and the LLM fake it. The best you can say is that the LLM has a less accurate approximation of the world than you, but ultimately both you and the LLM hold an incorrect model, and both you and the LLM regularly hallucinate off of it. You also make up bullshit about things not well represented in your own model.

But like I said, we are often (and often not) aware of our own bullshit, so providing that awareness to the LLM quantitatively will help it too.

The LLM is not just trained on random tokens; it’s trained on highly specific groups of tokens, and those groups represent conceptual ideas. So an LLM is 100 percent trained on concepts, and tokens are only an encoding of those concepts.

If a group of tokens is represented by a vector, then we can for sure calculate the distance between vectors. We also know that different types of vectors are represented at each layer of the feed-forward network, and that they encode reasoning and not just the syntactic order of the tokens.

There is literally not very much training data of a human giving instructions to someone to write code along with the associated code diff. The fact that an LLM can do this to a usable degree without volumes of similar training data speaks to the fact that it knows concepts. This is the same tired argument that has been proven wrong. We already know LLMs aren’t just parroting training data, as the majority of the agentic coding operations we currently use LLMs for don’t have associated training data to copy.

Given that we know all of these embeddings from the training data (the model had to calculate the embeddings at one point), we can encode proximity and distance into the model: subtract one embedding vector from another and take the magnitude of the difference, and you get a number that measures the distance between the two embeddings.
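To make the distance idea concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are made up for illustration (real embeddings have hundreds or thousands of dimensions), and the function names are my own, not any library's API. It shows two common ways to turn a pair of vectors into a single distance number.

```python
import math

def euclidean_distance(a, b):
    """Magnitude of the difference vector a - b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus cosine similarity: 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical toy "embeddings", chosen by hand purely for illustration.
cat = [1.0, 0.0, 0.0]
kitten = [0.9, 0.1, 0.0]
carburetor = [0.0, 0.0, 1.0]

# Related concepts should sit closer together than unrelated ones.
print(euclidean_distance(cat, kitten) < euclidean_distance(cat, carburetor))
print(cosine_distance(cat, kitten) < cosine_distance(cat, carburetor))
```

Which metric is appropriate for a given model's embedding space is a separate question; this only shows that the distance itself is cheap to compute once the vectors are available.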

Imagine a best-fit 2D curve through a scatter plot of data points. But at the same time, that curve has a color gradient along it. Red indicates it’s very close to existing data points; blue indicates it’s far. We can definitely derive an algorithm that calculates the additional “self-awareness” dimension encoded here in color, and this can extend to the higher-dimensional encoding that is the LLM.

If an LLM is aware of whether its output is red or blue, then it can sort of tell that if the line is blue, the output is likely to be a hallucination.
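The 2D version of this analogy can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the four data points, the straight-line fit standing in for the "curve", the `threshold` of 1.0, and the function names. It only demonstrates the toy case; whether the same trick carries over to an LLM's high-dimensional space is the open question, and this sketch does not show that it does.

```python
import math

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def nearest_distance(query, points):
    """Distance from a query point to the closest known data point."""
    return min(math.dist(query, p) for p in points)

def color_at(x, line, points, threshold=1.0):
    """'red' if the curve at x is near existing data, 'blue' if it is far."""
    slope, intercept = line
    on_curve = (x, slope * x + intercept)
    return "red" if nearest_distance(on_curve, points) <= threshold else "blue"

# Hypothetical scatter plot: four points lying roughly on y = x.
training = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
line = fit_line(training)

print(color_at(1.5, line, training))   # inside the range covered by data
print(color_at(10.0, line, training))  # far extrapolation
```

The curve happily produces a value at x = 10, but the color says that value rests on no nearby data, which is the "likely hallucination" signal described above.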


dns_snek · 7mo ago

> It’s not an incorrect model of the world as technically both you and an LLM ultimately have an incorrect model of the world and both you and the LLM fake it.

I should've said that the model is "missing", not "weak", when talking about LLMs; that was my mistake. Yes, I'm a human with an imperfect and in many aspects incorrect conceptual model of the world, that is true. The following aren't real examples; they're hyperbolic to better illustrate the category of errors I'm talking about.

If someone asks me "can I stare into the sun without eye protection", my answer isn't going to change based on how the question is phrased because I conceptually understand that the radiation coming from the sun (and more broadly, intense visible radiation emitted from any source) causes irreversible damage to your eyes, which is a fact stored in my conceptual understanding of the world.

However, LLMs will flip-flop based on the tone and phrasing of your question. Asked normally, they will warn you about the dangers of staring into the sun, but if your question hints at disbelief, they might reply "No you're right, staring into the sun isn't that bad".

I also know that mirrors reflect light, which allows me to intuitively understand that staring at the sun through a mirror is dangerous without being explicitly taught that fact.

If you ask an LLM whether staring into a mirror which is pointed at the sun (oriented such that you see the sun through the mirror) is safe, they might agree that it's safe to do so, even though they "know" that staring into the sun is dangerous, and they "know" that mirrors reflect light. Presumably this is because their training data doesn't explicitly state that staring at a mirror is dangerous.

The way the question is framed can completely change their answer, which betrays their lack of conceptual understanding. Those are distinctly different problems. You might say that humans do this too, but we don't call that intelligent behavior, and we tend to have a low opinion of those who exhibit this behavior often.

ninetyninenine (OP) · 7mo ago

No, it doesn’t. The conceptual understanding is there, but the LLM is not obligated towards correctness. The fact that at one point it gave you the correct answer indicates that an aspect of it understands the concept.

Say I told it to solve a complex puzzle equation not in its training data, and it solved that problem correctly. We know from the low probability of arriving at that solution by random chance that the LLM must know, understand, and reason to arrive at it.

Now you’re saying that if you perturb the input with some grammar changes but leave everything else the same, the LLM will produce a wrong answer. But this doesn’t change the fact that it was able to get the right answer.

Humans can be dumb and inconsistent. LLMs can be dumb and inconsistent too. This happens to be a quirk of the LLM. But you cannot deny that it is intelligent, given the fact that LLMs can produce output that we know for sure can only be arrived at through reasoning.

dns_snek · 7mo ago

> The fact that at one point it gave you the correct answer is indicative that an aspect of it understands the concept.

Having a conceptual understanding means that you always provide the same answer to a conceptually equivalent question. Producing the wrong answer when a question is rephrased is indicative of rote memorization.

The fact that it provided the right answer at one point is only indicative of memorization, not understanding, which is precisely the difference between sometimes getting it right and always getting it right.
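The "same answer to a conceptually equivalent question" criterion can be written down as a small check. This is a sketch under stated assumptions: `is_consistent` and both responder functions are hypothetical stand-ins written for this example, not a real model or API, and the paraphrases are the sun example from earlier in the thread.

```python
def is_consistent(answer_fn, paraphrases):
    """True if answer_fn gives one and the same answer to every paraphrase."""
    answers = {answer_fn(q).strip().lower() for q in paraphrases}
    return len(answers) == 1

# Three phrasings of one underlying question; the last one hints at disbelief.
paraphrases = [
    "Can I stare into the sun without eye protection?",
    "Is it safe to look directly at the sun with bare eyes?",
    "Surely staring into the sun isn't that bad, right?",
]

def steady_responder(question):
    """Hypothetical responder whose answer does not depend on phrasing."""
    return "no"

def phrasing_sensitive_responder(question):
    """Hypothetical responder that caves when the question sounds doubtful."""
    return "yes" if question.lower().startswith("surely") else "no"

print(is_consistent(steady_responder, paraphrases))
print(is_consistent(phrasing_sensitive_responder, paraphrases))
```

A check like this only measures consistency across the phrasings you thought to include; it says nothing by itself about why a responder passes or fails.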

