How LLMs Work, Explained Without Math

(blog.miguelgrinberg.com)

215 points · kdamica · 2y ago · 91 comments

45 comments · 8 top-level

mjburgess · 2y ago · 17 in thread

If commenters wish to know what isn't "guessing the next word", let me outline it.

Compare, "I like what you were wearing", "Pass me the salt", and "Have you been to London recently?" as generated by an LLM and as spoken by a person.

What is the reason each piece of text (in a WhatsApp chat, say) is produced?

When the LLM generates each word, it does so because it is, on average, the most common word in the corpus of text on which it was trained: "wearing" follows "I like what you were" because most of the people having these conversations, as captured in the training data, were talking about clothes.
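The "most common next word" description can be made concrete with a toy count-based predictor. This is a minimal sketch under loud assumptions: the three-sentence corpus and the helper `most_common_next_word` are invented for illustration, and real LLMs do not look up raw counts (they compute a probability distribution over tokens with a trained neural network). It only shows what picking the most frequent continuation of a prefix means.

```python
from collections import Counter
from typing import Optional

# Hypothetical toy corpus, invented purely for illustration.
corpus = [
    "I like what you were wearing",
    "I like what you were wearing yesterday",
    "I like what you were saying",
]


def most_common_next_word(prefix: str, sentences: list[str]) -> Optional[str]:
    """Return the word that most often follows `prefix` in `sentences`.

    This is raw frequency counting, not how a neural LLM computes its output.
    """
    target = prefix.split()
    counts: Counter = Counter()
    for sentence in sentences:
        words = sentence.split()
        # Check every start position that still leaves room for a next word.
        for i in range(len(words) - len(target)):
            if words[i:i + len(target)] == target:
                counts[words[i + len(target)]] += 1
    if not counts:
        return None  # prefix never seen: a pure count model has nothing to offer
    return counts.most_common(1)[0][0]


print(most_common_next_word("I like what you were", corpus))  # wearing
```

Always taking the single most frequent word is only one decoding choice; LLM samplers commonly draw from the predicted distribution instead, often tuned by a temperature setting.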

When a person types those words on a keyboard, the causes are these: the speaker's mental states of recollection, preference, and taste; the speaker's affective/attachment states with respect to their friend; the speaker's habituation to social cues; the speaker's imagining, through recall, what their friend was wearing; the speaker's ability to abstract from their memories to identify clothing; and so on.

Indeed, the cause of a person speaking is so vastly different from generating a word based on historical frequency that supposing the two are related seems incomprehensible.

The only reason the illusion of similarity is effective is that the training data is a text-based observation of the causal process in people: the training data is produced by people talking (and so on). Insofar as the LLM cannot just replay variations on these prior conversations, it will fail and expose itself as actually insensitive to any of these things.

I'd encourage credulous fans of AI not to dehumanize themselves and others by the supposition that they speak because they are selecting an optimal word from a dictionary based on all prior conversations they were a part of. You aren't doing that.

tiborsaas · 2y ago

If your point is that LLMs are different from humans, I guess that goes without saying? We are not even on the same level yet.

> the most common word in a corpus of text on which it was trained

I think you are downplaying the fine-grained knowledge that can be encoded in a huge corpus of text. LLMs are capable of taking context into account and encoding that too, not simply how often word A comes after word B.
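The difference between "how often word A comes after B" and conditioning on wider context can be shown with a toy example. A minimal sketch, assuming an invented three-sentence corpus and hypothetical helpers `build_counts` and `predict`: a bigram table sees only the last word, so after "was" it guesses "muddy" no matter what came before, while a table keyed on the last three words can tell "savings bank was" apart. Real LLMs condition on the whole context window through learned attention rather than exact-match counts; this only illustrates why more context changes the prediction.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical toy corpus, invented purely for illustration.
corpus = [
    "the river bank was muddy after the rain",
    "the old road was muddy too",
    "the savings bank was closed on sunday",
]


def build_counts(sentences: list[str], context_size: int) -> dict:
    """Map each tuple of `context_size` preceding words to a Counter of next words."""
    table: dict = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            table[context][words[i]] += 1
    return table


def predict(table: dict, prefix: str, context_size: int) -> Optional[str]:
    """Guess the next word using only the last `context_size` words of `prefix`."""
    context = tuple(prefix.split()[-context_size:])
    counts = table.get(context)  # .get avoids inserting empty defaultdict entries
    return counts.most_common(1)[0][0] if counts else None


prefix = "the savings bank was"
print(predict(build_counts(corpus, 1), prefix, 1))  # muddy: "was" alone points there
print(predict(build_counts(corpus, 3), prefix, 3))  # closed: "savings bank was" disambiguates
```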

When I'm in a conversation, I'm also selecting the optimal word from a predefined dictionary. That's precisely what speaking a given language is like. Sure, I'm thinking a bit ahead, and I can tap into my memory, feelings, and experiences, which influence everything.

But for me the "optimal" part is derived from context: it changes which word I use when I talk to a colleague, family, or a friend, even when I want to say the same thing. For stock LLMs, everything must be defined in the prompt if we are talking about zero-shot inference.

These models are offering good insight into how language works, and I don't find that too dehumanising. There's still plenty of room for me to be human and do non-AI things.

I get the notion that once we fully understand how something works, the magic is gone; this always happens to AI. Are we afraid that this might happen to us too?

rhdunn · 2y ago

I've experienced LLMs forgetting details (no memory). This is especially a problem when the information is outside the context window, but I've seen it in other cases as well.
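The context-window case can be sketched in a few lines: the model only receives a bounded slice of the conversation, so an early detail can simply be absent from its input. A minimal sketch under stated assumptions: whitespace-separated words stand in for tokens (real tokenizers split text differently), the window of 8 is arbitrary (real windows are far larger), and the hypothetical `visible_context` helper assumes a simple keep-the-most-recent truncation policy (real applications may summarize or use other schemes).

```python
# Hypothetical window size; real models accept thousands of tokens or more.
WINDOW = 8


def visible_context(history: list[str], window: int = WINDOW) -> list[str]:
    """Return the "tokens" a model would receive under a
    keep-the-most-recent truncation policy (an assumption of this sketch)."""
    tokens = [tok for message in history for tok in message.split()]
    # Guard window <= 0: tokens[-0:] would otherwise return the whole list.
    return tokens[-window:] if window > 0 else []


history = [
    "My cat is called Biscuit.",
    "Let's talk about the weather today instead.",
    "What is my cat called?",
]
context = visible_context(history)
print(context)
print("Biscuit." in context)  # False: the name fell outside the window
```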

I've experienced LLMs lacking spatial awareness, such as switching locations in a description despite no indication of moving to the new location. The same applies to other concepts that have a visual/spatial component.

I've also experienced LLMs struggling to get subtext, some metaphors, etc., especially when used in casual conversations instead of as a question/answer style prompt.

LLMs are great, but need more work to fix these gaps.