Here's one. Given a conversation history made of n sequential tokens S1, S2, ..., Sn, an LLM will generate the next token using an insanely complicated model we'll just call F:
S(n+1) = F(S1, S2, ..., Sn)
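To make the shape of that concrete, here's a toy sketch. The F below is a stand-in I made up for illustration, nothing like a real model:

```python
import random

def F(history):
    # Stand-in for the actual model: deterministically "predicts" a
    # next token from a tiny vocabulary. Obviously not a real LLM.
    vocab = ["the", "a", "cat", "sat", "on", "mat", "."]
    rng = random.Random(" ".join(history))
    return rng.choice(vocab)

def generate(history, steps=5):
    # S(n+1) = F(S1, ..., Sn), appended and repeated.
    for _ in range(steps):
        history = history + [F(history)]
    return history

print(generate(["the", "cat"]))
```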
As for me, I'll often think of my next point, figure out how to say that concept, and then figure out the right words to connect it to where the conversation's at right then. So there's one function, G, for me to think of the next conversational point. And then another, H, to lead into it:
S(n+100) = G(S1, S2, ..., Sn)
S(n+1) = H(S1, S2, ..., Sn, S(n+100))
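And a correspondingly toy sketch of that two-step version. Again, G and H are made-up stand-ins, just to show the flow of information:

```python
import random

def G(history):
    # Stand-in for "think of the next conversational point":
    # just picks a topic. S(n+100) = G(S1, ..., Sn)
    topics = ["the weather", "lunch", "that meeting"]
    rng = random.Random(" ".join(history))
    return rng.choice(topics)

def H(history, point):
    # Stand-in for "find the words that lead into that point":
    # S(n+1) = H(S1, ..., Sn, S(n+100))
    return f"speaking of {point},"

history = ["so", "anyway"]
point = G(history)
print(H(history, point))
```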
And this is putting aside that people don't actually think in tokens, and that some people don't always have an internal monologue (I rarely do when doing math).