Better HN

0 points · Trasmatta · 1y ago

I think we need to start moving away from this explanation, because the truth is more complex. Anthropic's own research showed that Claude does actually "plan ahead", beyond the next token.

https://www.anthropic.com/research/tracing-thoughts-language...

> Instead, we found that Claude plans ahead. Before starting the second line, it began "thinking" of potential on-topic words that would rhyme with "grab it". Then, with these plans in mind, it writes a line to end with the planned word.
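To make the behavioral claim concrete, here is a toy sketch of "plan ahead" in the sense the quote describes: pick the line's final word first, then write a line that ends on it. The rhyme table and line templates are hypothetical, and nothing here says how Claude represents such a plan internally.

```python
import random

# Hypothetical data: on-topic rhymes for the previous line's ending,
# and a line template for each possible planned final word.
RHYMES = {"grab it": ["rabbit", "habit"]}
TEMPLATES = {
    "rabbit": "Out in the garden there hopped a {w}",
    "habit": "Stealing the carrots became his {w}",
}

def plan_then_write(previous_end: str, rng: random.Random) -> str:
    # Step 1 (the "plan"): choose the word the new line must end with.
    target = rng.choice(RHYMES[previous_end])
    # Step 2: write a line constrained to end on the planned word.
    return TEMPLATES[target].format(w=target)

print(plan_then_write("grab it", random.Random(0)))
```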

19 comments · 3 top-level

ceh123 · 1y ago · 13 in thread

I'm not sure this really shows the truth is more complex. It is still doing next-token prediction, but its prediction method is sufficiently complicated in terms of conditional probabilities that it recognizes that if you need to rhyme, you need to get to some future state, which then impacts the probabilities of the intermediate states.

At least in my view it's still inherently a next-token predictor, just with really good conditional probability understandings.
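A toy model of this argument: if next-token probabilities are derived from a distribution over whole lines, then conditioning on a constraint about the end of the line (it must rhyme with "grab it") already shifts the probability of the first token, even though generation still proceeds one token at a time. The probabilities are made up for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Made-up probabilities of complete two-token "lines".
JOINT = {
    ("starving", "rabbit"): Fraction(1, 8),
    ("starving", "dog"): Fraction(3, 8),
    ("daily", "walk"): Fraction(1, 2),
}

def rhymes_with_grab_it(line):
    # Hypothetical constraint on the finished line.
    return line[-1] in {"rabbit", "habit"}

def next_token_dist(prefix, constraint=None):
    """P(next token | prefix, constraint), by summing over complete lines."""
    prefix = tuple(prefix)
    totals = defaultdict(Fraction)
    for line, p in JOINT.items():
        if line[:len(prefix)] != prefix or len(line) <= len(prefix):
            continue
        if constraint is not None and not constraint(line):
            continue
        totals[line[len(prefix)]] += p
    z = sum(totals.values())
    return {tok: p / z for tok, p in totals.items()} if z else {}
```

Unconstrained, "starving" and "daily" are equally likely first tokens; once the line must rhyme, the first token is "starving" with probability 1. The "future state" lives entirely in the conditional probabilities.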

dymk · 1y ago

Like the old saying goes, a sufficiently complex next token predictor is indistinguishable from your average software engineer

johnthewise · 1y ago

A perfect next token predictor is equivalent to god

jermaustin1 · 1y ago

But then so are we? We are just predicting the next word we are saying, are we not? Even when there are thoughts behind it (sure, some people think differently, be it without an inner monologue, or just in colors and sounds and shapes, etc.), that "reasoning" still goes into coming up with the next word we are speaking or writing.

spookie · 1y ago

This type of response always irks me.

It shows that we, computer scientists, think of ourselves as experts on everything, even though biological machines are well outside our expertise.

We should stop repeating things we don't understand.

BobaFloutist · 1y ago

We're not predicting the next word we're most likely to say; we're actively choosing the word that we believe most successfully conveys what we want to communicate. This relies on a theory of mind of those around us and an intentionality of speech that aren't even remotely the same as "guessing what we would say if only we said it."

thomastjeffery · 1y ago

Are we really only what we understand ourselves to be? We must have a pretty great understanding of that thing we can't explain, then.

mensetmanusman · 1y ago

I wouldn’t trust a next-word guesser to make a claim like the one you're making; ergo we aren’t, and the moment we think we are, we aren’t.

hadlock · 1y ago

Humans and LLMs are built differently; it seems disingenuous to think we both use the same methods to arrive at the same general conclusion. I can inherently understand some proofs of the Pythagorean theorem, but an LLM might apply different ones for various reasons, and the output/result is still the same. If a next-token generator run in parallel can generate a performant relational database, that doesn't directly imply I am also a next-token generator.

skywhopper · 1y ago

Humans do far more than generate tokens.

Mahn · 1y ago

At this point you have to start entertaining the question of what the difference is between general intelligence and a "sufficiently complicated" next-token prediction algorithm.

dontlikeyoueith · 1y ago

A sufficiently large lookup table in a DB is mathematically indistinguishable from a sufficiently complicated next-token prediction algorithm, which is in turn mathematically indistinguishable from general intelligence.

All that means is that treating something as a black box doesn't tell you anything about what's inside the box.
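A small illustration of the black-box point: on a finite set of contexts, a precomputed lookup table and the rule that produced it return identical outputs for every query, so input/output behavior alone cannot tell you which one is inside. The rule and alphabet are arbitrary stand-ins.

```python
from itertools import product

def rule(context: tuple) -> str:
    # Arbitrary stand-in "predictor": depends on the parity of 'a's seen so far.
    return "b" if context.count("a") % 2 else "a"

ALPHABET = ("a", "b")
MAX_LEN = 3

# Precompute the rule's answer for every context of up to MAX_LEN tokens.
TABLE = {
    ctx: rule(ctx)
    for n in range(MAX_LEN + 1)
    for ctx in product(ALPHABET, repeat=n)
}

def lookup(context: tuple) -> str:
    return TABLE[context]

# Same answer on every context in the domain, different machinery inside.
assert all(lookup(ctx) == rule(ctx) for ctx in TABLE)
```

The table grows exponentially with context length (1 + 2 + 4 + 8 = 15 entries here), which is why this is a thought experiment rather than an engineering option.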

Tadpole9181 · 1y ago

But then this classifier is entirely useless because that's all humans are too? I have no reason to believe you are anything but a stochastic parrot.

Are we just now rediscovering hundred-year-old philosophy in CS?

BalinKing · 1y ago

There's a massive difference between "I have no reason to believe you are anything but a stochastic parrot" and "you are a stochastic parrot".

cmiles74 · 1y ago · 3 in thread

It reads to me like they compare the output of different prompts and somehow reach the conclusion that Claude is generating more than one token and "planning" ahead. They leave out how this works.

My guess is that they have Claude generate a set of candidate outputs, and then Claude chooses the "best" candidate and returns that. I agree this improves the usefulness of the output, but I don't think this is a fundamentally different thing from "guessing the next token".
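The "generate candidates, keep the best" scheme guessed at here is usually called best-of-N sampling. A minimal sketch, assuming a caller-supplied sampler and scoring function (both hypothetical); it illustrates the commenter's guess, not the method the paper used.

```python
import random
from typing import Callable

def best_of_n(sample: Callable[[random.Random], str],
              score: Callable[[str], float],
              n: int, seed: int = 0) -> str:
    """Draw n independent candidates and return the one with the highest score."""
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n)]
    # max() returns the first candidate among ties.
    return max(candidates, key=score)
```

Each candidate would still be produced token by token; the selection step sits on top of ordinary sampling, which is the commenter's point.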

UPDATE: I read the paper and I was being overly generous. It's still just guessing the next token as it always has. This "multi-hop reasoning" is really just another way of talking about the relationships between tokens.

Trasmatta (OP) · 1y ago

That's not the methodology they used. They're actually inspecting Claude's internal state and suppressing certain concepts, or replacing them with others. The paper goes into more detail. The "planning" happens further in advance than "the next token".
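A toy version of this kind of intervention, assuming the hidden state is a plain vector, the output word is read off by dot products with per-word vectors, and a "concept" is a known direction in that space. All vectors are invented; in real interpretability work finding such directions is the hard part, and the paper's tooling is considerably more involved.

```python
# Hypothetical 3-d toy unembedding: one vector per output word.
UNEMBED = {
    "rabbit": [1.0, 0.0, 0.0],
    "habit": [0.0, 1.0, 0.0],
    "green": [0.0, 0.0, 1.0],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def predict(hidden):
    """Return the word whose logit (dot product with the hidden state) is largest."""
    return max(UNEMBED, key=lambda w: dot(UNEMBED[w], hidden))

def suppress(hidden, direction):
    """Remove the component of `hidden` along `direction` (projection ablation)."""
    coef = dot(hidden, direction) / dot(direction, direction)
    return [h - coef * d for h, d in zip(hidden, direction)]

def inject(hidden, direction, strength):
    """Add `strength` units of a concept direction to the hidden state."""
    return [h + strength * d for h, d in zip(hidden, direction)]
```

With `h = [2.0, 1.0, 0.0]`, `predict(h)` is "rabbit"; suppressing the "rabbit" direction flips it to "habit", and injecting the "green" direction afterwards flips it to "green".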

cmiles74 · 1y ago

Okay, I read the paper. I see what they are saying, but I strongly disagree that the model is "thinking". They have highlighted that relationships between words are complicated, which we already knew. They also point out that some words are related to other words which are related to other words which, again, we already knew. Lastly, they used their model (not Claude) to change the weights associated with some words, thus changing the output to meet their predictions, which I agree is very interesting.

Interpreting the relationship between words as "multi-hop reasoning" is more about changing the words we use to talk about things and less about fundamental changes in the way LLMs work. It's still doing the same thing it did two years ago (although much faster and better). It's guessing the next token.

therealpygon · 1y ago

They have written multiple papers on the subject, so there isn’t much need for you to guess incorrectly what they did.

dontlikeyoueith · 1y ago

> Anthropic's own research showed that Claude does actually "plan ahead", beyond the next token.

For a very vacuous sense of "plan ahead", sure.

By that logic, a basic Markov-chain with beam search plans ahead too.
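To make the comparison concrete: beam search over a plain bigram Markov chain keeps several partial sequences alive, so it can pick a first token with lower immediate probability because that token leads to a more probable whole sequence. A minimal sketch with made-up transition probabilities; `<s>` and `</s>` are start and end markers.

```python
import math

# Made-up bigram transition probabilities P(next | current).
BIGRAMS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"rabbit": 0.9, "cat": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
    "rabbit": {"</s>": 1.0},
}

def beam_search(beam_width: int, max_len: int = 10):
    """Keep the `beam_width` highest log-probability partial sequences at each step."""
    beams = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == "</s>":
                candidates.append((logp, seq))  # finished sequences carry over
                continue
            for nxt, p in BIGRAMS[seq[-1]].items():
                candidates.append((logp + math.log(p), seq + [nxt]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == "</s>" for _, seq in beams):
            break
    return beams[0][1]
```

With these numbers, greedy search (`beam_search(1)`) starts with "the", while `beam_search(2)` returns `["<s>", "a", "rabbit", "</s>"]`, because 0.4 × 0.9 = 0.36 beats the 0.6 × 0.5 = 0.3 of any "the" sequence.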
