
mjburgess · 3y ago · 0 points · 0 comments

> Can you provide more details about what you mean by this distributional structure

The distribution of sorted digits is:

(0 1 2 3 4 5 6 7 8 9) before

(1 before 0 1 2 3 4 5 6 7 8 9) before

(2 before 0 1 2 3 4 5 6 7 8 9) before

(3 before 0 1 2 3 4 5 6 7 8 9) ...

...

When you compute the search space you're treating each number as a unique token (i.e., that all ordinals are unique) -- but it's not sorting unique ordinals, it's sorting digits in a sequential model, i.e., it learns P(Next|Prev)

The (sequential) distribution of digits amongst sorted numbers is tiny


3 comments · 1 top-level

jbay808 · 3y ago · 2 in thread

> The (sequential) distribution of digits amongst sorted numbers is tiny

This is why 10^80 random lists get reduced to only 10^36 sorted lists. However, 10^36 is still very large with respect to the size of the model.
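To make the counting argument concrete, here is a minimal sketch of how the two numbers arise. The list length and value range behind the 10^80 and 10^36 figures are not stated in this part of the thread, so `n` and `k` below are small hypothetical values chosen only for illustration. The function name `count_lists` is likewise made up.

```python
from math import comb


def count_lists(n: int, k: int) -> tuple[int, int]:
    """Return (ordered lists, sorted lists) for n items drawn from k values.

    n and k are illustrative parameters, not the ones used in the thread.
    """
    ordered = k ** n                    # each position is chosen independently
    sorted_lists = comb(n + k - 1, n)   # multisets of size n (stars and bars)
    return ordered, sorted_lists


# Hypothetical small case: 3 digits, each from 0..9.
ordered, sorted_lists = count_lists(n=3, k=10)
print(ordered, sorted_lists)  # 1000 220
```

Sorting collapses every permutation of the same multiset onto one output, which is why the sorted count is so much smaller than the raw count, yet it still grows quickly with `n` and `k`.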

mjburgess (OP) · 3y ago

You're treating each list as unique, but all the lists have a distribution of digits in common... I'm at a loss to even understand what you're saying here really -- this is why you need to actually state, formally, what you think the "LLMs are just stats" hypothesis amounts to.

It seems you think it amounts to saying LLMs sample from a combinatorial space, naively construed -- but that isn't the claim?

The claim is rather, they sample from a statistical distribution of tokens.

Take each position in the input vector, 1...127. It needs to "learn":

P(x_0 | y, x_1...x_127), P(x_1 | y, x_2...x_127), P(x_2 | y, x_3...x_127), etc.

Which is a family of 127 conditional distributions that seem trivial to learn.

I really don't know why you think the size of a combinatorial space is relevant here?

All the sorted lists share basically the same tiny family of conditional distributions { P(x_i | x_(i-1)...x_127) }
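As a rough illustration of what such a shared family of conditionals looks like, the sketch below estimates a first-order table P(next | prev) from randomly generated sorted digit lists. This is a simplification and an assumption on my part: it conditions only on the previous digit and ignores the input y, and the list length, sample count, and the name `transition_counts` are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def transition_counts(num_lists: int = 2000, length: int = 20, seed: int = 0):
    """Count (prev -> next) digit pairs across sorted lists of random digits."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    counts = defaultdict(Counter)
    for _ in range(num_lists):
        digits = sorted(rng.randrange(10) for _ in range(length))
        for prev, nxt in zip(digits, digits[1:]):
            counts[prev][nxt] += 1
    return counts


counts = transition_counts()

# Normalise each row into an empirical conditional distribution.
for prev in sorted(counts):
    total = sum(counts[prev].values())
    probs = {nxt: round(c / total, 2) for nxt, c in sorted(counts[prev].items())}
    print(prev, probs)
```

Every row puts zero mass on digits smaller than `prev`, so the table has at most 55 non-zero entries no matter how many distinct sorted lists were sampled. A table like this describes what sorted output looks like; on its own it does not say which sorted list corresponds to a given input.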

jbay8083y ago

I agree a neural network can certainly learn the conditional distributions that let it make that choice correctly every time. Once it has done so, then do you not have a sorting algorithm?

