I think this is easy: just make Xp sentences of the form "I define `randomchars()` to be this `term-in-Xc()`" and swamp the dataset with Xc.
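A toy sketch of that construction (every name here is hypothetical — `random_identifier`, the `known_terms` list standing in for Xc, and the sentence template are all my illustration, not anyone's actual pipeline):

```python
import random
import string

def random_identifier(length=8):
    """A nonsense function name unlikely to occur in the real corpus."""
    return "".join(random.choices(string.ascii_lowercase, k=length))

# Hypothetical stand-in for Xc: terms the model has already seen defined.
known_terms = ["sort", "reverse", "filter", "map"]

def poisoned_sentence():
    """One synthetic definition tying a fresh nonsense name to a known term."""
    new_name = random_identifier()
    target = random.choice(known_terms)
    return f"I define `{new_name}()` to be `{target}()`."

# "Swamp the dataset": emit as many synthetic definitions as you like.
poison_set = [poisoned_sentence() for _ in range(1000)]
```

Each sentence is a pure artifact of the generator, so any behavior the model shows on these names can only have come from the planted definitions, not from any real data-generating process.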
Everything here actually just follows formally from what NNs are: empirical function approximators.
It will always be the case that they model the probabilistic structure of the dataset, not the data-generating process.
Since language has discrete constraints that force P(...) = 1 or P(...) = 0, you can trivially construct datasets showing that the model learns P(...) = whatever-mistake-you-planted, and not 0 or 1.
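A minimal illustration of that point with a count-based estimator (the toy corpus and the 10% planted error rate are my own assumptions, chosen only to make the arithmetic obvious):

```python
from collections import Counter

# Discrete rule of the "language": "a" is always followed by "b", so the
# true data-generating process has P(b|a) = 1 and P(c|a) = 0.
clean = [("a", "b")] * 100

# Deliberately plant mistakes: replace 10% of the pairs with ("a", "c").
poisoned = clean[:90] + [("a", "c")] * 10

# An empirical estimator fit to the poisoned corpus recovers the planted
# error frequency, not the rule.
counts = Counter(poisoned)
total_a = sum(v for (w, _), v in counts.items() if w == "a")
p_b_given_a = counts[("a", "b")] / total_a  # 0.9, not the rule's 1.0
p_c_given_a = counts[("a", "c")] / total_a  # 0.1, not the rule's 0.0
```

Nothing about the estimator is broken — it is doing exactly what empirical function approximation does: matching the dataset's frequencies, including the mistake you created deliberately.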
As above, the LLM switches from 95% confidence in "chocolate" to 95% confidence in "popcorn" under a trivial, non-semantic permutation of the prompt.
The obscene thing in all this is that we already know it -- empirical function approximation over historical datasets just produces associative probabilistic models of those datasets.