Better HN

IanCal · 2y ago · 0 points

This is wrong, or at least a simplification to the point of removing any value.

> NNs are a stat fitting alg learning a conditional probability distribution, P(next_word|previous_words).

They are trained to maximise this, yes.

> Their weights are a model of this distribution.

That doesn't really follow, but let's leave that.

> Why does, 'mat' follow from 'the cat sat on the ...' because 'mat' is the most frequent word in the dataset; and the NN is a model of those frequencies.

Here's the rub. If what you describe were all they're doing, then a sequence of never-before-seen words would have no valid continuation: every next word would be equally likely. A single brand-new word would be followed by absolute gibberish, as there would be nothing to go on.
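To make the objection concrete, here is a toy sketch of the picture being criticised: a count-only trigram model. The helper names (`train_trigram`, `next_word_distribution`) and the three-sentence corpus are made up for illustration, and this is not how an actual LLM is implemented. It predicts "mat" after "the cat sat on the" purely from frequencies, and for a context it has never seen it can only fall back to a uniform guess.

```python
from collections import Counter, defaultdict

def train_trigram(corpus):
    """Count how often each word follows each two-word context."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for a, b, c in zip(words, words[1:], words[2:]):
            counts[(a, b)][c] += 1
    return counts

def next_word_distribution(counts, vocab, context):
    """P(next | last two words), estimated purely from counts.

    An unseen context carries no information, so the model can only
    fall back to a uniform distribution over the vocabulary."""
    key = tuple(context.split()[-2:])
    seen = counts.get(key)  # .get() does not create an entry in the defaultdict
    if not seen:
        return {w: 1 / len(vocab) for w in vocab}
    total = sum(seen.values())
    return {w: n / total for w, n in seen.items()}

# Hypothetical toy corpus
corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the cat sat on the sofa",
]
vocab = sorted({w for s in corpus for w in s.split()})
counts = train_trigram(corpus)

print(next_word_distribution(counts, vocab, "the cat sat on the"))  # mat ≈ 0.67, sofa ≈ 0.33
print(next_word_distribution(counts, vocab, "another kjsdhlisrnj"))  # every vocab word gets 1/7
```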

Let's try:

Input: I have one kjsdhlisrnj and I add another kjsdhlisrnj, tell me how many kjsdhlisrnj I now have.

Result: You now have two kjsdhlisrnj.

I would wager a solid amount that kjsdhlisrnj never appears in the training data. If it does, pick another one; it doesn't matter.

So we are learning something more general than the frequencies of sequences of tokens.

I always end up pointing to this, but OthelloGPT is very interesting: https://thegradient.pub/othello/

While it's trained on sequences of moves, what it does is more than just "sequence a,b,c is followed by d most often".


mjburgess · 2y ago

Any NN "trained on" data sampled from an abstract complete outcome space (e.g., a game with formal rules, mathematical sequences, etc.) can often represent that space completely. It comes down to whether you can form conditional probability models of the rules, and that's usually possible because that's what abstract rules are.
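A minimal sketch of the rule-governed case, assuming a made-up "game" whose only rule is "the next token is the previous one plus 1, mod 10". Counting transitions in sampled sequences yields a conditional distribution P(next | prev) that puts all of its mass on the rule-following move, i.e. here the conditional probabilities simply are the rule.

```python
import random
from collections import Counter, defaultdict

def sample_sequence(length, rng):
    """Hypothetical rule-governed sequence: each token is the previous one plus 1, mod 10."""
    start = rng.randrange(10)
    return [(start + i) % 10 for i in range(length)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
counts = defaultdict(Counter)
for _ in range(1000):
    seq = sample_sequence(5, rng)
    for prev, nxt in zip(seq, seq[1:]):
        counts[prev][nxt] += 1

# Normalise the transition counts into P(next | prev).
model = {
    prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
    for prev, c in counts.items()
}

for prev in sorted(model):
    print(prev, model[prev])  # e.g. 3 {4: 1.0}
```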

> I have one kjsdhlisrnj and I add another kjsdhlisrnj, tell me how many kjsdhlisrnj I now have.

1. P(number-word|tell me how many...) > P(other-kinds-of-words|tell me how many...)

2. P(two|I have one ... I add another ...) > P(one|...) > P(three|...) > others

This is trivial.
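The two inequalities above can be read as one ranking over a single next-token distribution. As a toy sketch, with logits invented purely for illustration (not taken from any real model), a softmax turns raw scores into probabilities and greedy decoding picks the top word:

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

# Invented logits for the word following "... You now have"
logits = {"two": 5.0, "one": 2.0, "three": 1.5, "kjsdhlisrnj": -1.0, "banana": -3.0}
probs = softmax(logits)

ranking = sorted(probs, key=probs.get, reverse=True)
print(ranking)  # ['two', 'one', 'three', 'kjsdhlisrnj', 'banana']
print(f"greedy choice: {ranking[0]} (p = {probs[ranking[0]]:.2f})")
```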

IanCal (OP) · 2y ago

Right, learning more abstract rules about how things work is the goal and where the value comes in. Not all algorithms are able to do this, even if they can do what you describe in your first comment.

That's why they're interesting. OthelloGPT is interesting because it builds a world model.

mjburgess · 2y ago

It builds a model of a "world" whose structure is conditional probabilities; this is circular. It's like saying you can use a lego model to build a model of another lego model. All the papers which "show" NNs building "world" models aren't using any world. It's lego modelling lego.

The lack of a world model only matters when the data NNs are trained on aren't valid measures of the world that data is taken to model. All the moves of a chess game are a complete model of chess. All the books ever written aren't a model of, well, anything -- the structure of the universe isn't the structure of text tokens.

The only reason all statistical algorithms, including NNs, appear to model the actual world is that patterns in data give this appearance: P(The Sun is Hot) > P(The Sun is Cold) -- there is no model of the sun here.

The reason P("The Sun is Hot") seems to model the sun is that we can read the English words "sun" and "hot". It is we who think the machine generating this text does so semantically, but it was the people who wrote that phrase in the dataset who did so; the machine is just generating "hot" because of that dataset.
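A toy sketch of the "P(The Sun is Hot) > P(The Sun is Cold)" point, assuming a four-sentence made-up corpus and a bigram model scored with the chain rule (a count-based stand-in, not a neural network). The first sentence scores higher only because "is hot" is counted more often than "is cold"; nothing in the model refers to the sun itself.

```python
from collections import Counter

# Hypothetical tiny corpus: the model's only "knowledge" is these counts.
corpus = [
    "the sun is hot",
    "the sun is hot",
    "the sun is bright",
    "the ice is cold",
]

contexts = Counter()  # how often each word appears as a left context
bigrams = Counter()   # how often each (previous, current) pair appears
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    contexts.update(words[:-1])
    bigrams.update(zip(words, words[1:]))

def sentence_prob(sentence):
    """Chain rule with bigram estimates: P(w_1..w_n) ≈ Π P(w_i | w_{i-1})."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        if contexts[prev] == 0:
            return 0.0  # unseen context: no estimate available
        p *= bigrams[(prev, cur)] / contexts[prev]
    return p

print(sentence_prob("the sun is hot"))   # 0.375
print(sentence_prob("the sun is cold"))  # 0.1875
```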


pas · 2y ago

How does it work underneath?

"kjsdhlisrnj" is in the context, it gets tokenized, and now when the LLM is asked to predict/generate next-token sequences, somehow "kjsdhlisrnj" is there too. It learns patterns. Okay, sure, they get encoded somehow, but during inference, how does this lead to a recalled pattern being applied to the right token(s)?

Also, can it invent new words?
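On the "how does it work underneath" question: one mechanism described in interpretability research is the "induction head", an attention pattern that finds an earlier occurrence of the current token and copies whatever token followed it ([A][B] ... [A] → [B]). Below is a hand-written sketch of that rule, not a learned model, and the split of "kjsdhlisrnj" into "kjsd", "hlis", "rnj" is a made-up tokenization for illustration. The rule never looks at what a token means, so it works just as well for tokens it has never seen.

```python
def induction_predict(tokens):
    """Toy induction rule: find the most recent earlier occurrence of the
    last token and predict the token that followed it there.
    Returns None if the last token has not appeared before."""
    current = tokens[-1]
    # Scan backwards, skipping the last position itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None

# Hypothetical tokenization of the prompt: "kjsdhlisrnj" -> "kjsd" "hlis" "rnj"
context = ["I", "have", "one", "kjsd", "hlis", "rnj",
           "and", "I", "add", "another", "kjsd", "hlis"]
print(induction_predict(context))  # rnj
```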

empath-nirvana · 2y ago

Yes. This is one of the first things people did with ChatGPT:

Prompt: can you help me invent a language for a fictional setting with talking frogs. Give me a sample sentence that frogs might say in their language, along with a translation.

ChatGPT: Creating a language for a fictional setting can be a fun and imaginative task! For your setting with talking frogs, we'll consider the environment and characteristics that might influence their language. Here’s a brief outline of the linguistic features and a sample sentence:

Linguistic Features of the Frog Language

Phonetics: Since frogs are known for their croaks and ribbits, their language might include a lot of guttural and clicking sounds, which can be represented in writing with characters like "k", "g", "q", and clicks like "!" or "ǂ".

Syntax: We might imagine that frogs are direct and to the point, so their sentence structure could be subject-verb-object (SVO), similar to English, for straightforward communication.

Vocabulary: Words would likely relate to their daily life—water, flies, lily pads, weather changes, predators, etc.

Grammar: Simplified grammar with minimal use of tenses and conjugations to reflect an emphasis on the present moment, which is typical for many animalistic languages in fiction.

Sample Language: Froglan

- Froglii (hello)
- Lok (water)
- Krik (eat)
- Zraak (fly - the insect)
- !quu (danger)
- ǂogg (jump)

Sample Sentence in Froglan: "Froglii! Lok krik zraak ǂogg."

Translation: "Hello! Jump eat fly water."

This sentence, structured in a somewhat English syntax for ease of understanding, would literally mean that a frog is greeting another and then mentioning that it will jump towards water to eat flies. The sentence structure and vocabulary can be further refined and expanded based on how deeply you want to dive into the language creation!
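On whether it can invent new words: one reason a model can emit "Froglii" or "Zraak" at all is that its vocabulary is made of subword pieces, so a brand-new word is just a new sequence of existing tokens. A simplified sketch, assuming a made-up five-piece vocabulary and a greedy longest-match splitter (real tokenizers such as BPE learn their pieces and merge rules from data, and details vary by model):

```python
def greedy_tokenize(word, vocab):
    """Simplified greedy longest-match subword split (in the spirit of WordPiece).
    Single characters are always accepted, so any string can be encoded."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking until a match.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                pieces.append(piece)
                i = j
                break
    return pieces

# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of pieces from data.
vocab = {"Fro", "gl", "ii", "ra", "ak"}

print(greedy_tokenize("Froglii", vocab))  # ['Fro', 'gl', 'ii']
print(greedy_tokenize("Zraak", vocab))    # ['Z', 'ra', 'ak']
```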
