
0 points · mjburgess · 2y ago · 0 comments

Well, ChatGPT is a mixture of LLMs, and no doubt a great deal more goes into making this work (I'd suppose, e.g., that they augment their datasets to have conversational framing, with actions/verbs etc. augmented, or that they achieve something similar with models-on-top).

Nevertheless, roughly consider a dataset D for which we have an approximate stochastic model of its conditional frequency associations: P(next|previous..., D) etc.
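As a rough illustration of what P(next|previous, D) means, here is a minimal sketch that estimates it from bigram counts. The toy dataset and the function name `conditional_frequencies` are made up for the example; a real LLM conditions on far more context and learns these associations rather than counting them.

```python
from collections import Counter, defaultdict


def conditional_frequencies(corpus):
    """Estimate P(next | previous) from bigram counts over a list of sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        # Count how often each token follows each other token.
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # Normalize the counts into conditional relative frequencies.
    return {
        prev: {word: n / sum(followers.values()) for word, n in followers.items()}
        for prev, followers in counts.items()
    }


# Hypothetical toy dataset D (an assumption, not real training data).
D = [
    "write a sentence about cats",
    "write a poem about dogs",
    "make a sentence with five words",
]

model = conditional_frequencies(D)
print(model["a"])  # frequencies of the words that follow "a" in D
```

Everything the model can say after "a" is fixed by what followed "a" in D, which is the sense in which the dataset explains the output.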

Then if your prompt really got that reply, from this model, it would do so like this:

"Construct" is first projected to an encoding which replaces it, effectively, with a set of related words (Construct, Make, Create, Write...) all weighted by how they co-occur with construct.

Then we sample from D based on this word set, obtaining, roughly, all conversations where these related words were used; call this Dc.

Next, take "a sentence" and replace it with its word set, say, (Sentence, Phrase, Words, ...) and sample the conversations from Dc in which these occur; call this Dcs.

And so on. Since each token in your prompt actually corresponds to basically all possible words but weighted by association, each "filtering operation" actually selects vast amounts of the training data (space).
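The filtering steps above can be sketched roughly as follows. This is only a toy under stated assumptions: the `RELATED` weights, the four-line dataset and the `soft_filter` helper are all invented for illustration, and a real model does nothing this literal (it has no explicit lookup over stored conversations). Here each prompt word expands to a small weighted word set, and each pass reweights the surviving conversations by their strongest association.

```python
# Hypothetical association weights: each prompt word maps to a weighted word set.
RELATED = {
    "construct": {"construct": 1.0, "make": 0.8, "create": 0.8, "write": 0.6},
    "sentence": {"sentence": 1.0, "phrase": 0.7, "words": 0.5},
}


def soft_filter(weighted_docs, prompt_word):
    """Reweight each conversation by its strongest association with prompt_word.

    Conversations with no associated word are dropped (weight zero).
    """
    word_set = RELATED[prompt_word]
    result = []
    for text, weight in weighted_docs:
        tokens = set(text.lower().split())
        score = max((w for word, w in word_set.items() if word in tokens), default=0.0)
        if score > 0:
            result.append((text, weight * score))
    return result


# Hypothetical toy dataset D, every conversation starting with weight 1.0.
D = [
    "write a sentence about cats",
    "make a phrase that rhymes",
    "create a table of prices",
    "the weather is nice today",
]

Dc = soft_filter([(d, 1.0) for d in D], "construct")  # first filtering pass
Dcs = soft_filter(Dc, "sentence")                     # second pass, applied to Dc

for text, weight in Dcs:
    print(f"{weight:.2f}  {text}")
```

With realistic word sets that cover nearly the whole vocabulary, almost nothing gets dropped outright; the passes mostly shift weight around, which is why each one still selects a vast region of the training data.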

Finally, consider the reverse problem: what words could this system possibly produce from this process that weren't relevant to your prompt? Given enough data (PBs of text from all possible digitized conversations, books, etc.) then a sensible-seeming answer becomes the only plausible one to generate.

Now, I do think PBs wouldn't be enough here to generate a single statistical model that behaved this way, so you need a mixture of them (i.e., ChatGPT), and I suspect you also need a system for regulating discrete constraints such as quantities. I suspect many deployed LLMs have improved in this area thanks to models trained to be specifically sensitive to quantities.


2 comments · 1 top-level

danielbln · 2y ago · 1 in thread

There are plenty of LLMs that aren't MoE/ensemble, and there are also plenty of LLMs that are pure completion models that haven't been fine-tuned/RLHF'd to be conversational. I would recommend you read up a bit more on how modern LLMs work; I get the feeling your intuition on that could improve.

edit: I can't reply to the child comment as we've reached the thread limit, but I can say that LLMs are not trained on a tiny subset of data; they are trained on as much data as possible. An LLM becomes conversational/instruct by fine-tuning it with reinforcement learning data. GPT-3.5 is by all accounts not an ensemble model, and Llama2/3 is NOT an ensemble model/MoE, yet will allow you to do in-context learning/few-shot prompting effortlessly. As I said, I think your intuition about how these LLMs work doesn't match how (as far as we know) they actually work, and needs readjustment.

mjburgess (OP) · 2y ago

I don't see what I'm missing. I'm addressing why ChatGPT generated a response given a prompt. If another LLM had been used, something far simpler, the explanation would be different.

If a highly simplified LLM will generate text against discrete quantitative constraints, under a variety of scenarios, then I've underestimated how highly structured the relevant training data must be.

An LLM trained on a physics textbook isn't going to be conversational; one trained on Shakespeare will generate text in Elizabethan English.

I.e., in every case, the explanation of why any given response was generated is given by explaining the distribution of its dataset. So if a Shakespeare LLM generates "to be or otherwise to be not is alike everything ere anon", we will mostly be explaining how/why those words were used by Shakespeare.

And if an LLM is small, and is actually discretely sensitive to quantities across a large variety of domains, my guess is that its training data has been specially prepared. This is just a guess about the nature of human communication though; it has nothing to do with LLMs. I just guess that we don't distribute "quantity tokens" in such a highly patterned way that a simple LLM would be able to find the pattern.
