> How does chat-gpt actually get this right?
Your answer: > its output is purely probabilistic, based on existing corpus of text
Because GPT was trained on existing text, some of which included numbers and counting, it's learnt the natural ordering of most common/everyday numbers. For larger or more complex numbers, it's learnt the patterns behind how they're constructed linguistically, which allows it to output a written count in sequence.
This same pattern recognition doesn't work anywhere near as well for actual numerals (e.g. "47600", instead of "forty seven thousand six hundred"), as the tokenizer tends to break long numerals apart (e.g. into ["476", "00"]).
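To make the tokenizer point concrete, here's a toy sketch. It is not GPT's actual tokenizer (that uses a learned byte-pair encoding); `greedy_tokenize` and `TOY_VOCAB` are made up for illustration, and the vocabulary is hand-picked so that the numeral splits the way described above.

```python
def greedy_tokenize(text, vocab):
    """Split text by repeatedly taking the longest vocab entry that matches.

    A simplified stand-in for a subword tokenizer, for illustration only.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink.
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # Nothing in the vocab matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical vocabulary: whole number-words exist as tokens,
# but the numeral "47600" only exists as fragments.
TOY_VOCAB = {"476", "47", "00", "0",
             "forty", " seven", " thousand", " six", " hundred"}

numeral_tokens = greedy_tokenize("47600", TOY_VOCAB)
word_tokens = greedy_tokenize("forty seven thousand six hundred", TOY_VOCAB)

print(numeral_tokens)  # ['476', '00']
print(word_tokens)     # ['forty', ' seven', ' thousand', ' six', ' hundred']
```

The written-out form maps onto tokens that each carry a meaningful piece of the number, while the numeral is cut at a boundary ("476" | "00") that has nothing to do with place value.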
Unless training included line-level skips, rather than just next-word skips (like word2vec) or concept-level associations? At the line or paragraph level, ordered numerical sequences are very common in formal texts and in code.
I've seen sentence-based training; I suppose that for code (which GPT-4 seems to excel at), line-level training would be essential.
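To show what I mean by "skips" at different granularities, here's a rough sketch of word2vec-style skip-gram pairing, applied first to words and then to whole lines. The line-level version is purely my own hypothetical; I'm not claiming GPT is trained this way. `skip_pairs` is an invented helper name.

```python
def skip_pairs(units, window=1):
    """Pair each unit with its neighbours within `window` positions.

    `units` can be words, lines, or anything else in sequence.
    """
    pairs = []
    for i, centre in enumerate(units):
        lo = max(0, i - window)
        hi = min(len(units), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, units[j]))
    return pairs


# Word-level: neighbours are adjacent words.
words = "one two three".split()
print(skip_pairs(words))
# [('one', 'two'), ('two', 'one'), ('two', 'three'), ('three', 'two')]

# Line-level (hypothetical): neighbours are adjacent lines,
# so "1." ends up paired with "2.", and "2." with "3.".
lines = ["1. Install", "2. Configure", "3. Run"]
print(skip_pairs(lines))
```

At the line level the numbered prefixes land next to each other in the pairs, which is the kind of ordered signal I was wondering about.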
Can anyone recommend a mid-level read on this, covering the different modes of training and such? I'm happy with a bit of code and undergrad-level maths. Thanks.