0 points · K0balt · 1mo ago · 0 comments

That’s the big bet, for sure… but if it’s reasoning that the supervising devs are injecting, and AI systems can’t reason, I guess it won’t work? Idk, I kinda think they do reason, though not in the way people might think.

It’s definitely true that they are statistical next-token predictors, that this is intrinsically pattern matching, and that it’s reasonable to say pattern matching alone is not capable of reasoning.

But my intuition is that this is not really what is going on. The token prediction is the hardware layer. The software is the sum total of collective human culture they are trained on. The software is doing the reasoning, not the hardware. Like a Z80 can’t play chess by itself, but software that runs on a Z80 certainly can.

Idk, that’s my -feeling- on the conundrum. Who knows, I guess we will find out.

3 comments · 2 top-level

hunterpayne · 1mo ago · 1 in thread

"The software is the sum total of collective human culture they are trained on."

Almost: they are the median, or most popular, aspects of the culture upon which they are trained. So you are getting the most popular way to do something, not the best (for some definition of best). That's why the claims about LLMs being geniuses are absurd. Almost by definition, they are going to have the average IQ of all the people on the net, weighted by how much each person posts. I'm guessing that's about 95.
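
To make "weighted by how much each person posts" concrete, here is a minimal sketch of a post-weighted average. The scores and post counts are made-up illustrative numbers, not data, and the function name `post_weighted_mean` is hypothetical.

```python
def post_weighted_mean(scores, post_counts):
    """Mean of scores, each weighted by how many posts that person wrote."""
    if len(scores) != len(post_counts):
        raise ValueError("scores and post_counts must have the same length")
    total_posts = sum(post_counts)
    if total_posts <= 0:
        raise ValueError("total post count must be positive")
    return sum(s * n for s, n in zip(scores, post_counts)) / total_posts


# Hypothetical population of three posters (illustrative numbers only).
scores = [90, 100, 130]
post_counts = [50, 10, 5]

plain = sum(scores) / len(scores)                   # every person counts once
weighted = post_weighted_mean(scores, post_counts)  # prolific posters count more
```

With these invented numbers the weighted figure lands below the plain average, because the most prolific poster has the lowest score. Different made-up inputs would push it the other way, so the sketch only shows the mechanism, not the 95 estimate.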

K0balt · OP · 1mo ago

Meh, while I’d agree that LLMs are idiot savants more than geniuses, I think you underestimate the general quality of training data. First, it’s all data that was published or written. People below 80 IQ mostly don’t publish or write at all, and when they do you can filter it with a regex. So already you skew the curve up 15 points or so. Then, factor in that published usually means 120+ and also includes the collective treasures of civilization. Even the average Joes are going to skew towards things they are knowledgeable and passionate about, putting their best foot forward and so on (and the trolls get regexed to oblivion). Only the very clever trolls get through, and at least they pattern match for clever.

ACCount37 · 1mo ago

If the easiest pathway to high-performance next-token prediction lies through reasoning, then training for better next-token prediction ends up training for reasoning implicitly.

By now, there's every reason to believe that this is what's happening in LLMs.

"Reasoning primitives" are learned in pre-training - and SFT and RL then assemble them into high performance reasoning chains, converting "reasoning as a side effect of next token prediction" to "reasoning as an explicit first class objective".

The end result is quite impressive. By now, it seems like the gap between human reasoning and LLM reasoning isn't "an entirely different thing altogether" - it's "humans still do it better at the very top end of the performance curve - when trained for the task and paying full attention".
