Better HN

0 points · zamadatix · 2y ago · 0 comments

This oversimplifies too much. GPT-4 can add random large numbers that are too large to have been in the data set. When it's wrong, it tends to be close: for example, when dividing a large number by a decimal number with many fractional digits, the first 4 digits may be correct. Neither of these examples can be explained by memorization alone; some other operational mechanics are being applied to get these outcomes.


2 comments · 1 top-level

jjtheblunt · 2y ago · 1 in thread

To your point, it would be really neat if we could trace what it's doing to get answers, see why it's sometimes right, sometimes close but wrong!

zamadatix (OP) · 2y ago

You can trace it pretty easily in a local neural net, which takes a couple of minutes to train. The only reason we can't trace it in GPT is that it's a closed model, so we don't have the access to do so.

Largely two things come into play: 1) Some part of the neural net is emulating more traditional logic, but it may not always be the most activated part, or tuned to give perfect answers. 2) There isn't really a "jmp" equivalent within a single iteration, so the neural net has to learn not only to do decimal division but to do it by iterating over output tokens: continuing correctly with each token it outputs, putting the right things into context, and keeping that context activated at the right time.
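To make point 2 concrete, here is a rough analogy in plain Python (not a claim about what GPT-4 does internally). Long division emits one digit per step, and the only state carried forward is the remainder, which plays the role of "context". The `slip_at` parameter is a hypothetical knob added purely for illustration: it corrupts the carried state at one step, which leaves the earlier digits correct and the later ones wrong.

```python
from typing import Optional


def divide_digit_by_digit(numerator: int, denominator: int, digits: int,
                          slip_at: Optional[int] = None) -> str:
    """Emit a decimal quotient one digit at a time, one 'token' per step.

    slip_at is a hypothetical illustration knob: at that step index the
    carried remainder is perturbed by 1, simulating a single bad step.
    """
    if numerator < 0 or denominator <= 0:
        raise ValueError("expects numerator >= 0 and denominator > 0")

    integer_part, remainder = divmod(numerator, denominator)
    out = [str(integer_part), "."]

    for step in range(digits):
        if step == slip_at:
            remainder += 1  # one corrupted piece of carried state
        # The remainder is the only thing passed from one step to the next.
        digit, remainder = divmod(remainder * 10, denominator)
        out.append(str(digit))

    return "".join(out)


print(divide_digit_by_digit(1, 7, 6))             # 0.142857
print(divide_digit_by_digit(1, 7, 6, slip_at=4))  # 0.142871
```

The second call gets the first four fractional digits right and then drifts, which is the "close but wrong" pattern: every step has to be continued perfectly, and one bad step only damages what comes after it.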

"Activated" in this case means, more or less, that the group of neurons specializing in this task is being both fed and listened to.

You can even train a neural net to emulate a traditional addition circuit directly; it's just less efficient than one would think if you're trying to build a general-purpose model instead of a specialized one.
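As a minimal sketch of what such a net would have to represent, here is a full adder built from threshold neurons and chained bit by bit like a ripple-carry circuit. The weights and biases below are set by hand rather than trained, so treat them as an assumption about one possible solution, not as what training would actually find. The names `neuron`, `full_adder` and `add` are just illustrative.

```python
def step(x: float) -> int:
    """Hard threshold activation: fires (1) when the input is positive."""
    return 1 if x > 0 else 0


def neuron(inputs, weights, bias) -> int:
    """A single threshold unit: weighted sum plus bias, then step()."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def full_adder(a: int, b: int, carry_in: int):
    """One-bit full adder from two hand-weighted threshold neurons."""
    # Hidden unit: fires when at least two of the three inputs are 1.
    carry_out = neuron([a, b, carry_in], [1, 1, 1], -1.5)
    # Output unit: fires when the input count is odd, using the hidden
    # unit's output (weight -2) to cancel the "two or more" case.
    sum_bit = neuron([a, b, carry_in, carry_out], [1, 1, 1, -2], -0.5)
    return sum_bit, carry_out


def add(x: int, y: int, bits: int = 32) -> int:
    """Add two non-negative ints by reusing the same unit for every bit."""
    carry = 0
    result = 0
    for i in range(bits):
        sum_bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= sum_bit << i
    return result | (carry << bits)


print(add(12345, 54321))  # 66666
```

Two neurons per bit is tiny for a specialized adder, but a general-purpose model has no reason to dedicate clean, isolated units like this, which is part of why the learned version ends up less efficient.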
