You can trace it pretty easily in a local neural net that takes a couple of minutes to train. The only reason we can't trace it in GPT is that it's a closed model, so we don't have access to its internals.
Largely two things come into play: 1) Some part of the neural net is emulating more traditional logic, but it may not always be the most activated part, or tuned to give perfect answers. 2) There isn't really a "jmp" equivalent within a single iteration, so the neural net has to learn not only to do decimal division but to do it by iterating over output tokens: continuing correctly at each token, putting the right stuff into context, and keeping that context activated at the right time.
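To make point 2 concrete, here's a rough analogy in Python (not how a transformer actually computes anything, and next_token / generate are names I made up). Each call to next_token is one "iteration": a fixed function of the visible text with no way to loop back on itself. The only loop is the outer one that appends each output to the context and calls again, so the remainder has to be re-derived from the context on every step.

```python
def next_token(context: str) -> str:
    """One 'forward pass': depends only on the text emitted so far.

    Assumes the context looks like "22/7=3.14" with non-negative
    integers and a non-zero denominator.
    """
    problem, _, answer = context.partition("=")
    num, den = (int(p) for p in problem.split("/"))
    if answer == "":
        return str(num // den)      # integer part first
    if "." not in answer:
        return "."                  # then the decimal point
    # No hidden state survives between calls, so the running remainder
    # is recomputed from how many fractional digits are already written.
    k = len(answer.split(".")[1])
    remainder = (num * 10**k) % den
    return str(remainder * 10 // den)


def generate(problem: str, n_tokens: int) -> str:
    """The only 'jump': feed each output back in as new context."""
    context = problem + "="
    for _ in range(n_tokens):
        context += next_token(context)
    return context


print(generate("22/7", 8))  # 22/7=3.142857
```

If any single step writes a wrong digit, every later step inherits it, which is roughly why "continuing perfectly each token" matters so much.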
"Activated" in this case means, more or less, the group of neurons specializing on this task are being both fed and listened to.
You can even train a neural net to emulate a traditional addition circuit directly. It's just less efficient than one would think if you're trying to build a general-purpose model instead of a specialized one.
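As a toy version of that, here's a sketch that trains two threshold neurons with the classic perceptron rule to behave like a full adder, then chains them into a ripple-carry adder. This is my own minimal setup, not anything GPT does: the names (train_perceptron, fire, neural_add) are made up, and I'm assuming the sum neuron is allowed to see the carry neuron's output, which keeps both targets linearly separable so the perceptron rule is guaranteed to converge.

```python
from itertools import product


def train_perceptron(samples, max_epochs=1000):
    """Classic perceptron rule; converges when the data is linearly separable."""
    w, b = [0] * len(samples[0][0]), 0
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            delta = target - pred
            if delta:
                errors += 1
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
        if errors == 0:
            return w, b
    raise RuntimeError("perceptron did not converge")


def fire(unit, x):
    """Evaluate one trained threshold neuron on input x."""
    w, b = unit
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Full-adder truth table over (a, b, carry_in).
bits = list(product([0, 1], repeat=3))
carry_unit = train_perceptron([(x, int(sum(x) >= 2)) for x in bits])
# The sum neuron also receives the carry-out as a fourth input.
sum_unit = train_perceptron([(x + (int(sum(x) >= 2),), sum(x) % 2) for x in bits])


def neural_add(a, b, width=8):
    """Ripple-carry addition using only the two trained neurons per bit."""
    carry, result = 0, 0
    for i in range(width):
        x = ((a >> i) & 1, (b >> i) & 1, carry)
        carry_out = fire(carry_unit, x)
        result |= fire(sum_unit, x + (carry_out,)) << i
        carry = carry_out
    return result | (carry << width)


print(neural_add(200, 100))  # 300
```

Note that the loop over bit positions lives outside the neurons, which is the same "no jmp" situation as above: the net supplies the per-step logic and something else has to do the iterating.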