https://www.lesswrong.com/posts/cgqh99SHsCv3jJYDS/we-found-a...
On the flip side is Chris Olah's research (e.g. https://distill.pub/2017/feature-visualization/), which maps a single neuron's activation into the output space (by optimizing for outputs that would have arisen from high activations of a given neuron). If we take this seriously and call these outputs the "favored" output of an individual neuron, we might believe we've characterized that neuron, but what we've characterized is still the entire downstream computation from that neuron, and there are still substantial overlaps and redundancies between neurons.
These are exactly the cases that give me the sense that the default assumption should be that neural nets have highly distributed representations.
Even more to the point, these analyses tend to focus on "binary" questions: either a feature is present or it isn't. Said another way, they implicitly treat activation as a "vote", where greater activation implies a greater weight on whatever that neuron represents.
To my mind, though, this flies in the face of GPT-n doing arithmetic. I'd lean much closer to the argument that each individual quantity is represented by its own neuron, as opposed to each quantity being represented continuously by the actual activation _value_.
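To make the two options concrete, here's a toy sketch (my own illustration, not a claim about how GPT-n actually works; the function names are hypothetical). A "place code" dedicates one unit per quantity and is read out like a vote; a "value code" stores the quantity as the activation magnitude itself. The last lines show one practical difference: rescaling the activations leaves the place code's readout intact but corrupts the value code's.

```python
# Hypothetical toy encodings of an integer n; purely illustrative.

def place_code(n: int, size: int) -> list[float]:
    """One unit per quantity: unit n fires, every other unit stays at zero."""
    if not 0 <= n < size:
        raise ValueError("n is out of range for this layer")
    return [1.0 if i == n else 0.0 for i in range(size)]


def value_code(n: int) -> list[float]:
    """A single unit whose activation value *is* the quantity."""
    return [float(n)]


def decode_place(acts: list[float]) -> int:
    # "Vote" readout: the index of the most active unit wins.
    return max(range(len(acts)), key=lambda i: acts[i])


def decode_value(acts: list[float]) -> int:
    # Magnitude readout: the activation value is read directly.
    return round(acts[0])


for n in (0, 3, 7):
    assert decode_place(place_code(n, 10)) == n
    assert decode_value(value_code(n)) == n

# Halve every activation (e.g. a change in downstream gain).
scaled_place = [0.5 * a for a in place_code(7, 10)]
scaled_value = [0.5 * a for a in value_code(7)]
print("place code still decodes to", decode_place(scaled_place))
print("value code now decodes to", decode_value(scaled_value))
```

The trade-off: the place code needs one unit per representable quantity, while the value code needs only one unit but is sensitive to any change in scale or precision.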
This "voting" representation is also reinforced by the common use of tanh-like activation functions. There's a long history of treating neuron activations as probability- or logit-like quantities.
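A quick illustration of why saturating activations push toward a vote-like reading. This assumes a bare tanh unit with no learned rescaling afterwards: small pre-activations remain distinguishable, but large ones all collapse toward 1.0, so downstream layers mostly see "on" vs. "off" rather than a magnitude.

```python
import math


def tanh_outputs(pre_activations):
    """Pass pre-activations through tanh, as a single tanh-like unit would."""
    return [math.tanh(z) for z in pre_activations]


zs = (0.5, 2.0, 5.0, 10.0)
for z, a in zip(zs, tanh_outputs(zs)):
    # Large z values print as nearly identical outputs close to 1.0.
    print(f"z = {z:4.1f} -> tanh(z) = {a:.6f}")
```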
Of course, I can't speak with certainty about any neural net, and especially not about GPT-3/4. I just have a hard time taking as the null hypothesis that it stores magnitudes of numbers in a way that is directly affected by the float precision in use.
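For a sense of what "directly affected by float precision" would mean: if a quantity really were stored as a raw activation value, the number format would cap which integers are even distinguishable. IEEE 754 half precision (float16) has an 11-bit significand, so every integer up to 2048 is exact but 2049 is not; single precision (float32) stops being exact just past 2^24. The sketch below round-trips values through those formats using Python's `struct` module. It assumes activations are stored directly in these formats, ignoring any scaling or normalization a real model would apply.

```python
import struct


def to_float16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_float32(x: float) -> float:
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Integers near the edge of exact representability in float16.
for n in (2047, 2048, 2049, 2050):
    print("float16:", n, "->", to_float16(float(n)))

# The same effect for float32, just past 2**24.
for n in (16_777_216, 16_777_217):
    print("float32:", n, "->", to_float32(float(n)))
```

Under a value code, two different quantities that round to the same float would be literally indistinguishable to the rest of the network.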