I'm not saying that if your goal is to come up with a usable general learning algorithm, it's as simple as "neural network and done." What I'm saying is the other direction: the general learning capabilities of LLMs are most likely explained by the fact that, well, they are general learners, via the universal approximation theorem.
Your other comment, I think, suggests why we're only now starting to see more general learning capabilities out of neural networks, even though the theory says a single hidden layer is enough: with a single hidden layer, you need to get nearly all the weights pretty close to "right" before general-learning/universal-approximator behavior shows up. With more than one hidden layer, some of your weights can be wrong, as long as the errors get corrected in later layers.
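As a toy illustration of the single-hidden-layer claim (my own sketch, nothing from any paper; all sizes and scales are arbitrary): even a hidden layer with random, frozen weights can approximate a smooth function, as long as the output-layer weights land in exactly the right place -- which is consistent with the idea that the capacity is there, but the weights have to be "right."

```python
import numpy as np

# Toy sketch: a single hidden layer approximating sin(x) -- the
# universal approximation theorem in miniature. The hidden weights are
# random and never trained; only the output layer is fit, by least
# squares, standing in for "getting those weights exactly right."
rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x).ravel()

H = 100                                    # hidden width ("enough" units)
W1 = rng.normal(0.0, 2.0, (1, H))          # random, frozen hidden weights
b1 = rng.uniform(-6.0, 6.0, H)             # random biases spread the kinks out
hidden = np.tanh(x @ W1 + b1)              # the single hidden layer

w2, *_ = np.linalg.lstsq(hidden, y, rcond=None)  # solve output weights exactly
mse = float(np.mean((hidden @ w2 - y) ** 2))
print(f"MSE approximating sin(x): {mse:.2e}")
```

The error comes out tiny despite the hidden layer being pure noise, because the least-squares solve hands us the one set of output weights that works; gradient descent from scratch has to *find* weights like that, which is where depth seems to buy slack.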
Now, I'm not an AI researcher or even anyone who works anywhere near this area, but I did take a course or two in grad school, and this seems at least intuitively plausible to me. If there are researchers in the field reading this, I'd definitely like to hear their takes, because I'm totally open to being completely wrong here. I'd rather be one of the lucky 10,000 than just have this half-baked idea that seems right. :-)