Besides your answer on tokenization being right on the mark (gwern even has articles discussing this exact issue -
https://gwern.net/gpt-3#bpes), that linked paper is extremely powerful for theoretical thinking about this problem. I literally just left a comment claiming that almost all of human thought is encoded in LLMs, leading almost all tasks to be considered interpolation.
If this is true, I suppose based on what we know of the curse of dimensionality it makes sense. Very thought-provoking work.