I disagree with Sutton that a main issue is the use of human-generated data. We humans are trained on that too, and we don't run into such issues.
I expect the problem is more structural, rooted in how LLMs and other ML approaches actually work. They are disembodied algorithms that try to break all knowledge down into a complex web of probabilities, on the assumption that anything can be predicted from those quantified data alone. That seems hugely limiting and at odds with how human intelligence seems to work.