I can’t judge whether this is true, since I don’t know transformers well, but if it is, it untangles an intuition I’ve never been able to articulate about not only LLMs, but possibly all pattern matching, including its human analog, System 1 thinking.
Another, fuzzier way of saying this: there’s something irreducible about complexity that no bounded heuristic can pattern-match, and it’s wishful thinking to assume historical data contains hidden higher-level patterns that unlock magical shortcuts to novel problems.