The following blog post by an OpenAI employee invites a comparison between patterns and transistors.
https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dat...
The ultimate model, in the author's sense, would suss out all patterns, then patterns among those patterns, and so on, so that it delivers both compute and compression efficiency.
To achieve that compute and compression efficiency, an LLM has to cluster similar patterns together and deduplicate them. That in turn requires successive levels of pattern recognition (patterns among patterns among patterns, and so on) so that deduplication can happen at every level of the hierarchy as it is constructed. Full trees or hierarchies won't get deduplicated, but relevant regions of those trees will, which amounts to fusing them together in idea space. It follows that the root levels hold the most abstract patterns. This representation also enables cross-pollination among different fields of study, further increasing effectiveness; a toy sketch of the idea follows below.
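To make the deduplication concrete, here is a toy Python sketch, entirely my own illustration and not anything from the linked post: pattern trees are hash-consed so that structurally identical subtrees become a single shared node, turning a forest of trees into a DAG. The names (Pattern, intern) and the example labels are made up.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Pattern:
        label: str
        children: tuple = ()

    _pool = {}  # structural key -> canonical shared node

    def intern(label, *children):
        # Children must already be interned, so id() uniquely identifies
        # each subtree and the key captures the whole shape in O(1).
        key = (label,) + tuple(id(c) for c in children)
        if key not in _pool:
            _pool[key] = Pattern(label, children)
        return _pool[key]

    # Two "fields of study" sharing a sub-pattern: the shared region is
    # stored exactly once, so the two trees fuse in idea space at that node.
    shared = intern("decompose into smaller instances")
    recursion = intern("recursion (CS)", intern("base case"), shared)
    induction = intern("induction (math)", shared, intern("inductive step"))
    assert recursion.children[1] is induction.children[0]  # one node, two parents

Hash-consing is the classic trick compilers use to share common subexpressions; the analogy is that an ideal model would do the same thing in representation space.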
This reminds me of a point my electronics professor made about why making transistors smaller brings almost all benefits and only a few disadvantages. Think of these patterns as transistors: the more deduplicated and densely packed they are, the more beneficial they become. Of course, this "packing together" happens in mathematical space.
Another thing: "patterns among patterns among patterns" reminds me of homotopies.
This video by PBS Infinite Series on the topic is brilliant. As far as I can tell, compressing homotopies is essentially what LLMs do, if you replace homotopies with patterns.
https://www.youtube.com/watch?v=N7wNWQ4aTLQ
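For anyone who hasn't met the term: a homotopy between two continuous maps f, g : X → Y is a continuous deformation of one into the other, and a homotopy between homotopies deforms one deformation into another, which is exactly the "patterns among patterns" shape. In standard notation (textbook material, not from the video or the post):

    H : X × [0,1] → Y,  with  H(x,0) = f(x)  and  H(x,1) = g(x)

A homotopy between two homotopies H and H' is then a map X × [0,1]² → Y that restricts to H and H' on opposite faces, and the tower continues upward from there.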