I think the first two paragraphs of the post are saying exactly that: the bottleneck is memory. Long contexts, and bigger but less FLOP-intensive models (MoEs).
The funny thing about scaling laws is that as soon as they were known, the whole objective became learning how to break them, or at least bend the curve. They provided an incredibly useful target, but 'law' was a bit too strong a word.