The claim was that improving SotA models has historically taken exponentially more compute.
The claim implies that improving SotA models takes more compute even after integrating the technological advancements that make models more efficient.
Unless you think that such advancements have been historically ignored by the curators of SotA models?