Reductio ad absurdum. A 300K-param model was small enough to be trained offline, on curated datasets, with CPU speeds and RAM capacities that absolutely existed at the time, especially in research centers.
Backprop was known. Data was available. Narrow tasks (completion, summarization, categorization) were relevant. A model that runs on a Pentium II could have been trained on a Cray, or, given enough time, on any reasonably powerful 90s workstation. That's not fantasy: LeNet-5, with its ~65K weights, was trained on a mere Sun workstation in the early 90s.
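As a rough sanity check (my own back-of-envelope assumptions, not numbers from the argument above: fp32 storage, a ~100M-token curated corpus, the common ~6 FLOPs-per-parameter-per-token estimate, and ~50 MFLOPS sustained on a mid-90s workstation), the arithmetic lands at a few megabytes of training state and weeks, not years, of wall-clock time:

```python
# Back-of-envelope feasibility check for a ~300K-param model on 90s hardware.
# All figures below are assumptions for illustration, not claims from the post.

params = 300_000              # parameter count cited in the argument
bytes_per_float = 4           # fp32

# weights + gradients + one optimizer buffer (e.g. momentum)
train_state_mb = params * bytes_per_float * 3 / 1e6
print(f"training state: ~{train_state_mb:.1f} MB")            # ~3.6 MB

tokens = 100_000_000          # assumed size of a curated corpus
total_flops = 6 * params * tokens                              # ~1.8e14 FLOPs total

workstation_flops_per_s = 50e6                                 # assumed ~50 MFLOPS sustained
days = total_flops / workstation_flops_per_s / 86_400
print(f"wall-clock at 50 MFLOPS sustained: ~{days:.0f} days")  # ~42 days
```

Even if these assumptions are off by an order of magnitude, the run stays within what a patient research group could schedule on a single machine.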
The limiting factor wasn't compute; it was the conceptual framing, along with the datasets. No one seriously tried, because the field was dominated by symbolic logic and rule-based AI. That's the core of the argument.