This is the biggest threat to the GPU economy – software breakthroughs that enable inference on commodity CPU hardware or specialized ASIC boards that hyperscalers can fabricate themselves. Google has a stockpile of TPUs that seem fairly effective, although it’s hard to tell for certain because they don’t make it easy to rent them.
I don't think we will need to wait for anything as unpredictable as a breakthrough. Optimizing inference for the most clearly defined tasks, which are also the tasks where value is most readily quantified, such as coding, is already underway.