If power costs are significantly lower, they can pay for themselves by the time they are outdated. It also means you can run more instances of a model in one datacenter, and that seems to be a big challenge these days: simply building an enough data centres and getting power to them. (See the ridiculous plans for building data centres in space)
A huge part of the cost with making chips is the masks. The transistor masks are expensive. Metal masks less so.
I figure they will eventually freeze the transistor layer and use metal masks to reconfigure the chips when the new models come out. That should further lower costs.
I don’t really know if this makes sanse. Depends on whether we get new breakthroughs in LLM architecture or not. It’s a gamble essentially. But honestly, so is buying nvidia blackwell chips for inference. I could see them getting uneconomical very quickly if any of the alternative inference optimised hardware pans out