The far, FAR superior power efficiency means that even if you did harness every public GPU or GPU-like device on earth, you'd end up consuming so much excess electricity it would be cheaper on net to simply take the money that would have gone to the power bill and spend it on your own datacenter.
And even if electricity were free, having those GPUs spread over the world with internet-level latency would slow everything down by factors of thousands to millions - if it's feasible at all. Regardless, you're not getting fable-oss this decade, maybe not even this century.
It would be better for governments to buy and own their own datacenters, maybe as a coalition, and dedicate their operation to the public good. I believe that is what we actually have to do.
> you'd end up consuming so much excess electricity it would be cheaper on net to simply take the money that would have gone to the power bill and spend it on your own datacenter.
With costs spread over a large population, it really doesn't matter. You're not getting hundreds of thousands of people to pitch in half their monthly electric bill to pay for someone else's datacenter. They will pay for the electricity themselves quite happily though, if all they need to do is give you compute. This isn't new.
Interconnect is the bottleneck for distributed training, nothing else really.
Then have inference go down to the next layer to use those models as a P2P decentralized network.
Maybe something like OpenRouter could tap federated networks.
Not sure what you are referring to, unless you don't think H100/H200/B200 are "AI hardware".
> Superpods aren't really power efficient
Maybe not compared to a specialized rig with multiple 4090s, but that is the best case for consumer hardware - the vast majority will be dramatically less efficient than that.
Anyway, I agree the interconnect is by far the biggest obstacle and seems insurmountable; I should probably have led with that.
I recall getting really excited over Hinton's Forward-Forward (FF) foray, right before he bailed on AI as a societal direction (which, if anyone ever had the right, I suppose he does). If one squints, one can see a backprop-free base being much easier to train on geographically distributed and heterogeneous hardware.
The thing that big models will always bring to the table is the ability to YOLO weak/under-specified prompts, and spend less time in the loop making sure work gets partitioned correctly. For smaller/simpler tasks the P(success) difference isn't that big.
Disagreed. For all the coding purposes I could throw at it, GLM-5.1 is easily as good as Opus 4.5, which is the model that kicked this entire hype cycle into overdrive in the first place.
One being that extrapolating from like 3 data points is hardly science. All trends break at some point.
The other is that the measures to prevent distillation of their models (if it was a secret sauce of Chinese models) could work if nobody is allowed to use them.
100% agree. The US government basically has to nationalize AI and capture an outsize portion of the revenue from it in order to fix the economy. The combination of debt burden and interest rate pressure from de-dollarization/global realignment is going to push us into a death spiral, and even if AI is a smash hit, the ~19% federal capture of corporate revenue isn't nearly enough to pull us out of it. Having the people own the compute infrastructure and capture more profit from AI at that layer is the safest, cleanest way to increase revenue capture. A sovereign wealth fund is a mediocre idea because it's possible to play a shell game with stocks and redirect profit/debt (venture capital is quite good at this!).
Currently AI has generated no profit. As it sits, it is a non-viable business.
I refuse to include the sellers of shovels as AI revenue.
If the companies buying the shovels are still losing money, then the tool suppliers' fortunes have nothing to do with the economics of the AI application layer, which is losing money on every prompt.
Businesses also bought dot-com infrastructure, telecom fiber, crypto platforms, metaverse tools, and overbuilt SaaS. The question is whether the AI application layer can charge more than its full cost, and that cost includes inference, infrastructure, depreciation, R&D, customer acquisition, support, compliance, security, and error remediation.
The numbers so far do not inspire confidence. OpenAI reportedly did $4.3B in revenue in the first half of 2025 while burning $2.5B, and Microsoft said OpenAI-related losses reduced its own quarterly net income by $3.1B. An MIT 2025 enterprise AI study found $30B to $40B spent on GenAI with 95% of organizations seeing zero return.
One of the core technical reasons is that hallucinations destroy enterprise economics. If SAP hallucinated 2% of invoices, or Oracle returned fake rows 2% of the time, nobody would call that early stage friction. They would call it unusable for core operations.
In legal AI, even specialized tools have been measured hallucinating 30% of the time. The problem is that as AI gets better, its errors become more confident and more plausible. That forces humans to verify it.
So the cost does not disappear. It moves from doing the work to checking the work. AI coding has the same issue. If an autopilot got you there faster but one flight in ten became unstable unless the pilot constantly supervised it, that is not productivity.
For the bull case to work, usage must explode, quality must improve, prices must fall, reliability must rise, legal risk must shrink, and margins must expand, all at once. I would say that instead of a business model, this is six miracles stacked on top of each other.
There's clearly also a lot of pent-up demand in the corporate world for inference. The problem is that it's currently expensive enough that enterprises are balking at the cost before they've had a chance to refine processes and see projects through to fruition. That's a tractable problem to solve though.
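To put a rough number on "all at once", here is a minimal sketch. It assumes the listed conditions are independent and equally likely, which is my simplification and not something anyone has measured (in practice quality and reliability are probably correlated, which would raise the joint odds). The per-condition probabilities are made up for illustration.

```python
# The conditions listed in the comment above.
CONDITIONS = [
    "usage explodes",
    "quality improves",
    "prices fall",
    "reliability rises",
    "legal risk shrinks",
    "margins expand",
]

def joint_probability(p_each: float, n: int) -> float:
    """Chance that n independent conditions all hold, each with probability p_each.

    Independence is an assumption made for illustration only.
    """
    return p_each ** n

if __name__ == "__main__":
    # Hypothetical per-condition odds, from optimistic to coin flip.
    for p in (0.9, 0.7, 0.5):
        joint = joint_probability(p, len(CONDITIONS))
        print(f"each condition at {p:.0%} -> all {len(CONDITIONS)} together: {joint:.1%}")
```

Even at 90% per condition, the product comes out close to a coin flip, which is the shape of the "stacked miracles" argument.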
Airlines, for example, which are so profitable they continually go bankrupt.
Any actual numbers to back this up? I don't see how nationalizing a very cutting edge technology outside of wartime is going to go super well. The leverage that these companies have is the same leverage that TSMC has: you can't just take over and expect things to rocket at the pace it's going.
If you were to take 500 computers with older 1080 GPUs, you might have enough compute/ram equivalent to an H200 GPU for training such a model. Maybe take 10000.
But if those machines are spread over 10000 homes, wired with residential internet service, training a large model will not get anywhere.
You go from "data in the same HBM memory chip" at 4.8 TB/s or "data in adjacent GPU" with NVLink at 1.2 TB/s down to 25 Mbit/s upload speed. Accessing the next piece of data is going to be about a million times slower. At the same time you will heat a thousand times more, for a million times longer.
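The "about a million times slower" figure can be checked with back-of-envelope arithmetic. A minimal sketch, using the bandwidth numbers quoted above as given; the 1B-parameter, 2-bytes-per-parameter model is a hypothetical example of mine, and latency, protocol overhead, and compression are all ignored.

```python
# Bandwidth figures as quoted in the comment above, converted to bytes/s.
HBM_BYTES_PER_S = 4.8e12         # 4.8 TB/s, same-package HBM
NVLINK_BYTES_PER_S = 1.2e12      # 1.2 TB/s, NVLink figure as quoted
UPLOAD_BYTES_PER_S = 25e6 / 8    # 25 Mbit/s residential upload

def slowdown(fast_bytes_per_s: float, slow_bytes_per_s: float) -> float:
    """How many times slower the slow link is than the fast one."""
    return fast_bytes_per_s / slow_bytes_per_s

hbm_ratio = slowdown(HBM_BYTES_PER_S, UPLOAD_BYTES_PER_S)
nvlink_ratio = slowdown(NVLINK_BYTES_PER_S, UPLOAD_BYTES_PER_S)

# Hypothetical payload: one full copy of gradients for a 1B-parameter
# model at 2 bytes per parameter (an assumption, not from the thread).
PARAMS = 1_000_000_000
BYTES_PER_PARAM = 2
payload_bytes = PARAMS * BYTES_PER_PARAM

upload_seconds = payload_bytes / UPLOAD_BYTES_PER_S
nvlink_seconds = payload_bytes / NVLINK_BYTES_PER_S

if __name__ == "__main__":
    print(f"HBM vs home upload:    {hbm_ratio:,.0f}x")
    print(f"NVLink vs home upload: {nvlink_ratio:,.0f}x")
    print(f"Sending {payload_bytes / 1e9:.0f} GB once: "
          f"{upload_seconds:,.0f} s over home upload, "
          f"{nvlink_seconds * 1e3:.2f} ms over NVLink")
```

So the HBM comparison lands around 1.5 million times, and even one gradient exchange for a small model takes minutes per step over a home uplink.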
Agree about the physics; disagree about the larger point.
I am not questioning that servers packed together may achieve an optimal result in how we are currently doing things, but, and this is my point, what if we didn't?
<< you cannot get that with distributed training
This is entirely the wrong question to ask. The question to ask is: how could it be adapted to distributed training?
I think the most important problem is that you have to marshal enough compute to be meaningful, and that is going to be more and more difficult as frontier compute requirements grow.
The first part is not really true though, the chips are not that much faster, the DRAM is not that much faster, and in aggregate it does not matter because there is just so much more consumer hardware out there (although perhaps that is changing as supply shifts toward datacenters).
The interconnect and data locality are the problem. If you could train it the way you can, e.g., render a scene with Monte Carlo ray tracing, any result from any node could be merged with any other and the combined result would have converged closer to the limit. I am sure research in that direction exists; it just has not proven effective at the scales where it has been attempted.
One would question why this hasn't already happened as the rule, as opposed to the proliferation of private data centers. However, I am sure the answers are plain and perhaps saddening to us all.
I mean that's good, but they'd also have to build their own dataset, which involves either paying people or breaking the law.
Plus if they do manage to make it work, they will not get any tax revenue from it, as it'll remove the need for labour, which is where a huge amount of tax revenue comes from.
It's a deeply hard problem with lots of second/third order effects.
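For anyone unfamiliar with the property being described, here is a minimal sketch of what "any result from any node can be merged with any other" looks like for a Monte Carlo estimate. It uses a toy pi estimate rather than ray tracing, and all function names are made up for illustration. It is not a training algorithm; gradient steps do not combine this way, which is exactly the gap the comment points at.

```python
import random

def node_estimate(n_samples: int, seed: int) -> tuple[int, int]:
    """One node's independent work: sample points in the unit square and
    count how many fall inside the quarter circle. Returns (hits, samples)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, n_samples

def merge(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Merging is just adding counts, so order and grouping do not matter
    and no node ever needs to talk to another while it is working."""
    return a[0] + b[0], a[1] + b[1]

def pi_from(result: tuple[int, int]) -> float:
    hits, n = result
    return 4.0 * hits / n

# Eight hypothetical nodes, each working alone with its own seed.
results = [node_estimate(20_000, seed) for seed in range(8)]

merged = results[0]
for r in results[1:]:
    merged = merge(merged, r)

if __name__ == "__main__":
    print(f"single node: {pi_from(results[0]):.4f}")
    print(f"all merged:  {pi_from(merged):.4f} from {merged[1]:,} samples")
```

Each node only ships two integers at the end, so bandwidth stops mattering. Training today needs nodes to exchange large, order-dependent updates every step instead.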