undefined | Better HN

0 points · sho · 9d ago · 0 comments

As I replied to a child comment - this is a nice idea that just isn't tenable in reality. AI hardware isn't just hilariously faster than consumer GPUs, it's also hilariously more power-efficient and has hilariously better connectivity. Every one of these dimensions kills the idea.

The far, FAR superior power efficiency means that even if you did harness every public GPU or GPU-like device on earth, you'd end up consuming so much excess electricity it would be cheaper on net to simply take the money that would have gone to the power bill and spend it on your own datacenter.

And even if electricity was free, having those GPUs spread over the world with internet-level latency will slow everything down by factors of thousands to millions - if it's feasible at all. Regardless, you're not getting fable-oss this decade, maybe even not this century.

It would be better for governments to buy and own their own datacenters, maybe as a coalition, and dedicate their operation to the public good. I believe that is what we actually have to do.

48 comments · 9 top-level

ux266478 · 9d ago · 10 in thread

AI hardware is for inference, not training. Training uses normal HPC crap. Superpods aren't really power efficient, it's kind of a meme, and it stems from limiting the power draw of other components by having less of them. It's more of a rounding error.

> you'd end up consuming so much excess electricity it would be cheaper on net to simply take the money that would have gone to the power bill and spend it on your own datacenter.

Costs spread over a large population, it really doesn't matter. You're not getting hundreds of thousands of people to pitch half their monthly electric bill to pay for someone else's datacenter. They will pay the electricity themselves quite happily though, if all they need to do is give you compute. This isn't new.

Interconnect is the bottleneck for distributed training, nothing else really.

rurban · 9d ago

You got it wrong. Inference can use crap GPUs. Training needs the 100x more expensive big guns. Our training machine is 100x more expensive than our inference machine.

ux266478 · 7d ago

What I'm saying is those 100x more expensive big guns are just normal GP HPC howitzers. Systems that are exclusively designed for AI and nothing else are more or less all just edge inference TPUs.

bombcar · 9d ago

How is the result of training stored? How big is that? It seems reasonable to assume we’ll eventually plateau and all we’ll need is relatively infrequent training.

brandensilva · 9d ago

I agree, leave the training to open source federations that roll out like operating systems. Minimal training over time.

Then have inference go down to the next layer to use those models as a P2P decentralized network.

Maybe something like OpenRouter could tap federation networks.

rurban · 9d ago

Not so often. The GPUs are running at 100% for 3 weeks for a training run. We do images only, but it's the same process. And then we can use the costly GPUs for inference, local model coding agents. Training is about 4x a year. But it depends on what ideas the PM or the customers have. If they have more, there are more training tasks, e.g. more viruses to detect.

sho OP · 9d ago

> AI hardware is for inference, not training

Not sure what you are referring to, unless you don't think H100/H200/B200 are "AI hardware"

> Superpods aren't really power efficient

Maybe not compared to a specialized rig with multiple 4090s, but that is the best case for consumer hardware - the vast majority will be dramatically less efficient than that

Anyway, I agree the interconnect is by far the biggest obstacle and seems insurmountable, I should probably have led with that.

ux266478 · 7d ago

I wouldn't consider those AI hardware, no. They're normal GPGPUs.

pksebben · 9d ago

Bit of a doozie though, that one.

I recall getting really excited over Hinton's Forward-Forward (FF) foray, right before he bailed on AI as a societal direction (which, if anyone ever had the right, I suppose he does). If one squints, one can see a backprop-free base being much easier to train on geographically distributed and heterogeneous hardware.

dyauspitr · 9d ago

That makes no sense. It’s basically the same calculations for training as well.

Davidzheng · 9d ago

Are you sure most of frontier cost isn't inference in RL environments?

Cider9986 · 9d ago · 9 in thread

What makes you think DeepSeek or GLM won't catch up to Fable level? Why would there be a break in the trend now?

zozbot234 · 9d ago

DeepSeek and GLM (plus Kimi) are at or above Sonnet level wrt. favorable workloads like coding. They're not close to Opus or the latest GPT yet, and Fable is even higher than that. Other workloads relying more on real-world knowledge have them even further behind, and this can't be mitigated without making the model itself bigger and harder to host locally.

CuriouslyC · 9d ago

Not true. Big models buy you baked in knowledge and long context cohesion. A model can be trained to use search and knowledge base tools more efficiently to mitigate the former, and harnesses/workflows can be designed to push models into small parallel threads to mitigate the latter.

The thing that big models will always bring to the table is the ability to YOLO weak/under-specified prompts, and spend less time in the loop making sure work gets partitioned correctly. For smaller/simpler tasks the P(success) difference isn't that big.

zozbot234 · 9d ago

Knowledge-base access is not very useful in general because a model doesn't have well-defined "known unknowns" that might trigger an agentic search of the outside knowledge base. Plus surfacing knowledge you don't know much about is itself hard.

dboreham · 9d ago

These things sound plausible, but have they actually been demonstrated? Wouldn't anyone who succeeded in making such a small but useful LLM be raking in the money now?

1 more reply

thepasch · 9d ago

> They're not close to Opus or the latest GPT yet

Disagreed. GLM-5.1 is easily as good as Opus 4.5 for all the coding purposes I could throw at it, which is the model that kicked this entire hype cycle into overdrive in the first place.

Cider9986 · 9d ago

I've found GLM to be comparable to or better than Opus at writing, and at a fraction of the cost.

zozbot234 · 9d ago

Writing does not rely on real-world knowledge all that much, other than knowledge of language itself. Even tiny models can achieve that, it's even easier than coding.

1 more reply

metalspot · 9d ago

The key thing here is that effective intelligence = model capability / cost. If you drive down the cost of inference you can have higher effective capability even with a technically less capable model. There is nothing in Anthropic's/OpenAI's general reasoning capabilities that can't be easily done much better with a purpose-built harness for a domain-specific task.

kuboble · 9d ago

I think there are at least a few question marks.

One being that extrapolating from like 3 data points is hardly science. All trends break at some point.

The other is that the measures to prevent distillation of their models (if it was a secret sauce of Chinese models) could work if nobody is allowed to use them.

CuriouslyC · 9d ago · 7 in thread

> It would be better for governments to buy and own their own datacenters, maybe as a coalition, and dedicate their operation to the public good. I believe that is what we actually have to do.

100% agree. The US government basically has to nationalize AI and capture an outsize portion of the revenue from it in order to fix the economy, as the combination of debt burden and interest rate pressure from de-dollarization/global realignment is going to push us into a death spiral, and even if AI is a smash hit, the ~19% federal capture of corporate revenue isn't nearly enough to pull us out of it. The people owning the compute infrastructure and capturing more profit from AI at that layer is the safest, cleanest way to increase revenue capture. A sovereign wealth fund is a mediocre idea because it's possible to play shell games with stocks and redirect profit/debt (venture capital is quite good at this!).

root-parent · 9d ago

>> The US government basically has to nationalize AI and capture an outsize portion of the revenue from it

Currently AI has generated no profit. And as it sits, it is a non-viable business.

I refuse to include the sellers of shovels as AI revenue.

If the companies buying the shovels are still losing money, then the tool suppliers' fortunes have nothing to do with the economics of the AI application layer, which is losing money on every prompt.

aspenmartin · 9d ago

It's the most naive opinion that keeps getting shoveled around. You have a product that is viewed as essential by businesses, with revenue growing by 10x a year and geopolitical ramifications that have continued to rear their heads, and your opinion is "this is all an unprofitable shill". It is extraordinary to me that people really believe this. Whether or not labs run at a loss today is absolutely irrelevant. There are of course steady-state economics that make sense, and it's currently not well known what the profitability picture is, so to say "Currently AI has generated no profit" is also just speculation, and not very insightful speculation at that.

root-parent · 9d ago

That businesses view it as essential...is not a profitability argument.

Businesses also bought dot com infrastructure, telecom fiber, crypto platforms, metaverse tools, and overbuilt SaaS. The question is whether the AI application layer can charge more than its full cost and the costs are inference, infrastructure, depreciation, R&D, customer acquisition, support, compliance, security, and error remediation.

The numbers so far do not inspire confidence. OpenAI reportedly did $4.3B in revenue in the first half of 2025 while burning $2.5B, and Microsoft said OpenAI related losses reduced its own quarterly net income by $3.1B. An MIT 2025 enterprise AI study found $30 to 40B spent on GenAI with 95% of organizations seeing zero return.

One of the core technical reasons is that hallucinations destroy enterprise economics. If SAP hallucinated 2% of invoices, or Oracle returned fake rows 2% of the time, nobody would call that early stage friction. They would call it unusable for core operations.

In legal AI, even specialized tools have been measured hallucinating 30% of the time. The problem is that as AI gets better it is confidently, plausibly wrong. That forces humans to verify it.

So the cost does not disappear. It moves from doing the work to checking the work. AI coding has the same issue. If an autopilot got you there faster but one flight in ten became unstable unless the pilot constantly supervised it, that is not productivity.

For the bull case to work, usage must explode, quality must improve, prices must fall, reliability must rise, legal risk must shrink, and margins must expand, all at once. I would say that instead of a business model, this is six miracles stacked on top of each other.

1 more reply

CuriouslyC · 9d ago

I've heard that the API calls by themselves are ~60% profit if you ignore capital expenditures. The labs haven't generated profit because they're constantly sinking money into the next generation of larger models to stay relevant. Dario has talked about the economics of this a lot, and I do believe him there.

There's clearly also a lot of pent up demand in the corporate world for inference, the problem is that it's currently expensive enough that enterprises are balking at the cost before they've had a chance to refine processes and see projects through to fruition. That's a tractable problem to solve though.

bombcar · 9d ago

The number of capital-heavy businesses that are wildly profitable “if you ignore capital expenses” is too many to list.

Airlines, for example, which are so profitable they continually go bankrupt.

1 more reply

aspenmartin · 9d ago

> The US government basically has to nationalize AI and capture an outsize portion of the revenue from it in order to fix the economy, as the combination of debt burden and interest rate pressure from de-dollarization/global realignment is going to push us into a death spiral, and even if AI is a smash hit, the ~19% federal capture of corporate revenue isn't nearly enough to pull us out of it.

Any actual numbers to back this up? I don't see how nationalizing a very cutting edge technology outside of wartime is going to go super well. The leverage that these companies have is the same leverage that TSMC has: you can't just take over and expect things to rocket at the pace it's going.

AtlasBarfed · 9d ago

Like a system of heavily funded institutions dedicated to higher learning?

WithinReason · 9d ago · 6 in thread

Efficiency difference between training on GPUs and TPUs is 2x at best. You can get very efficient with tensorcores, converging to TPU efficiency. In the end math is math, you can't make a multiplication more efficient than it already is on GPU.

schobi · 9d ago

I guess this was more related to syncing GPUs.

If you were to take 500 computers with older 1080 GPUs, you might have enough compute/ram equivalent to an H200 GPU for training such a model. Maybe take 10000.

But if those machines are spread over 10000 homes, wired with residential internet service, training a large model will not get anywhere.

You go from "data in the same HBM memory chip" at 4.8 TB/s or "data in adjacent GPU" with NVLink at 1.2 TB/s down to 25 Mbit/s upload speed. Accessing the next piece of data is going to be about a million times slower. At the same time you will heat a thousand times more, for a million times longer.
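The "about a million times slower" figure can be checked with a quick back-of-the-envelope script. This is only a sketch: the three bandwidth numbers are the ones quoted above, and `bytes_per_second` is a helper made up for illustration.

```python
def bytes_per_second(value, unit):
    """Convert a bandwidth figure to bytes per second (decimal units)."""
    factors = {"TB/s": 1e12, "Mbit/s": 1e6 / 8}
    return value * factors[unit]

hbm = bytes_per_second(4.8, "TB/s")            # same-chip HBM, figure from the comment
nvlink = bytes_per_second(1.2, "TB/s")         # adjacent GPU, figure from the comment
home_upload = bytes_per_second(25, "Mbit/s")   # residential upload, figure from the comment

hbm_ratio = hbm / home_upload
nvlink_ratio = nvlink / home_upload
print(f"HBM vs home upload:    {hbm_ratio:,.0f}x")
print(f"NVLink vs home upload: {nvlink_ratio:,.0f}x")
```

With those inputs the HBM gap comes out at roughly 1.5 million and the NVLink gap at roughly 0.4 million, so "a million times" is the right order of magnitude for bandwidth alone. Latency is a separate penalty that this does not model.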

incrudible · 9d ago

You need to train independently and merge rarely. The problem is the merge step. Weights are too entangled, you are not going to get an improvement commensurate to the effort. Otherwise, everyone would do it. It is an open research problem.
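For anyone unfamiliar with the "train independently, merge rarely" pattern, here is a minimal sketch on a deliberately easy problem: a one-weight linear model, where plain averaging of weights works well. All names (`make_shard`, `train_and_merge`) and hyperparameters are made up for illustration, and this is not any particular published method. The open problem described above is precisely that deep networks do not merge this cleanly.

```python
import random

def make_shard(rng, size=50, true_w=3.0, noise=0.1):
    """Private data for one worker: noisy samples of y = true_w * x."""
    xs = [rng.uniform(-1, 1) for _ in range(size)]
    return [(x, true_w * x + rng.gauss(0, noise)) for x in xs]

def train_and_merge(shards, rounds=20, local_steps=10, lr=0.1):
    """Workers train alone for `local_steps`, then their weights are averaged."""
    w_global = 0.0
    for _ in range(rounds):
        local_weights = []
        for shard in shards:
            w = w_global                     # every worker starts from the last merge
            for _ in range(local_steps):
                # gradient of mean squared error for the one-weight model y = w * x
                grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
                w -= lr * grad
            local_weights.append(w)
        w_global = sum(local_weights) / len(local_weights)   # the rare merge step
    return w_global

rng = random.Random(0)
shards = [make_shard(rng) for _ in range(4)]   # four hypothetical workers
w = train_and_merge(shards)
print(f"merged weight: {w:.3f} (data was generated with 3.0)")
```

Only one number per worker crosses the network per round here, which is the appeal. The catch is that this toy loss is convex, so the average of good solutions is itself a good solution; that property is what breaks down with entangled neural network weights.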

filup · 9d ago

That sounds like the way. Everyone trains their own small problems to maximally compressed weights and then merges.

zozbot234 · 9d ago

The power-constrained part of compute is data movement, not the elementary arithmetic per se. Anyway, it's very possible to tweak the underlying design to increase throughput a lot for any given power budget at the cost of high latency. This seems especially useful for training workloads where we don't really care about latency as much.

GeoAtreides · 9d ago

Math is math, but sadly math isn't physics nor engineering.

pvirgiliu · 9d ago

math has physics.

1 more reply

iugtmkbdfil834 · 9d ago · 5 in thread

Dunno, in a sense, torrents came about under similar restrictions. Everything at consumer level was just plain awful and at dial-up level, mebbe ISDN if you were very lucky, with fiber only available to ridiculously rich people and corps. But with restrictions came approaches on how to mitigate them.

aspenmartin · 9d ago

Yes but not violations of the laws of physics. You need extremely fast communications, memory bandwidth, etc; you cannot get that with distributed training. You're up against the speed of light and the interconnect that powers the internet. You will always have horrifically slow latency compared to if you pack the servers together in the same place with specialized networking.

iugtmkbdfil834 · 9d ago

<< You will always have horrifically slow latency compared to if you pack the servers together in the same place with specialized networking.

Agree about the physics; disagree about the larger point.

I am not questioning that servers packed together may achieve an optimal result in how we are currently doing things, but, and this is my point, what if we didn't.

<< you cannot get that with distributed training

This is entirely the wrong question to ask. The question to ask is: how it could be adapted to distributed training.

aspenmartin · 9d ago

You know what, I'm surprised to find out this is far more feasible than I assumed. DiLoCo and the INTELLECT models demonstrate how feasible decentralized training already is; it is very surprising to me that you can get that far with so much less communication bandwidth. Not only that, but distributed training is _more_ feasible as you scale, since the compute needed scales as the square of parameter count but communication scales linearly, so the overhead penalty goes down.
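Taking that scaling claim at face value, the arithmetic looks like the sketch below. Everything in it is an assumption for illustration: the common rule of thumb of about 6 FLOPs per parameter per token, about 20 training tokens per parameter, 2 bytes per weight, and (the big one) a fixed number of synchronizations regardless of model size. The function names are hypothetical and nothing here is taken from the DiLoCo or INTELLECT work itself.

```python
def training_flops(params, tokens_per_param=20):
    """Rough training cost: ~6 FLOPs per parameter per token (rule of thumb)."""
    tokens = tokens_per_param * params      # data grows with model size (assumption)
    return 6 * params * tokens              # so compute grows as params squared

def sync_bytes(params, num_syncs=1000, bytes_per_param=2):
    """Bytes one worker exchanges if it sends every weight once per sync."""
    return num_syncs * bytes_per_param * params   # grows linearly in params

for params in (1e9, 1e10, 1e11):
    ratio = sync_bytes(params) / training_flops(params)
    print(f"{params:.0e} params: {ratio:.2e} bytes communicated per FLOP")
```

Under these assumptions, every 10x increase in parameters cuts the bytes communicated per FLOP by 10x. If the number of synchronizations has to grow with the training run, the picture is less favorable.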

I think the most important problem is that you have to marshal enough compute to be meaningful, and that is going to be more and more difficult as frontier compute requirements grow.

1 more reply

boutell · 9d ago

If weights can't be looked at almost instantly in bulk, it just doesn't work. It's a different problem from distributing file downloads.

iugtmkbdfil834 · 9d ago

I used it as an example. I understand the problem is hard. My larger point was that this is exactly how actual progress tends to take place. Well, that and porn.

c7b · 9d ago · 2 in thread

Could you put some numbers and examples behind the efficiency gap between data center and consumer-grade AI hardware? Did you include examples like the RTX Spark on the consumer side? I was always amazed at the low power consumption of unified memory style architectures. In absolute terms and even more so compared to consumer-grade GPUs. I'd be genuinely interested in a comparison with data-center-grade hardware.

zozbot234 · 9d ago

DGX Spark is effectively prosumer hardware, better than most consumer stuff but still not comparable to actual datacenter gear. You can't just look at TDP in isolation without also comparing performance.

aspenmartin · 9d ago

It's more than the raw hardware, it's the interconnect and communication between the hardware at scale. These models are trained on hundreds of thousands of GPUs today. You _will_ start to see cross-datacenter training runs but this needs to efficiently decide when and how to communicate across datacenter, which bears a very high cost compared to intra-datacenter communication.

incrudible · 9d ago

> As I replied to a child comment - this is a nice idea that just isn't tenable in reality. AI hardware isn't just hilariously faster than consumer GPUs, it's also hilariously more power-efficient and has hilariously better connectivity. Every one of these dimensions kills the idea.

The first part is not really true though, the chips are not that much faster, the DRAM is not that much faster, and in aggregate it does not matter because there is just so much more consumer hardware out there (although perhaps that is changing as supply shifts toward datacenters).

The interconnect and data locality are the problem. If you could train it like, e.g., you can render a scene with Monte Carlo ray tracing, any result from any node could be merged with any other and the combined result would have converged closer to the limit. I am sure research in that direction exists, it just has not proven effective within the scales it has been attempted.
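To make the Monte Carlo comparison concrete, here is a small sketch of that "merge anything with anything" property, using a pi estimate as a stand-in for a rendered pixel. The node setup and function names are hypothetical, chosen only for illustration.

```python
import random

def estimate_pi(samples, seed):
    """One node's independent Monte Carlo estimate of pi."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / samples, samples

def merge(results):
    """Combine estimates from any nodes, weighting each by its sample count."""
    total = sum(n for _, n in results)
    return sum(est * n for est, n in results) / total

# four hypothetical nodes that never talk to each other while working
results = [estimate_pi(20_000, seed) for seed in range(4)]
merged = merge(results)
print(f"merged estimate from {len(results)} nodes: {merged:.4f}")
```

Each node only ships back two numbers, and the merged estimate behaves as if one machine had drawn all the samples, so its expected error shrinks as nodes are added. Gradient-based training has no equally cheap merge, which is the gap being described.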

herewulf · 9d ago

WRT government data centers, there is certainly precedent for independent researchers getting HPC time on systems owned by US national labs, research institutions, universities, and then publishing their results as part of the public good.

One would question why this hasn't already happened as the rule, as opposed to the proliferation of private data centers. However, I am sure the answers are plain and perhaps saddening to us all.

KaiserPro · 9d ago

> It would be better for governments to buy and own their own datacenters,

I mean that's good, but they'd also have to build their own dataset. Which involves either paying people, or breaking the law.

Plus if they do manage to make it work, they will not get any tax revenue from it, as it'll remove the need for labour, which is where a huge amount of tax revenues come from.

It's a deeply hard problem with lots of second/third order effects.
