AI models that cost $1B to train are underway, $100B models coming

(tomshardware.com)

34 points · pulse7 · 1y ago · 58 comments

danpalmer · 1y ago · 13 in thread

No company can afford to spend $100B on something that will be obsolete a year later; you just can't recover the investment from sales that quickly.

$100m is manageable: if you've got 100m paying subscribers or companies using your API for a year you can recoup the costs, but there aren't many companies with 100m users to monetise. $1B feels like it's pushing it: only a few companies in the world can monetise at that scale, and realistically it's about lasting through the next round to be able to continue competing, not about making the money back.

$100B though, that's a whole different game again. That's like asking for the biggest private investment ever made, for capex that depreciates at $50B a year. You'd have to be stupid to do it. The public markets wouldn't take it.

Investing that much in hardware that depreciates over 5+ years and is theoretically still usable at the end, maybe, but even then the biggest companies in the world are still spending an order of magnitude less per year, so the numbers end up working out very differently. Plus that's companies with 1B users ready to monetise.
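
A rough sketch of the arithmetic above, assuming straight-line depreciation and that the whole training cost has to be recovered from users within the model's useful life. The function names and the pairing of user counts with budgets are illustrative assumptions, and inference costs, salaries and margins are ignored.

```python
def revenue_per_user_per_year(training_cost, users, useful_life_years=1.0):
    """Revenue each user must bring in per year to recoup the training cost
    within the model's useful life (hypothetical helper, costs only)."""
    return training_cost / (users * useful_life_years)


def annual_depreciation(capex, useful_life_years):
    """Straight-line depreciation: the same write-down every year."""
    return capex / useful_life_years


# $100M model, 100M paying users, obsolete after one year
small = revenue_per_user_per_year(100e6, 100e6)
# $100B model, 1B users, obsolete after one year
huge = revenue_per_user_per_year(100e9, 1e9)
# A two-year write-off of $100B reproduces the "$50B a year" figure
two_year = annual_depreciation(100e9, 2)
# The same spend on hardware written off over five years
five_year = annual_depreciation(100e9, 5)

print(f"$100M model: ${small:,.2f} per user per year")
print(f"$100B model: ${huge:,.2f} per user per year")
print(f"$100B over 2 years: ${two_year / 1e9:,.0f}B per year")
print(f"$100B over 5 years: ${five_year / 1e9:,.0f}B per year")
```

Under these assumptions the gap is two orders of magnitude per user, which is the point being made: the subscriber base does not grow anywhere near as fast as the bill.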

lumost · 1y ago

The business model here is the same as semiconductor fabrication/design. 2022 kick-started the foundation model race, and teams were readily able to raise $5-25M to chase foundation models. In early 2024, several of those teams began to run out of money after realizing that competitive modeling efforts were in the $1-10 billion range.

If a $100 billion training run produces the highest-quality model in the land across all metrics and capabilities, that will be the model that's used. At most there would be 1-2 other firms willing to spend $100 billion to chase the market.

sigmoid10 · 1y ago

This. The seemingly neverending run for foundation models only works as long as companies can afford it. If one of them spends 100+B, it will be a long time before compute catches up to the point that a competitor could reproduce it at reasonable budgets. This is essentially the race of who's going to own AGI and it shouldn't be surprising that people are willing to spend these amounts.

throwup238 · 1y ago

> Investing that much in hardware that depreciates over 5+ years and is theoretically still usable at the end, maybe

Isn’t that exactly what’s happening?

A $300k 8x H100 pod with a 5kW power supply burns roughly $6.6k per year in power at $0.15/kWh, even running flat out. The majority of the money is going to capital equipment for the first time in the software industry in decades.

These top of the line chips last for much longer in the depreciation game. The A100 was released in 2020 but cloud providers still have trouble meeting demand and charge a premium for them.

dagmx · 1y ago

How are you getting 5kW?

NVIDIA claim 10.2kW for a DGX H100 pod. https://docs.nvidia.com/dgx/dgxh100-user-guide/introduction-....

Your point still stands, though: power is a fraction of the cost.

The bigger issue is power + cooling and how many units are needed to train the better models.
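
For anyone who wants to check the numbers in this sub-thread, here is a minimal sketch. It assumes the box draws its full rated power around the clock and ignores cooling overhead; the function name and the `utilization` parameter are made up for illustration, and applying the $300k price to the 10.2kW figure is also an assumption.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_power_cost(power_kw, price_per_kwh=0.15, utilization=1.0):
    """Yearly electricity cost in dollars for a constant power draw.
    Hypothetical helper: no cooling or facility overhead included."""
    return power_kw * HOURS_PER_YEAR * utilization * price_per_kwh


cost_5kw = annual_power_cost(5.0)    # the 5kW figure quoted above
cost_dgx = annual_power_cost(10.2)   # the 10.2kW DGX H100 figure
share_of_price = cost_dgx / 300_000  # yearly power vs. a $300k purchase price

print(f"5 kW:    about ${cost_5kw:,.0f} per year")
print(f"10.2 kW: about ${cost_dgx:,.0f} per year")
print(f"10.2 kW power bill is {share_of_price:.1%} of $300k per year")
```

Even at the higher figure, a year of electricity comes to under 5% of the purchase price, so the conclusion holds before cooling is counted.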

bcherny · 1y ago

That’s true for AI, but it is not the right way to think about AGI.

For AGI, the bet is that someone will build an AI capable enough to automate AI development. Once we get there it will pay for itself. The question is what the cost-speed tradeoff to get there looks like.

demondemidi · 1y ago

Pay for itself? Who will pay for this? I don’t think you realize how much $100B is. To put it in perspective, a cutting edge fab costs almost $10B (TSMC) and only three companies can barely afford that.

mark_l_watson · 1y ago

I agree with you. Large tech corporations are making big bets to reach AGI first. As an example, if you are CEO of Google, do you want Microsoft or Meta to achieve AGI first?

This seems less like doing business as usual and more like betting big to be part of something really transformative.

awakeasleep · 1y ago

For $100B they would probably want a realistic description of how they get to AGI. That's a bit too much money for the hand-wavy answers we have right now for the path between LLMs and AGI (which doesn't even have a great definition).

blibble · 1y ago

it's not a "once", it's an "if"

it may never happen, especially with this current approach

at which point you've burnt hundreds of billions of dollars, emitted millions of tonnes of CO2 and all you've got out of it is a marginally better array of doubles

mrtesthah · 1y ago

automating “development” does not necessarily lead to AGI. An LLM could make minor efficiency improvements all day long and still not change the fundamental approach.

ca_tech · 1y ago

I agree and do not think any company would make that investment directly. Nvidia selling to Microsoft renting to OpenAI, I'm sure you could make that add up to $100B on paper. In the long run the economics are likely much more complicated and consist of "agreements worth $x".

Flozzin · 1y ago

Even if they did, they would be the largest target for hackers or corporate espionage. I would find it hard to believe that they would get any sort of good return on this before it was all over the internet, or at least in the hands of several competitors.

deepnotderp · 1y ago

TSMC spends ~$30B every 2 years

blowski · 1y ago · 4 in thread

What will the benefit be of more expensive models? More facts, because it's consumed more information? More ability to, say, adjust writing style? Or is this all necessary just to filter out the garbage recycled AI content it's now consuming?

jiggawatts · 1y ago

Right around the time GPT-4 was first announced, OpenAI published a paper that basically said that training can "just keep going" with no obvious end in sight. Recently, Meta tried to train a model 75x as long as is naively optimal, and it just kept getting better.

Better in this case means some combination of "fewer errors for the same size" and/or "bigger and smarter". Fundamentally, they're still the same thing, just more and better.

Unfortunately, the scaling is (roughly) logarithmic. So for every 10x increase in scale you get a +1 better model. Scaling up 1,000x gets you just a +3 improvement, and so on.
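
To make the rule of thumb concrete, here is a small sketch that takes the "roughly logarithmic" claim literally as a base-10 log. The "+1" units are the informal ones used above, not a real benchmark score, and both function names are invented for illustration.

```python
import math


def quality_gain(scale_factor):
    """Improvement in informal '+1' units for a given multiple of scale,
    assuming gain = log10(scale) as in the rule of thumb above."""
    return math.log10(scale_factor)


def scale_needed(gain):
    """Inverse: how many times more scale a target gain would require."""
    return 10 ** gain


for factor in (10, 100, 1_000, 100_000):
    print(f"{factor:>7,}x scale -> +{quality_gain(factor):.0f}")

# Going from a $1B run to a $100B run is a 100x increase in spend
gain_1b_to_100b = quality_gain(100e9 / 1e9)
print(f"$1B -> $100B: +{gain_1b_to_100b:.0f}")
```

Read the other way round, every extra "+1" costs ten times what the previous one did, which is why the budgets in the headline climb by orders of magnitude.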

iamleppert · 1y ago

And what, exactly, is the ROI on "better"? Who cares if the model is better; is it $100B better? Who is going to buy these services? What consumer will pay for it?

lm28469 · 1y ago

> What will the benefit be of more expensive models?

Bleed investors dry before the next fad pops up

blitzar · 1y ago

> What will the benefit be of more expensive models?

A G650 to fly to your 85m yacht in the Med doesn't come cheap.

DrMiaow · 1y ago · 3 in thread

elmo-arms-up-world-burning.gif

dialup_sounds · 1y ago

For the meme-impaired: https://knowyourmeme.com/memes/elmo-rise

bitwize · 1y ago

Somewhere, a Tamarian is posting shaka-when-the-walls-fell.jpg

hi_dang_ · 1y ago

It took an entire thread of nostradamus-tier bullshit before we finally got the first serious response. Bravo.

htrp · 1y ago · 2 in thread

X to Doubt.

This is the Anthropic CEO talking up his company's capital needs to the Norwegian Sovereign Wealth Fund (Norges Bank Investment Management) and trying to justify some absurd $100bn valuation.

belter · 1y ago

Yes. The release of GPT-5 will make or break the AI movement. If the capabilities are not another quantum leap, it will become clear that the scaling laws are not everything. These investments will be unsustainable on the basis of any economic metric you use.

MuffinFlavored · 1y ago

> If the capabilities are not another quantum leap

While I don't disagree 100%, my question to you is:

who/what says this is the case/why? GPT-3.5 was released/made popular "to the masses" not too long ago. Where do you feel the pressure for a quantum leap "quickly" is coming from?

fuzzfactor · 1y ago · 2 in thread

If you had an extra $100 Billion, some people could think of something better to spend it on, some not.

user90131313 · 1y ago

Metaverse! Oh wait, that is too old and forgotten already.

xinayder · 1y ago

web3 metaverse powered by genAI and NFTs?

demondemidi · 1y ago · 2 in thread

Wow, this CEO entitlement and these wealth pissing contests are laughable.

ai4ever · 1y ago

Altman and Amodei are talking their book, but in doing so seem like shady snake-oil salesmen.

they would be better off not bullshitting their investors.

blitzar · 1y ago

the people not bullshitting their investors have no investors

investors with huge piles of cash should buy themselves a brain and stop funding bullshitters

mensetmanusman · 1y ago · 1 in thread

If only this had come before crypto. We could have had a system that underwrites international finance and pays for training on the cheap.

I wonder which timelines had this scenario…

seydor · 1y ago

That sounds like a great idea for our next bubble

hurrdurr57 · 1y ago

Well, I guess the question I have is, what exactly does he mean by the "cost to train"? As in, just the cost of the electricity used to train that one model? That seems really excessive.

Or is it the total overall cost of buying TPUs / GPUs, developing infrastructure, constructing data centers, putting together quality data sets, doing R&D, paying salaries, etc. as well as training the model itself? I could see that overall investment into AI scaling into the tens of billions over the next few years.

speedylight · 1y ago

I could see the US subsidizing most of that $100B, just because they can, and more importantly, it would be the kind of tactical advantage that’s needed to make sure US tech companies stay relevant in a world where there’s a growing desire to break away from them in favor of homegrown solutions.

seydor · 1y ago

BigTech wants all your sovereign money

Temporary_31337 · 1y ago

All this burn and recruiters and bots still match on keywords in CVs.
