$100m is manageable: if you've got 100m paying subscribers, or companies using your API for a year, you can recoup the costs, but there aren't many companies with 100m users to monetise. $1B feels like it's pushing it; only a few companies in the world can monetise at that scale, and realistically it's about lasting through the next round so you can keep competing, not about making the money back.
$100B though, that's a whole different game again. That's like asking for the biggest private investment ever made, for capex that depreciates at $50B a year. You'd have to be stupid to do it. The public markets wouldn't take it.
Investing that much in hardware that depreciates over 5+ years and is theoretically still usable at the end, maybe, but even then the biggest companies in the world are still spending an order of magnitude less per year, so the numbers end up working out very differently. Plus that's companies with 1B users ready to monetise.
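To put rough numbers on the depreciation argument, here is a minimal sketch assuming plain straight-line depreciation with zero salvage value (my assumption; real accounting schedules vary). The $50B a year figure corresponds to a 2-year useful life, while a 5-year life would give $20B a year.

```python
def straight_line_depreciation(capex_usd: float, useful_life_years: float) -> float:
    """Annual depreciation under a straight-line schedule.

    Assumes zero salvage value, which is a simplification.
    """
    if useful_life_years <= 0:
        raise ValueError("useful_life_years must be positive")
    return capex_usd / useful_life_years


CAPEX_USD = 100e9  # the hypothetical $100B training build-out from the thread

for years in (2, 5):
    annual = straight_line_depreciation(CAPEX_USD, years)
    print(f"{years}-year life: ${annual / 1e9:.0f}B per year")
```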
If a $100 billion training run produces the highest-quality model in the land across all metrics and capabilities, that will be the model that's used. At most there would be one or two other firms willing to spend $100 billion to chase the market.
Isn’t that exactly what’s happening?
A $300k 8x H100 pod with a 5kW power supply burns about $6.6k per year in power at $0.15/kWh (5 kW x 8,760 hours x $0.15), assuming it runs flat out all year. The majority of the money is going to capital equipment for the first time in the software industry in decades.
These top of the line chips last for much longer in the depreciation game. The A100 was released in 2020 but cloud providers still have trouble meeting demand and charge a premium for them.
NVIDIA claim 10.2kW for a DGX H100 pod. https://docs.nvidia.com/dgx/dgxh100-user-guide/introduction-....
Your point still stands though where power is a fraction of the cost.
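For anyone who wants to check the arithmetic, here is a quick sketch using the figures quoted in this thread ($300k pod, $0.15/kWh, 5 kW or 10.2 kW draw). It assumes the pod draws its rated power 24/7 and ignores cooling overhead, so treat it as a rough bound rather than a real operating cost.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_power_cost(draw_kw: float, price_per_kwh: float, utilization: float = 1.0) -> float:
    """Yearly electricity cost in dollars for a constant power draw.

    utilization=1.0 assumes the machine runs at rated power all year.
    Cooling and other facility overhead are not included.
    """
    return draw_kw * HOURS_PER_YEAR * utilization * price_per_kwh


POD_PRICE_USD = 300_000  # pod price quoted in the thread
PRICE_PER_KWH = 0.15     # electricity price quoted in the thread

for draw_kw in (5.0, 10.2):
    cost = annual_power_cost(draw_kw, PRICE_PER_KWH)
    share = cost / POD_PRICE_USD
    print(f"{draw_kw} kW: ${cost:,.0f} per year, {share:.1%} of the pod price")
```

Either way the yearly power bill comes out to a few percent of the purchase price (roughly $6.6k at 5 kW and $13.4k at 10.2 kW).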
The bigger issue is power + cooling and how many units are needed to train the better models.
For AGI, the bet is that someone will build an AI capable enough to automate AI development. Once we get there it will pay for itself. The question is what the cost-speed tradeoff to get there looks like.
This seems less like doing business as usual and more like betting big to be part of something really transformative.
it may never happen, especially with this current approach
at which point you've burnt hundreds of billions of dollars, emitted millions of tonnes of CO2 and all you've got out of it is a marginally better array of doubles
Better in this case means some combination of "fewer errors for the same size" and/or "bigger and smarter". Fundamentally, they're still the same thing, just more and better.
Unfortunately, the scaling is (roughly) logarithmic. So for every 10x increase in scale you get a +1 better model. Scaling up 1,000x gets you just a +3 improvement, and so on.
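As a toy illustration of that rule of thumb (a base-10 log of the scale factor, which is a simplification of this comment's claim and not a fitted scaling law), the "+N" improvement can be sketched like this. The function name and the "quality points" unit are made up for the example.

```python
import math


def quality_gain(scale_factor: float) -> float:
    """Hypothetical 'quality points' gained from scaling up by scale_factor.

    Models the rough claim that every 10x in scale buys +1.
    """
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return math.log10(scale_factor)


for factor in (10, 1_000, 1_000_000):
    print(f"{factor:>9,}x scale -> +{quality_gain(factor):.0f}")
```

Under this toy model, going from +3 to +6 takes another factor of 1,000 in scale, which is why each step gets so much more expensive.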
This is the Anthropic CEO talking up his company's capital needs to the Norwegian Sovereign Wealth Fund (Norges Bank Investment Management) and trying to justify some absurd $100bn valuation.
While I don't disagree 100%, my question to you is:
Who or what says this is the case, and why? GPT-3.5 was released and made popular "to the masses" not too long ago. Where do you feel the pressure for a "quick" quantum leap is coming from?
they would be better off not bullshitting their investors.
investors with huge piles of cash should buy themselves a brain and stop funding bullshitters
I wonder which timelines had this scenario…
Or is it the total overall cost of buying TPUs / GPUs, developing infrastructure, constructing data centers, putting together quality data sets, doing R&D, paying salaries, etc. as well as training the model itself? I could see that overall investment into AI scaling into the tens of billions over the next few years.