Calculating the cost of a Google DeepMind paper

(152334h.github.io)

303 points · 152334H · 1y ago · 150 comments

150 comments

81 comments · 13 top-level

BartjeD1y ago· 19 in thread

If this ran on google's own cloud it amounts to internal bookkeeping. The only cost is then the electricity and used capacity. Not consumer pricing. So negligible.

It is rather unfortunate that this sort of paper is hard to reproduce.

That is a BIG downside, because it makes the result unreliable. They invested effort and money in getting an unreliable result. But perhaps other research will corroborate. Or it may give them an edge in their business, for a while.

They chose to publish. So they are interested in seeing it reproduced or improved upon.

rrr_oh_man1y ago

> They chose to publish. So they are interested in seeing it reproduced or improved upon.

Call me cynical, but in my experience that is not the #1 reason for publishing AI papers.

ash-ali1y ago

I hope someone can share their insight on this comment. I think the other comments are fragile and don't hold up very well.

echoangle1y ago

As someone not in the AI space, what do you think is the reason for publishing? Marketing and hype for your products?

stairlane1y ago

> The only cost is then the electricity and used capacity. Not consumer pricing. So negligible.

I don’t think this is valid, as this point seems to ignore the fact that the data center that this compute took place in required a massive investment.

A paper like this is more akin to HEPP research. Nobody has the capability to reproduce the Higgs results outside the facility where the research was conducted (CERN).

I don’t think reproduction was a concern of the researchers.

morbia1y ago

The Higgs results were reproduced because there are two independent detectors at CERN (ATLAS and CMS). Both collaborations are run almost entirely independently, and the press is only called in to announce a scientific discovery if both find the same result.

Obviously the 'best' result would be to have a separate collider as well, but no one is going to fund a new collider just to reaffirm the result for a third time.

Rastonbury1y ago

Kinda, but Google sells compute, so it makes money off the data centre investment. Assuming they had spare capacity for this, it's negligible at Google scale.

rty321y ago

Opportunity cost is cost. What you could have earned by selling the resources to customers instead of using them yourself is what the resources are worth.

g15jv2dp1y ago

This assumes that you can sell 100% of the resources' availability 100% of the time. Whenever you have more capacity than you can sell, there's no opportunity cost in using it yourself.

nkrisc1y ago

Not if you’re only using the resources when they’re available because no customer has paid to use them.
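The disagreement above comes down to how much of the capacity a paying customer would otherwise have bought. A minimal sketch of that arithmetic, assuming a made-up job size and an illustrative retail rate (`sellable_fraction` is a hypothetical parameter, not something Google publishes):

```python
def opportunity_cost(gpu_hours: float, retail_rate: float, sellable_fraction: float) -> float:
    """Revenue forgone by using capacity internally instead of selling it.

    sellable_fraction is the share of those GPU-hours that a paying customer
    would otherwise have bought: 1.0 means the fleet is sold out, 0.0 means
    the hardware would have sat idle.
    """
    if not 0.0 <= sellable_fraction <= 1.0:
        raise ValueError("sellable_fraction must be between 0 and 1")
    return gpu_hours * retail_rate * sellable_fraction


# Illustrative inputs only: a 1M GPU-hour job at an assumed $3 per GPU-hour.
for fraction in (1.0, 0.5, 0.0):
    cost = opportunity_cost(1_000_000, 3.0, fraction)
    print(f"sellable fraction {fraction:.0%}: forgone revenue ${cost:,.0f}")
```

With a sold-out fleet the opportunity cost equals the retail price; with fully idle capacity it drops to zero, which is the case the replies above describe.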

K0balt1y ago

I think Google produces their own power, so they don’t pay distribution cost which is at least one third of the price of power, even higher for large customers.

Cthulhu_1y ago

I'd argue it's not hard to reproduce per se, just expensive; thankfully there are at least half a dozen (cloud) computing providers that have the necessary resources to do so. Google Cloud, AWS and Azure are the big competitors in the west (it seems / from my perspective), but don't underestimate the likes of Alibaba, IBM, DigitalOcean, Rackspace, Salesforce, Tencent, Oracle, Huawei, Dell and Cisco.

pintxo1y ago

> They chose to publish. So they are interested in seeing it reproduced or improved upon.

Not necessarily, publishing also ensures that the stuff is no longer patentable.

slashdave1y ago

Forgive me if I am wrong, but all of the techniques explored are already well known. So, what is going to be patented?

jfengel1y ago

Is the electricity cost negligible? It's a pretty compute intensive application.

Of course it would be a tiny fraction of the $10m figure here, but even 1% would be $100,000. Negligible to Google, but for Google even $10 million is couch cushion money.

dekhn1y ago

The electricity cost is not negligible: I ran a service that had multiples of $10M in marginal electricity spend (i.e., servers running at 100% utilization, consuming a significantly higher fraction than when idle or partly idle). Ultimately, the scientific discoveries weren't worth the cost, so we shut the service down.

$10M is about what Google would spend to get a publication in a top-tier journal. But Google's internal pricing and costs don't look anything like what people cite for external costs; it's more like a state-supported economy with some extremely rich oligarch-run profit centers that feed all the various cottage industries.

stavros1y ago

I feel like your comment answers itself: If you have the money to be running a datacenter of thousands of A100 GPUs (or equivalent), the cost of the electricity is negligible to you, and definitely worth training a SOTA model with your spare compute.

K0balt1y ago

I’d imagine publishing is more oriented toward attracting and retaining talent. You need to scratch that itch or the academics will jump ship.

ape41y ago

It's like them running SETI@home ;)

dekhn1y ago

We ran Folding@Home at Google. We were effectively the largest single contributor of cycles for at least a year. It wasn't scientifically worthwhile, so we shut it down after a couple of years.

That was using idle cycles on Intel CPUs, not GPUs or TPUs though.

arcade791y ago· 15 in thread

A lot of misunderstandings among the commenters here.

From the link: "the total compute cost it would take to replicate the paper"

It's not Google's cost. Google's cost is of course entirely different. It's the cost for the author if he were to rent the resources to replicate the paper.

For Google, all of it is running at a "best effort" resource tier, grabbing available resources when not requested by higher priority jobs. It's effectively free resources (except electricity consumption). If any "more important" job with a higher priority comes in and asks for the resources, the paper writers' jobs will just be preempted.
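A toy sketch of what "best effort" preemption could look like, to make the idea concrete. This is not Google's scheduler; the `Job` and `Cluster` classes, the priority numbers and the job names are all hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    priority: int  # higher number = more important
    gpus: int


@dataclass
class Cluster:
    capacity: int
    running: list = field(default_factory=list)

    def free(self) -> int:
        return self.capacity - sum(job.gpus for job in self.running)

    def submit(self, job: Job):
        """Try to start `job`. Returns (started, preempted_jobs)."""
        # Only strictly lower-priority jobs may be evicted, least important first.
        victims = sorted(
            (j for j in self.running if j.priority < job.priority),
            key=lambda j: j.priority,
        )
        reclaimable = self.free()
        preempted = []
        for victim in victims:
            if reclaimable >= job.gpus:
                break
            preempted.append(victim)
            reclaimable += victim.gpus
        if reclaimable < job.gpus:
            return False, []  # does not fit even after preemption
        for victim in preempted:
            self.running.remove(victim)
        self.running.append(job)
        return True, preempted


cluster = Cluster(capacity=8)
print(cluster.submit(Job("paper-sweep", priority=0, gpus=8)))    # fills idle capacity
print(cluster.submit(Job("prod-serving", priority=10, gpus=4)))  # evicts the sweep
```

The research job only ever runs on capacity nobody more important wants, which is why its marginal cost to the owner is close to the electricity bill.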

bombcar1y ago

This is the side effect of underutilized capital and it’s present in many cases.

For example, if YOU want to rent a backhoe to do some yard rearrangement it’s going to cost you.

But Bob who owns BackHoesInc has them sitting around all the time when they’re not being rented or used; he can rearrange his yard wholesale or almost free.

thaumasiotes1y ago

> This is the side effect of underutilized capital and it’s present in many cases.

"Underutilized" isn't the right word here. There's some value in putting your capital to productive use. But, once immediate needs are satisfied, there's more value in having the capital available to address future needs quickly than there would be in making sure that everything necessary to address those future needs is tied up in low-value work. Option value is real value; being prepared for unforeseen but urgent circumstances is a real use.

mikepurvis1y ago

Car lots with attached garages are like this too. That brake and suspension work they were going to charge you several thousand dollars for? Once you trade in ol' Bessie they'll do that for pennies on the dollar during slack time; it doesn't hurt them if the car sits around for a few weeks or months before being ready for sale.

dweekly1y ago

Possible corollary: it may be difficult to regularly turn out highly compute-dependent research if you're paying full retail rack rates for your hardware (i.e. using someone else's cloud).

punnerud1y ago

Can others also buy the “best effort” tier?

The job could easily run for weeks, even if you could pay to have it done in a day.

Then there could be bidding on this “best effort” resource, factoring in the price of electricity at any given time.

curt151y ago

Is the "best effort" tier similar to AWS spot instances?

v3ss0n1y ago

Sure, land a job there, work your way up through all the corporate BS and toxicity, and you can get the best effort tier.

That effort needs to be added to the cost calculation too.

imtringued1y ago

According to neoclassical economists this is impossible since you can easily and instantaneously scale infrastructure up and down continuously at no cost and the future is known so demand can be predicted reliably.

The problem with neoclassical economics is that it doesn't concern itself with the physical counterpart of liquidity. It is assumed that the physical world is just as liquid as the monetary world.

The "liquidity mismatch" between money and physical capital must be bridged through overprovisioning on the physical side. If you want the option to choose among n different products, but only choose m products, then the n - m unsold products must be priced into the m bought products. If you can repurpose the unsold products, then you make a profit or you can lower costs for the buyer of the m products.

I would even go as far as to say that the production of liquidity is probably the driving force of the economy, because it means we don't have to do complicated central planning and instead use simple regression models.
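The n and m argument above can be put into numbers. A minimal sketch, assuming every unit costs the same to provide and that unsold units can be repurposed at some `salvage_value` (both the function and its parameters are hypothetical illustrations of the comment's reasoning):

```python
def break_even_price(n: int, m: int, unit_cost: float, salvage_value: float = 0.0) -> float:
    """Price each of the m sold units must carry so that all n provisioned units are paid for.

    salvage_value is what each of the n - m unsold units recovers when repurposed.
    """
    if not 0 < m <= n:
        raise ValueError("need 0 < m <= n")
    total_cost = n * unit_cost
    recovered = (n - m) * salvage_value
    return (total_cost - recovered) / m


# 10 units provisioned at $100 each, only 4 sold.
print(break_even_price(10, 4, 100.0))                      # unsold units are a pure loss
print(break_even_price(10, 4, 100.0, salvage_value=50.0))  # unsold units partly repurposed
print(break_even_price(10, 4, 100.0, salvage_value=100.0)) # unsold units fully repurposed
```

The more of the idle capacity that can be put to some other use, the less overprovisioning has to be priced into what customers actually buy.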

jopsen1y ago

> I would even go as far as to say that the production of liquidity is probably the driving force of the economy.

Isn't that what all high-frequency traders would say? :)

Perhaps there is some limit at which additional liquidity doesn't offer much value?

152334HOP1y ago

Is it free-priority based?

I was told by an employee that GDM internally has a credits system for TPU allocation, with which researchers have to budget out their compute usage. I may have completely misunderstood what they were describing, though.

rldjbpin1y ago

if this is the way they pull it off consistently, it might be a good business model for those working on research like stability to also moonlight a gpu cloud service.

it is a hustle only for the near future while this bubble lasts, but can help reduce costs.

huijzer1y ago

Still, don’t get high on your own supply.

mrazomor1y ago

This assumes the common resources (CPU, RAM, etc.), not the ones required for the LLM training (GPU, TPU, etc.). It's a different economy.

TL;DR: It's not ~free.

akutlay1y ago

Why does GPU matter? Do you think GCP keeps GPU utilization at 100% at all times?

rgmerk1y ago· 7 in thread

Worth pointing out here that in other scientific domains, papers routinely require hundreds of thousands of dollars, sometimes millions of dollars, of resources to produce.

My wife works on high-throughput drug screens. They routinely use over $100,000 of consumables in a single screen, not counting the cost of the screening “libraries”, the cost of using some of the ~$10mil of equipment in the lab for several weeks, the cost of the staff in the lab itself, and the cost of the time of the scientists who request the screens and then take the results and turn them into papers.

ramraj071y ago

I estimated that for any paper that involves mouse work and is produced in a first-world country (i.e. they have to do right by the animals), the minimum cost of that paper in expenses and salary would be $200,000. The average is likely higher. Tens of thousands of papers like this are published every year!

esperent1y ago

To be fair, supposing the Google paper took six months to a year to produce, it also must have cost several hundred thousand dollars in salaries and other non-compute costs.

paxys1y ago

These are mostly fixed costs. If you produce a hundred papers from the same team and same research, the costs aren't 100x.

dumb12241y ago

Well, not everyone starts experiments anew. Many also reuse accumulated datasets. For human data even more so.

slashdave1y ago

I assure you that the companies performing these screens expect a return on this investment. It is not for a journal paper.

godelski1y ago

I used to believe this line. But then I worked for a big tech company where my manager constantly made those remarks ("the difference in industry and academia is that in industry it has to actually work"). I then improved the generalization performance (i.e. "actually work") by over 100% and they decided not to update the model they were selling. Then again, I had a small fast model and it was 90% as accurate as the new large transformer model. Though they also didn't take the lessons learned and apply them to the big model, which had similar issues but were just masked by the size.

Plus, I mean, there are a lot of products that don't work. We all buy garbage and often can't buy not garbage. Though I guess you're technically correct that in either of these situations there can still be a return on investment, but maybe that shouldn't be good enough...

Metacelsus1y ago

Yeah, I'm a wet-lab biologist and my most recent paper (which is still not past peer review) has already cost about $200,000. And I just spent another $2000 today...

sigmoid101y ago· 7 in thread

This calculation is pretty pointless and the title is flat out wrong. It also gets lost in finer details while totally missing the bigger picture. After all, the original paper was written by people either working for Google or at Google. So you can safely assume they used Google resources. That means they wouldn't have used H100s, but Google TPUs. Since they design and own these TPUs, you can also safely assume that they don't pay whatever they charge end users for them. At the scale of Google, this basically amounts to the cost of housing/electricity, and even that could be a tax write-off. You also can't directly assume that the on-paper performance of something like an H100 will be the actual utilization you can achieve, so basing any estimate in terms of $/GPU-hour will be off by default.

That means Google paid way less than this amount and if you wanted to reproduce the paper yourself, you would potentially pay a lot more, depending on how many engineers you have on your team to squeeze every bit of performance per hour out of your cluster.

c-linkage1y ago

Reproducibility is a key element of the scientific process.

How is anyone else going to reproduce the experiment if it's going to cost them $10 million because they don't work at Google and would have to rent the infrastructure?

Sebb7671y ago

But what's the solution here? Not doing the (possibly) interesting research because it's hard to reproduce? That doesn't sound like a better situation.

That being said, yes, this is hard to reproduce for your average Joe, but there are also a lot of companies (like OpenAI, Facebook, ...) that are able to throw this amount of hardware at the problem. And in a few years you'll probably be able to do it on commodity hardware.

tokai1y ago

Cheap compared to some high-energy physics experiments.

rvnx1y ago

This specific paper looks plausible, but a lot of published AI papers are simply fake because it is one of the sectors where it is possible to make non-reproducible claims. "We don't give source-code or dataset", but actually they didn't find or do anything of interest.

It works and helps to get a salary raise or a better job, so they continue.

A bit like when someone goes to a job interview, didn't do anything, and claims "My work is under NDA".

injuly1y ago

> This is calculation is pretty pointless and the title is flat out wrong.

No, it's not. The author clearly states in the very first paragraph that this is the price it would take them to reproduce the results.

Nowhere in the article (or the title) have they implied that this is how much Google spent.

sigmoid101y ago

They have changed both the title and the article since it was posted... almost certainly due to comments like these which used to be at the top. Though editing titles should be impossible imo. Editing comments is fine, but if you screw up titles you should be forced to resubmit and not be able to rug-pull an entire discussion.

michaelmior1y ago

Even if they did use H100s and paid the current premium on them, you could probably buy 100 H100s and the boxes to put them in for less than $10M.

pama1y ago· 6 in thread

3USD/hour on the H100 is much more expensive than a reasonable amortized full ownership cost, unless one assumes the GPU is useless within 18 months, which I find a bit dramatic. The MFU can be above 40% and certainly well above the 35% in the estimate, also for small models with plain pytorch and trivial tuning [1]. I didn't read the linked paper carefully but I seriously doubt the Google team used vocab embedding layers with 2 D V parameters stated in the link, because this would be suboptimal by not tying the weights of the token embedding layer in the decoder architecture (even if they did double the params in these layers, it would not lead to 6 D V compute because the embedding input is indexed). To me these assumptions suggested a somewhat careless attitude towards the cost estimation and so I stopped reading the rest of this analysis carefully. My best guess is that the author is off by a large factor in the upward direction, and a true replication with H100/200 could be about 3x less expensive.

[1] if the total cost estimate was relatively low, say less than 10k, then of course the lowest rental price and a random training codebase might make some sense in order to reduce administrative costs; once the cost is in the ballpark of millions of USD, it feels careless to avoid optimizing it further. There exist H100s in firesales or on eBay occasionally, which could reduce the cost even more, but the author already mentions 2USD/gpu/hour for bulk rental compute, which is better than the 3USD/gpu/hour estimate they used in the writeup.
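To see how much the two knobs in this comment (MFU and hourly rate) move the total, here is a rough sketch. The H100 peak throughput constant is an assumption (dense BF16 for the SXM part; check the datasheet), the FLOP budget is made up, and `implied_ownership_cost` uses a plain 365-day year:

```python
H100_PEAK_BF16_FLOPS = 989e12  # assumed dense BF16 peak per H100 SXM, FLOP/s


def gpu_hours(total_flops: float, mfu: float, peak_flops: float = H100_PEAK_BF16_FLOPS) -> float:
    """GPU-hours needed to execute total_flops at a given model FLOPs utilization."""
    if not 0.0 < mfu <= 1.0:
        raise ValueError("mfu must be in (0, 1]")
    return total_flops / (peak_flops * mfu) / 3600.0


def rental_cost(total_flops: float, mfu: float, usd_per_gpu_hour: float) -> float:
    """Rental bill in USD for the whole run."""
    return gpu_hours(total_flops, mfu) * usd_per_gpu_hour


def implied_ownership_cost(usd_per_gpu_hour: float, months: float) -> float:
    """What one GPU costs if rented around the clock for `months` months."""
    return usd_per_gpu_hour * 24 * 365 * months / 12


budget = 1e25  # hypothetical total training FLOPs, not a figure from the paper
baseline = rental_cost(budget, mfu=0.35, usd_per_gpu_hour=3.0)
tuned = rental_cost(budget, mfu=0.40, usd_per_gpu_hour=2.0)
print(f"baseline / tuned = {baseline / tuned:.2f}")
print(f"18 months at $3/h = ${implied_ownership_cost(3.0, 18):,.0f} per GPU")
```

Moving from 35% MFU at $3/hour to 40% MFU at $2/hour cuts the bill by a factor of 12/7, roughly 1.7x, regardless of the FLOP budget. Getting to the 3x suggested above would need cheaper hardware or a higher MFU than these two adjustments alone.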

152334HOP1y ago

You are correct on true H100 ownership costs being far lower. As I mention in the H100 blurb, the H100 numbers are fungible and I don't mind if you halve them.

MFU can certainly be improved beyond 40%, as I mention. But on the point of small models specifically: the paper uses FSDP for all models, and I believe a rigorous experiment should not vary sharding strategy due to numerical differences. FSDP2 on small models will be slow even with compilation.

The paper does not tie embeddings, as stated. The readout layer does lead to 6DV because it is a linear layer of D*V, which takes 2x for a forward and 4x for a backward. I would appreciate it if you could limit your comments to factual errors in the post.

pama1y ago

My bad on the 6 D V estimate; you are correct that if they do a dense decoding (rather than a hierarchical one as Google used to do in the old days) the cost is exactly 6 D V. I cannot edit the GP comment and I will absorb the shame of my careless words there. I was put off by the subtitle and initial title of this HN post, though the current title is more appropriate and correct.

Even if it's a small model, one could use DDP or FSDP/2 without slowdowns on fast interconnect, which certainly adds to the cost. But if you want to reproduce all the work at the cheapest price point you only need to parallelize to the minimal level for fitting in memory (or rather, the one that maxes the MFU), so everything below 2B parameters runs on a single H100 or single node.

lonk111y ago

I think the commenter was thinking about the input embedding layer, where to get an input token embedding the model does a lookup of the embedding by index, which is constant time.

And the blog post author is talking about the output layer where the model has to produce an output prediction for every possible token in the vocabulary. Each output token prediction is a dot-product between the transformer hidden state (D) and the token embedding (D) (whether shared with input or not) for all tokens in the vocabulary (V). That's where the VD comes from.

It would be great to clarify this in the blog post to make it more accessible but I understand that there is a tradeoff.
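The input-versus-output distinction discussed above can be written down as a small FLOP count. A sketch under the usual convention that a multiply-add counts as 2 FLOPs and that the backward pass of a linear layer costs twice its forward pass; the `d_model` and `vocab_size` values are illustrative, not taken from the paper:

```python
def readout_flops_per_token(d_model: int, vocab_size: int) -> int:
    """Training FLOPs per token for a dense D x V output (readout) layer."""
    forward = 2 * d_model * vocab_size   # one multiply and one add per weight
    backward = 4 * d_model * vocab_size  # gradients w.r.t. activations and w.r.t. weights
    return forward + backward


def embedding_lookup_flops_per_token(d_model: int, vocab_size: int) -> int:
    """The input embedding is a row lookup by token index, so no matmul FLOPs."""
    return 0


D, V = 1024, 32_000  # hypothetical model width and vocabulary size
print(readout_flops_per_token(D, V) == 6 * D * V)
print(embedding_lookup_flops_per_token(D, V))
```

This is why tying or untying the embedding weights changes the parameter count but not the 6DV compute term: only the readout matmul touches every vocabulary entry.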

tedivm1y ago

When I was at Rad AI we did out the math on rent versus buy, and it was just so absolutely ridiculously obvious that buy was the way to go. Cloud does not make sense for AI training right now, as the overhead costs are considerably higher than simply purchasing a cluster, colocating it at a place like Colovore, and paying for "on hands" support. It's not even close.

spi1y ago

Do you have sources for "The MFU can be above 40% and certainly well above the 35% in the estimate"?

Looking at [1], the authors there claim that their improvements were needed to push BERT training beyond 30% MFU, and that the "default" training only reaches 10%. Certainly numbers don't translate exactly, it might well be that with a different stack, model, etc., it is easier to surpass, but 35% doesn't seem like a terribly off estimate to me. Especially so if you are training a whole suite of different models (with different parameters, sizes, etc.) so you can't realistically optimize all of them.

It might be that the real estimate is around 40% instead of the 35% used here (frankly it might be that it is 30% or less, for that matter), but I would doubt it's so high as to make the estimates in this blog post terribly off, and I would doubt even more that you can get that "also for small models with plain pytorch and trivial tuning".

[1] https://www.databricks.com/blog/mosaicbert

pama1y ago

Please look at any of the plain pytorch codes by Karpathy that complement llm.c. If you want scalable codes, please look at Megatron-LM.

hnthr_w_y1y ago· 5 in thread

that's not very much in the business range, it's a lot when it comes to paying us salaries.

willis9361y ago

Any company of any size that doesn't learn the right lessons from a $10M mistake will be out of business before long.

brainwad1y ago

That's like staffing a single-manager team on a bad project for a year. Which I assure you happens all the time in big companies, and yet they survive.

vishnugupta1y ago

https://killedbygoogle.com/

I’m confident each one of them was a multiple-of-$10M investment.

And this is just what we know because they were launched publicly.

willis9361y ago

To be clear: what I mean by "not learning the right lessons" is a company deciding that the issue with wasting $10M in six months is that they didn't do it 100x in parallel in three months. Then when that goes wrong they must need to do it 100x wider in parallel again in three weeks.

OtherShrezzing1y ago

I'm not really certain that's true at Google's size. Their annual revenue is something like a quarter trillion dollars. 25,000x larger than a $10m mistake.

The equivalent wastage for a self-employed person would be allowing a few cups of Starbucks coffee per year to go cold.

jeffbee1y ago· 4 in thread

I think if you wanted to think about a big expense you'd look at AlphaStar.

5kg1y ago

I am wondering if AlphaStar is the most expensive paper ever.

jeffbee1y ago

I think it could be. I also think it is likely that HN frequenter `dekhn` has personally spent more money on compute resources than any other living human, so maybe they will chime in on how the cost gets allocated to the research.

lern_too_spel1y ago

"Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC"

ipsum21y ago

It's disappointing that they never developed AlphaStar enough to become super-human (unlike AlphaGo); even lower-level players were able to adapt to its playstyle.

The cost was probably the limiting factor.

faitswulff1y ago· 2 in thread

I wonder how many tons of CO2 that amounts to. Google Gemini estimated 125,000 tons of carbon emissions, but I don’t have the know-how to double check it.

chazeon1y ago

If you use solar energy, then there is no CO2 emission. Right?

ipsum21y ago

Google buys carbon credits to make up for CO2 emissions; they've never relied strictly on solar.

godelski1y ago· 1 in thread

Worth mentioning that "GPU Poor" isn't created because those without much GPU compute can't contribute, but rather because those with massive amounts of GPU are able to perform many more experiments and set a standard, or shift the Overton window. The big danger here is just that you'll start expecting a higher "thoroughness" from everyone else. You may not expect this level, but seeing this level often makes you think what was sufficient before is far from sufficient now, and what's the cost of that lower bound?

I mention this because a lot of universities and small labs are being edged out of the research space but we still want their contributions. It is easy to always ask for more experiments but the problem is, as this blog shows, those experiments can sometimes cost millions of dollars. This also isn't to say that small labs and academics aren't able to publish, but rather that 1) we want them to be able to publish __without__ the support of large corporations to preserve the independence of research[0], 2) we don't want these smaller entities to have to go through a roulette wheel in an effort to get published[1].

Instead, when reviewing be cautious in what you ask for. You can __always__ ask for more experiments, datasets, "novelty", and so on. Instead ask if what's presented is sufficient to push forward the field in any way and when requesting the previous things be specific as to why what's in the paper doesn't answer what's needed and what experiment would answer it (a sentence or two would suffice).

If not, then we'll have the death of the GPU poor and that will be the death of a lot of innovation, because the truth is, not even big companies will allocate large compute for research that is lower level (do you think state space models (mamba) started with multimillion dollar compute? Transformers?). We gotta start somewhere and all papers can be torn to shreds/are easy to critique. But you can be highly critical of a paper and that paper can still push knowledge forward.

[0] Lots of papers these days are indistinguishable from ads. A lot of papers these days are products. I've even had works rejected because they are being evaluated as products not being evaluated on the merits of their research. Though this can be difficult to distinguish when evaluation is simply empirical.

[1] I once got desk rejected for "prior submission." Two months later they overturned it, realizing it was in fact an arXiv paper, only for it to be desk rejected again a month later for "not citing relevant materials" with no further explanation.

pas1y ago

How does preregistration factor into this? It seems to me that it would make sense to do at least some review before the bulk of the money is spent.

> But you can be highly critical of a paper and that paper can still push knowledge forward.

Can you give a concrete example of this?

floor_1y ago· 1 in thread

Content aside. This is hands down my favorite blog format.

mostthingsweb1y ago

I agree, but I'm curious if it's for the same reason. I like it because there is no flowery writing. Just direct "here are the facts".

dont_forget_me1y ago· 1 in thread

All that compute power just to invade privacy and show people more ads. Can this get any more depressing?

psychoslave1y ago

Yes, sure! Imagine a world where every HN thread you engage in is fed with information that is all subtly tailored to push you into buying whatever crap the market is able to produce.

brg1y ago

I found this exercise interesting, and as arcade79 pointed out, it is the cost of replication, not the cost to Google. Humorously, I wonder what the cost of replicating the Higgs boson verification or gravitational wave detection would be.

hiddencost1y ago

It's likely the cost of the researchers was about $1M/head; with 11 names, that puts the staffing costs on par with the compute costs.

(A good rule of thumb is that an employee costs about twice their total compensation.)
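The estimate above is simple enough to write out. A sketch assuming roughly one person-year per author and a total compensation of $500k each, which is only what the $1M/head figure implies under the 2x rule of thumb, not a reported number:

```python
def staffing_cost(headcount: int, total_comp_per_head: float, overhead_multiplier: float = 2.0) -> float:
    """Fully loaded staffing cost: compensation times an overhead multiplier."""
    return headcount * total_comp_per_head * overhead_multiplier


authors = 11                # number of names on the paper, per the comment above
assumed_total_comp = 500_000  # hypothetical; chosen so that 2x gives $1M per head
print(f"${staffing_cost(authors, assumed_total_comp):,.0f}")
```

Under those assumptions the people cost comes out around $11M, the same order as the roughly $10M compute figure discussed in this thread.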

j / k navigate · click thread line to collapse

150 comments

81 comments · 13 top-level

BartjeD1y ago· 19 in thread

If this ran on google's own cloud it amounts to internal bookkeeping. The only cost is then the electricity and used capacity. Not consumer pricing. So negligible.

It is rather unfortunate that this sort of paper is hard to reproduce.

They chose to publish. So they are interested in seeing it reproduced or improved upon.

rrr_oh_man1y ago

> They chose to publish. So they are interested in seeing it reproduced or improved upon.

Call me cynical, but this is not what I experienced to be the #1 reason of publishing AI papers.

ash-ali1y ago

I hope someone could share their insight on this comment. I think the other comments are fragile and don't hold too strongly.

2 more replies

echoangle1y ago

As someone not in the AI space, what do you think is the reason for publishing? Marketing and hype for your products?

1 more reply

stairlane1y ago

> The only cost is then the electricity and used capacity. Not consumer pricing. So negligible.

I don’t think this is valid, as this point seems to ignore the fact that the data center that this compute took place in required a massive investment.

A paper like this is more akin to HEPP research. Nobody has the capability to reproduce the higgs results outside of at the facility the research was conducted within (CERN).

I don’t think reproduction was a concern of the researchers.

morbia1y ago

Obviously the 'best' result would be to have a separate collider as well, but no one is going to fund a new collider just to reaffirm the result for a third time.

1 more reply

Rastonbury1y ago

Kinda but Google sells compute so it makes money off the data centre investment, assuming they had spare capacity for this it's negligible at Google scale

rty321y ago

Opportunity cost is cost. What you could have earned by selling the resources to customers instead of using them yourself is what the resources are worth.

g15jv2dp1y ago

This assumes that you can sell 100% of the resources' availability 100% of the time. Whenever you have more capacity that you can sell, there's no opportunity cost in using it yourself.

2 more replies

nkrisc1y ago

Not if you’re only using the resources when they’re available because no customer has paid to use them.

K0balt1y ago

I think Google produces their own power, so they don’t pay distribution cost which is at least one third of the price of power, even higher for large customers.

Cthulhu_1y ago

pintxo1y ago

> They chose to publish. So they are interested in seeing it reproduced or improved upon.

Not necessarily, publishing also ensure that the stuff is no longer patentable.

slashdave1y ago

Forgive me if I am wrong, but all of the techniques explored are already well known. So, what is going to be patented?

2 more replies

jfengel1y ago

Is the electricity cost negligible? It's a pretty compute intensive application.

Of course it would be a tiny fraction of the $10m figure here, but even 1% would be $100,000. Negligible to Google, but for Google even $10 million is couch cushion money.

dekhn1y ago

stavros1y ago

1 more reply

K0balt1y ago

I’d imagine publishing is more oriented toward attracting and retaining talent. You need to scratch that itch or the academics will jump ship.

ape41y ago

Its like them running SETI@home ;)

dekhn1y ago

We ran Folding@Home at google. we were effectively the largest single contributor of cycles for at least a year. It wasn't scientifically worthwhile, so we shut it down after a couple years.

That was using idle cycles on Intel CPUs, not GPUs or TPUs though.

arcade791y ago· 15 in thread

A lot of misunderstandings among the commenters here.

From the link: "the total compute cost it would take to replicate the paper"

It's not Google's cost. Google's cost is of course entirely different. It's the cost for the author if he were to rent the resources to replicate the paper.

bombcar1y ago

This is the side effect of underutilized capital and it’s present in many cases.

For example, if YOU want to rent a backhoe to do some yard rearrangement it’s going to cost you.

But Bob who owns BackHoesInc has them sitting around all the time when they’re not being rented or used; he can rearrange his yard wholesale or almost free.

thaumasiotes1y ago

> This is the side effect of underutilized capital and it’s present in many cases.

mikepurvis1y ago

dweekly1y ago

Possible corollary: it may be difficult to regularly turn out highly compute-dependent research if you're paying full retail rack rates for your hardware (i.e. using someone else's cloud).

punnerud1y ago

Can others also buy the “best effort” tier?

The job could easily run for weeks, even when you could pay to get it done in a day.

Then have bidding on this “best effort” resource, where they factor in the electricity price at any given time.

curt151y ago

Is the "best effort" tier similar to AWS spot instances?

v3ss0n1y ago

Sure, land a job there, work your way up against the corporate BS and toxicity, and you can get the best-effort tier.

That effort needs to be added to the cost calculation too.

imtringued1y ago

The problem with neoclassical economics is that it doesn't concern itself with the physical counterpart of liquidity. It is assumed that the physical world is just as liquid as the monetary world.

jopsen1y ago

> I would even go as far as to say that the production of liquidity is probably the driving force of the economy.

Isn't that all what high frequency traders would say? :)

Perhaps there is some limit at which additional liquidity doesn't offer much value?

152334HOP1y ago

Is it free-priority based?

rldjbpin1y ago

if this is the way they pull it off consistently, it might be a good business model for those working on research like stability to also moonlight a gpu cloud service.

it is a hustle only for the near future while this bubble lasts, but can help reduce costs.

huijzer1y ago

Still, don’t get high on your own supply.

mrazomor1y ago

This assumes the common resources (CPU, RAM, etc.), not the ones required for LLM training (GPU, TPU, etc.). It's a different economy.

TL;DR: It's not ~free.

akutlay1y ago

Why does GPU matter? Do you think GCP keeps GPU utilization at 100% at all times?

rgmerk1y ago· 7 in thread

Worth pointing out here that in other scientific domains, papers routinely require hundreds of thousands of dollars, sometimes millions of dollars, of resources to produce.

ramraj071y ago

esperent1y ago

To be fair, supposing the Google paper took six months to a year to produce, it also must have cost several hundred thousand dollars in salaries and other non-compute costs.

paxys1y ago

These are mostly fixed costs. If you produce a hundred papers from the same team and same research, the costs aren't 100x.

dumb12241y ago

Well, not everyone starts experiments anew. Many also reuse accumulated datasets. For human data even more so.

slashdave1y ago

I assure you that the companies performing these screens expect a return on this investment. It is not for a journal paper.

godelski1y ago

Metacelsus1y ago

Yeah, I'm a wet-lab biologist and my most recent paper (which is still not past peer review) has already cost about $200,000. And I just spent another $2000 today...

sigmoid101y ago· 7 in thread

c-linkage1y ago

Reproducibility is a key element of the scientific process

How is anyone else going to reproduce the experiment if it's going to cost them $10 million because they don't work at Google and would have to rent the infrastructure?

Sebb7671y ago

But what's the solution here? Not doing the (possibly) interesting research because it's hard to reproduce? That doesn't sound like a better situation.

tokai1y ago

Cheap compared to some high-energy physics experiments.

rvnx1y ago

It works and helps to get a salary raise or a better job, so they continue.

A bit like when someone goes to a job interview, didn't do anything, and claims "My work is under NDA".

injuly1y ago

> This calculation is pretty pointless and the title is flat out wrong.

No, it's not. The author clearly states in the very first paragraph that this is the price it would take them to reproduce the results.

Nowhere in the article (or the title) have they implied that this is how much Google spent.

sigmoid101y ago

michaelmior1y ago

Even if they did use H100s and paid the current premium on them, you could probably buy 100 H100s and the boxes to put them in for less than $10M.

pama1y ago· 6 in thread

152334HOP1y ago

You are correct on true H100 ownership costs being far lower. As I mention in the H100 blurb, the H100 numbers are fungible and I don't mind if you halve them.

pama1y ago

lonk111y ago

I think the commenter was thinking about the input embedding layer, where to get an input token embedding the model does a lookup of the embedding by index, which is constant time.

It would be great to clarify this in the blog post to make it more accessible but I understand that there is a tradeoff.
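The constant-time lookup described in this comment can be illustrated with a toy example. This is only a sketch of the general idea, not code from the blog post or the paper; the table values and function names are made up. It contrasts fetching a row by index with the mathematically equivalent one-hot product, which touches every row of the table.

```python
# Toy embedding table: one row per token id (vocabulary of 4, dimension 3).
# The values are arbitrary and exist only for illustration.
embedding_table = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]


def embed_by_lookup(token_id):
    """Fetch the row directly; the work does not grow with vocabulary size."""
    return embedding_table[token_id]


def embed_by_one_hot(token_id):
    """Same result via a one-hot vector times the table; visits every row."""
    vocab_size = len(embedding_table)
    dim = len(embedding_table[0])
    one_hot = [1.0 if i == token_id else 0.0 for i in range(vocab_size)]
    return [
        sum(one_hot[i] * embedding_table[i][j] for i in range(vocab_size))
        for j in range(dim)
    ]


lookup_result = embed_by_lookup(2)
one_hot_result = embed_by_one_hot(2)
```

Both functions return the same vector, but only the one-hot version does work proportional to the vocabulary size, which is why the input embedding is usually treated as a lookup rather than as a matrix multiplication.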

tedivm1y ago

spi1y ago

Do you have sources for "The MFU can be above 40% and certainly well above the 35% in the estimate"?

[1] https://www.databricks.com/blog/mosaicbert
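To see why the MFU figure matters for the estimate: for a fixed amount of training compute, the accelerator time needed scales inversely with model FLOPs utilization (achieved FLOPs per second divided by the hardware peak). The sketch below assumes that simple relationship; the function names and the FLOP and peak-throughput numbers are hypothetical, and only the 35% and 40% figures come from the thread.

```python
def accelerator_hours(total_flops: float,
                      peak_flops_per_sec: float,
                      mfu: float) -> float:
    """Accelerator-hours needed for a workload at a given MFU."""
    if not 0.0 < mfu <= 1.0:
        raise ValueError("mfu must be in (0, 1]")
    return total_flops / (peak_flops_per_sec * mfu) / 3600.0


def cost_ratio(mfu_assumed: float, mfu_actual: float) -> float:
    """Factor by which cost changes if the real MFU differs from the assumed one."""
    return mfu_assumed / mfu_actual


# Hypothetical workload and hardware, for illustration only.
total_flops = 1e21  # assumed total training FLOPs
peak = 1e15         # assumed peak FLOPs per second of one accelerator

hours_at_35 = accelerator_hours(total_flops, peak, 0.35)
hours_at_40 = accelerator_hours(total_flops, peak, 0.40)
ratio = cost_ratio(0.35, 0.40)  # close to 0.875
```

Under this model, moving from 35% to 40% MFU lowers the compute bill by roughly 12.5%, so the disagreement shifts the estimate without changing its order of magnitude.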

pama1y ago

Please look at any of the plain pytorch codes by Karpathy that complement llm.c. If you want scalable codes, please look at Megatron-LM.

hnthr_w_y1y ago· 5 in thread

that's not very much in the business range, it's a lot when it comes to paying us salaries.

willis9361y ago

Any company of any size that doesn't learn the right lessons from a $10M mistake will be out of business before long.

brainwad1y ago

That's like staffing a single-manager team on a bad project for a year. Which I assure you happens all the time in big companies, and yet they survive.

vishnugupta1y ago

https://killedbygoogle.com/

I’m confident each one of them was a multiple-of-$10M investment.

And this is just what we know because they were launched publicly.

willis9361y ago

OtherShrezzing1y ago

I'm not really certain that's true at Google's size. Their annual revenue is something like a quarter trillion dollars. 25,000x larger than a $10m mistake.

The equivalent wastage for a self-employed person would be allowing a few cups of Starbucks coffee per year to go cold.
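The proportion in this comment can be checked with a few lines of arithmetic. The revenue and loss figures are the commenter's round numbers; the personal income is a hypothetical value added purely for illustration.

```python
annual_revenue = 250e9  # "a quarter trillion dollars", from the comment
mistake = 10e6          # the $10M figure under discussion

# How many times larger annual revenue is than the mistake.
ratio = annual_revenue / mistake


def equivalent_loss(personal_income: float) -> float:
    """Scale the $10M loss down to an individual's annual income."""
    return personal_income / ratio


# Hypothetical self-employed income, for illustration only.
scaled = equivalent_loss(100_000)
```

With these inputs the ratio matches the 25,000x in the comment, and the scaled-down loss comes to a few dollars a year, which is coffee money.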

jeffbee1y ago· 4 in thread

I think if you wanted to think about a big expense you'd look at AlphaStar.

5kg1y ago

I am wondering if AlphaStar is the most expensive paper ever.

jeffbee1y ago

lern_too_spel1y ago

"Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC"

ipsum21y ago

It's disappointing that they never developed AlphaStar enough to become super-human (unlike AlphaGo); even lower-level players were able to adapt to its playstyle.

The cost was probably the limiting factor.

faitswulff1y ago· 2 in thread

I wonder how many tons of CO2 that amounts to. Google Gemini estimated 125,000 tons of carbon emissions, but I don’t have the know-how to double check it.

chazeon1y ago

If you use solar energy, then there is no CO2 emission. Right?

ipsum21y ago

Google buys carbon credits to make up for CO2 emissions; they've never relied strictly on solar.

godelski1y ago· 1 in thread

pas1y ago

How does preregistration factor into this? It seems to me that it would make sense to do at least some review before the bulk of the money is spent.

> But you can be highly critical of a paper and that paper can still push knowledge forward.

Can you give a concrete example of this?

floor_1y ago· 1 in thread

Content aside. This is hands down my favorite blog format.

mostthingsweb1y ago

I agree, but I'm curious if it's for the same reason. I like it because there is no flowery writing. Just direct "here are the facts".

dont_forget_me1y ago· 1 in thread

All that compute power just to invade privacy and show people more ads. Can this get any more depressing?

psychoslave1y ago

Yes, sure! Imagine a world where every HN thread you engage in is fed with information that is all subtly tailored to push you into buying whatever crap the market is able to produce.

brg1y ago

hiddencost1y ago

It's likely the cost of the researchers was about $1M per head; with 11 names, that puts the staffing costs on par with the compute costs.

(A good rule of thumb is that an employee costs about twice their total compensation.)
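The staffing estimate above works out as follows. The head count, per-head cost, and 2x rule of thumb are the commenter's; the $10M compute figure is the one discussed throughout the thread. The per-head cost is treated here as a lump sum for the project, which is an assumption, since the comment does not say how long the work took.

```python
AUTHORS = 11                # names on the paper, per the comment
COST_PER_HEAD = 1_000_000   # fully loaded cost per researcher, per the comment
OVERHEAD_MULTIPLIER = 2.0   # rule of thumb: employee cost is about 2x total compensation
COMPUTE_COST = 10_000_000   # approximate compute estimate discussed in the thread

# Total staffing cost under the commenter's assumptions.
staffing_cost = AUTHORS * COST_PER_HEAD

# Total compensation implied by the rule of thumb.
implied_compensation = COST_PER_HEAD / OVERHEAD_MULTIPLIER

# Staffing cost relative to compute cost.
staffing_to_compute = staffing_cost / COMPUTE_COST
```

Under these assumptions staffing comes to $11M, about 1.1 times the compute estimate, and the implied total compensation is $500,000 per researcher.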
