That's $30 million if you want to train at that scale. Also, IIRC, it took 23 days to train the biggest model. Someone else can do the power consumption cost calculations.
A100 cards consume 250 W each, which is 512 kW for all 2048 cards. With datacenter overheads we will call it 1,000 kilowatts. 23 days is 552 hours, or 552,000 kilowatt hours total.
Most datacenters pay between 7 and 10 cents per kilowatt hour for electricity. Some pay below 4. At 10 cents, that's about $55,000 in electricity costs, which is nothing next to $30 million in capital costs.
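For anyone who wants to check the electricity arithmetic, here is a quick Python sketch. It only uses the figures quoted in this thread (2048 cards, 250 W each, 1,000 kW assumed with overhead, 23 days, 4 to 10 cents per kWh). The function name is made up for illustration, and the constant-draw assumption is a simplification.

```python
# Back-of-envelope electricity cost, using the figures quoted in the thread.
HOURS_PER_DAY = 24

def electricity_cost_usd(total_power_kw, days, usd_per_kwh):
    """Return (energy in kWh, cost in USD) for a constant power draw."""
    energy_kwh = total_power_kw * days * HOURS_PER_DAY
    return energy_kwh, energy_kwh * usd_per_kwh

# Raw draw of the cards alone, before any datacenter overhead.
raw_draw_kw = 2048 * 250 / 1000  # 512 kW

# Assumed total draw with overhead, as rounded up in the thread: 1,000 kW.
for rate in (0.04, 0.07, 0.10):
    energy_kwh, cost = electricity_cost_usd(1000, 23, rate)
    print(f"{rate:.2f} $/kWh: {energy_kwh:,.0f} kWh -> ${cost:,.0f}")
```

At 10 cents this comes out to roughly $55k, so the conclusion holds: electricity is a rounding error next to the hardware.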
I believe capex <> opex is more 1:1 nowadays, so something feels off here...
You mean in terms of money. I think this is exactly the problem that we have in CS: nobody really cares about CO2.
In any case, the cost per run is going to be lower than $30M.
Edit: today's pricing looks about 20% higher, still. How are these prices so different?
If an A100 costs $15k and is useful for 3 years, that’s $5k/year, or about $417/mo. 2048 A100s cost about $853k for a month.
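The amortization math is easy to sanity check in a few lines of Python. This is only a sketch under the assumptions stated in the comment ($15k per card, 3-year useful life, straight-line depreciation, no resale value, 2048 cards). The function name is hypothetical.

```python
def amortized_monthly_cost(unit_price_usd, lifetime_years, num_units):
    """Straight-line amortization: return (per-card $/month, fleet $/month)."""
    per_unit_month = unit_price_usd / (lifetime_years * 12)
    return per_unit_month, per_unit_month * num_units

per_card, fleet = amortized_monthly_cost(15_000, 3, 2048)
print(f"per card: ${per_card:,.0f}/mo, fleet of 2048: ${fleet:,.0f}/mo")
```

Dividing $5k/year by 12 gives closer to $417/mo per card, so the fleet figure lands around $853k per month. Either way it is well under $1M per month of hardware time.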
One way you can distill the first three is to use AWS/Azure/GCP costs. But then you are still missing a major factor, which is the humans who worked on it, and the human cost may very well exceed the hardware cost.
At least $10 million/yr just for the talent.
An A100 costs $2/h, so it is about $2M to train the biggest model. Easily a Kickstarter-crowdfundable project.
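Here is the rental estimate spelled out, as a rough Python sketch. The $2 per GPU-hour rate is the figure from the comment. The 2048 cards and 23 days are the numbers quoted earlier in the thread. The function name is made up, and real cloud pricing varies by provider and commitment.

```python
def rental_cost_usd(num_gpus, days, usd_per_gpu_hour):
    """Return (total GPU-hours, total rental cost in USD)."""
    gpu_hours = num_gpus * days * 24
    return gpu_hours, gpu_hours * usd_per_gpu_hour

gpu_hours, cost = rental_cost_usd(2048, 23, 2.0)
print(f"{gpu_hours:,} GPU-hours -> ${cost:,.0f}")
```

That works out to about 1.13 million GPU-hours and roughly $2.3M for a single run, before any failed or repeated runs.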