A Go champion might have trained for 8 hours a day, for 15 years (age 5 to 20). That is about 40 000 hours.
In other words, machines required 137 times longer to learn the game, and at twice the power consumption! There is still a lot of room for improvement.
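For what it's worth, here is a quick sanity check of the arithmetic above. It is only a sketch: the 365 training days per year is my assumption, and the machine total is simply what the quoted 137x ratio implies, not a figure taken from the article.

```python
# Back-of-the-envelope check of the human training estimate.
HOURS_PER_DAY = 8
DAYS_PER_YEAR = 365   # assumption: training every day, no rest days
YEARS = 15            # age 5 to 20

human_hours = HOURS_PER_DAY * DAYS_PER_YEAR * YEARS  # rounded to ~40,000 in the comment

RATIO = 137                      # "137 times longer", as stated in the comment
ROUNDED_HUMAN_HOURS = 40_000     # the rounded figure the comment uses
implied_machine_hours = ROUNDED_HUMAN_HOURS * RATIO  # implied, not from the article

print(f"human hours:           {human_hours:,}")
print(f"implied machine hours: {implied_machine_hours:,}")
```

The exact product is a bit above the rounded 40 000 hours, so the ratio would shift slightly if you used the unrounded number.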
"AlphaGo was initially trained to mimic human play by attempting to match the moves of expert players from recorded historical games, using a database of around 30 million moves".
For example, other NNs are also being trained to play Go. Should all the unsuccessful attempts be counted in the machine total? The comparison then becomes almost impossible.
>KataGo's latest run used about 29 GPUs, rather than thousands (like AlphaZero and ELF), first reached superhuman levels on that hardware in perhaps just three to six days, and reached strength similar to ELF in about 14 days. With minor adjustments and a few more GPUs, starting around 40 days it roughly began to match or surpass Leela Zero in some tests with different configurations, time controls, and hardware. And finally after about four months of training time, the current run may be wrapping up fairly soon, but we hope to be able to continue it or begin another run in the future.
This comparison is a bit unfair. Humans are the result of evolution on a grand scale. Human Go is the result of millennia of gameplay. A human does not become a grand master in isolation.
AlphaGo is the result of an evolutionary tournament style competition of a much smaller duration and breadth. AG is also a population, not just one agent, and it would be silly to take just one agent and evaluate it on its own as if it could be created without the others.
Should we include the human costs as well in AG, why just the electricity and CPU?
For example, I expect that the training required to go from 7-year-old child to Go grand master requires a completely different number of bits of information than the training required to go from blank-slate NN to NN Go grand master. I also suspect that the difference in what is being learned may well dominate the difference in training efficiency. Both the prior knowledge and the mechanism of learning are so different that I doubt you could get a meaningful comparison based on current understanding.
You should remember that we have no idea basically how human beings actually learn things, and no idea how much prior knowledge we have encoded. Just for an example, I once saw a documentary that claimed chess grandmasters seem to recognize valid chess positions using the parts of the brain that usually recognize faces. Assuming that was true (I'm not claiming it is) perhaps a part of their chess learning consisted in taking a built-in face recognizing NN and training it to recognize chess boards. How much did the built-in knowledge of recognizing faces help? I don't think it would be possible to calculate.
Now, the bot has many advantages. It never sleeps, never gets distracted, never dies and can be copied to another system to obtain a copy of the bot with the same playing performance.
The bot is also more accessible. Any player now can train with a bot, all day if you want, for almost free. You cannot do that with a professional.
If you asked someone to learn Go but only gave them the rules of the game, they would likely be a weak player (although probably with some original strategies).
Lots of people contribute what I imagine are amounts of CPU Power/money to the Leela Chess Zero project[1].
Would love to see Alpha Chess vs Leela Chess.
[1] https://training.lczero.org/
[edit] I've caused terrible confusion by melding Leela Go and Leela Chess when Leela Chess was originally forked from Leela Go and that's basically when similarities end.
Edited for a bit more clarity.
This is also how Stockfish got to be the #1 engine. By being open source, and having the testing framework (https://tests.stockfishchess.org) use donated computer time from volunteers, it was able to make fast, continuous progress. It flipped what was previously a disadvantage (if you are open source, everyone can copy your ideas), into an advantage - as you can't easily set up a fishtest like system with an engine that isn't already developed in public.
Perhaps you were confused because Leela Chess Zero was forked from Leela Zero (neural network Go engine by Pascutto) but it includes Stockfish's move generation logic.
https://github.com/official-stockfish/Stockfish/graphs/contr...
https://github.com/LeelaChessZero/lczero/graphs/contributors
Glaurung was pretty innovative at the time.
But Kasparov and others have given up on the idea that a human provides any unique insight into chess anymore. Computers are just better.
"In terms of actual cost to DeepMind (a subsidiary of Google’s parent company) to run the experiment, there are other factors that need to be taken into account, such as researcher salaries, or that the quoted TPU rate probably includes a healthy amount of margin. But for someone outside Google, this number is a good ballpark estimate of how much it would cost to replicate this experiment."
Minor improvements to Google Books OCR might not be worth much, whereas better search result scoring would be worth lots. An automated system would decide where it was most efficient to spend the TPUs. Management would set how many dollars a 10% performance improvement was worth.
I'm sure the reality is a bunch of middle managers arguing over why their team deserves them more than another.
That's a short sighted, immediate benefit or bust mentality. Not to mention that projects have a ramp-up time where they are not profitable yet, but still very valuable strategically.
I don't know how much less, but if you were to do fully pre-emptible at this scale I wouldn't be surprised if you could get it down to one-tenth the price. I wouldn't suspect the same of other more generic resources like CPUs that have a much lower price point to begin with, but the TPU sticker price seems very high with lots of headroom.
I've also heard rumors that AlphaStar (https://deepmind.com/blog/article/alphastar-mastering-real-t...) was essentially put on hold because it was too expensive to improve/train. The bot wasn't able to beat StarCraft champions and _only_ got to a grandmaster level.
(Sure, you need to somehow roll the StarCraft world forward and back, but for Atari, using MCTS was shown to be an order of magnitude more efficient.)
I have also seen comments that the search width is too large. Or maybe it was a matter of academic purity?
At the last Blizzcon they had it around. The setup wasn't ideal, so Serral (won world finals in 2018, reached semifinals in 2019) wasn't really happy with how he played, but it won.
This was also a version where they'd worked on preventing its ability to micro at quadruple-digit APM.
I'd even argue that they missed their goal by a long shot if their system isn't able to play arbitrary maps - every human player can do that no problem.
We don't run on those fancy V100 cards though, just regular old gaming cards suffice, and I suppose if we bought the "industrial" nvidia versions it would take a bit longer to recoup, but still definitely within the year.
Anyway what I'm saying is that it's probably possible to do this a lot cheaper than 36M, though maybe not in such a short time. Our startup is extremely cash intensive, and I bet machine learning companies are as well (I suppose machine learning experts aren't cheap ;)), so if we can put in some work and save a big portion of our hardware costs, that really goes the distance.
How modern startups start out: Spend $50,000/month to run hundreds of microservices on a managed Kubernetes cluster
I remember my last chat with one of such guys. They insisted that the company wasn't up-to-date because we didn't run our app inside containers and didn't develop our own AI/ML systems...
I bet the much higher cost was the PR team, including the film team, press support, TV team, travels, inviting the expert Go players, building the stage, and such. Estimated 100.000.
Not counting the man hours, they were just doing their normal job.
Because renting them out generates no revenue, right?
"Maybe around 20.000"
At least the article used a formula for the calculation. You just picked a number at random.
And the title was "How much did it cost", not how much it would cost.
And in comparison to large tech company R&D budgets, the amount cited in the article is a drop in the bucket. Consider the fact that Google spent $26 billion in R&D budget in 2019 alone [2]. Microsoft spent almost $17 billion [3].
[1] https://www.extremetech.com/computing/76552-project-deep-bli...
[2] https://www.statista.com/statistics/507858/alphabet-google-r...
[3] https://www.statista.com/statistics/267806/expenditure-on-re...
For others: It's $36M.
Also nobody mentioned the title is inaccurate so I guess it's just pedantic "thou shalt not change zhe title" rather than "title was misleading/clickbait"...
"non-trivial" is a bit of a red herring here. Playing go is pretty trivial compared to something like walking or scratching your face. Winning go may be non-trivial compared to those in some ways but it is very trivial in comparison in other ways.
Scratching a face is a matter of fine motor control. [1] is an example from 2011 which did this, as well as face shaving.
Walking is slightly tricky because it's such a dynamic system, but is now human level[2], and there was never really any question that it would be possible.
On the other hand, the state of the art in Go systems before Alpha Go (the one trained off games, not Alpha Zero) couldn't beat competent amateurs. No one had really considered the learn-from-zero-knowledge approach of Alpha Zero even for easier games like chess.
[1] https://www.engadget.com/2011-07-14-robots-for-humanity-help...
I've been interested in the application of AlphaZero to chess. It's sad that this many resources were devoted to something which we can't even use to play chess as of now. Leela (the open source reengineer) is really strong, but the crushing results presented in the AlphaZero paper never materialized. And this article just shows how hard they are to replicate.
It seems to me that, if you only take it as a marketing operation, it has been already very valuable.
I know some companies are doing that, but I think looking at AlphaGo or AGZ and making it go faster should be an interesting problem in itself.
Their running cost estimate of a single TPU in a machine with 4 "TPUs" is based on the price of a cloud TPU v2-8, but a v2-8 is actually 4 ASICs on 1 board.
Also, because the date of publication was around the time v2s were announced, and because the TPU is only used for inference while the GPU is used for training, I think self-play was likely done on TPU v1s, which use 5x less power per ASIC and so are likely much cheaper.
I also think the way they calculated the number of TPUs required is wrong. It looks like they assume 1 machine with 4 TPUs makes 1 move in 0.4 seconds, but since making 1 move only requires a forward pass through a moderately sized CNN with a 19x19 (tiny) input, 1 TPU should be able to make thousands of moves in parallel per second.
> Over 72 hours, 4.9 million matches were played.
One of these claims must be incorrect or misinterpreted. I highly doubt they used as many TPUs as the article claims. That would not only be impractical, it would also raise a lot of other issues like networking, disk speed, etc.
My statement is not against this article. If anyone can confirm they used so many TPUs in parallel, feel free to post it.
Playing 4.9 million matches of ~100 plies each at 0.4 seconds per ply is 196000000 seconds.
That's < 1000 TPUs. Sounds big but not too-large-for-google big. But other comments here say that the 0.4 second number is also wrong (and in fact significantly lower).
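The estimate above can be reproduced in a few lines. This is just a sketch of the same arithmetic, taking the ~100 plies per match and the disputed 0.4 seconds per ply at face value, and treating each game stream as one TPU the way the comment does.

```python
# Reproduce the "< 1000 TPUs" estimate from the quoted figures.
MATCHES = 4_900_000
PLIES_PER_MATCH = 100     # rough figure from the comment
SECONDS_PER_PLY = 0.4     # disputed elsewhere in the thread
WALL_CLOCK_HOURS = 72

# Total sequential thinking time if every ply were played one after another.
total_seconds = MATCHES * PLIES_PER_MATCH * SECONDS_PER_PLY

# Seconds available in the 72-hour window.
wall_seconds = WALL_CLOCK_HOURS * 3600

# Number of game streams that must run in parallel to finish in time.
parallel_streams = total_seconds / wall_seconds

print(f"total seconds:    {total_seconds:,.0f}")
print(f"parallel streams: {parallel_streams:,.0f}")
```

If the real per-ply time is lower than 0.4 seconds, as other comments suggest, the required parallelism drops proportionally.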
Google isn't operating with that cost, unless we assume that they are prioritizing AlphaGo to the point where they lose such customers 100% of the time.
It's way more likely that AlphaGo is trained on spare time, the cost for the hardware is sunk anyway, so only the cost for upkeep is real.
Not quite, power is quite expensive and basically all modern computers use far less power at idle than going full bore saturated with multiply-add instructions and perfect memory streaming.
Having said that, I agree that there is a substantial cost efficiency gain if they can schedule it during periods of inactivity.
"AlphaGo Zero showed the world that it is possible to build systems to teach themselves to do complicated tasks."
It didn't do any such thing. The game of go has a huge number of potential moves and outcomes, but the rules themselves are trivial, the board position can be measured in a handful of bytes and gameplay always and only progresses in one direction. And judging a good vs bad outcome is just a matter of comparing two numbers.
Go is challenging and interesting for humans, but it's not remotely as "complicated" as driving a car or translating a language.
Given the experiment lasts for just days, this actually sounds pretty impressive I think.
Many humans studied the game for a big portion of their lives in order to get Go knowledge where it is.
However, if you want to reliably make an AI the best in the world at a range of complicated tasks, can you reasonably expect this to be cheap?
>The power consumption of the experiment is equivalent to 12,760 human brains running continuously.
But the problem is that this "brains" unit for AlphaZero doesn't seem to take into account the GPU, CPU and memory involved. It only uses the TPU numbers.
Then there is another problem.
> a TPU consumes about 40 watts,[1]
The TPU referred to was a first-gen TPU built on 28nm running at 40W, more like a proof of concept. Google is currently on Cloud TPU v3 [2]. The latest-generation Cloud TPU v3 Pods are liquid-cooled for maximum performance, and each TPU v3 is actually a four-chip module [3]. If a single chip is 100W, that is 400W per TPU.
Edit: Turns out the wiki lists TPU v3 as 250W [4]. Not sure if that is 250W per chip or 250W for 4 chips.
That is on the assumption they are very high powered and hence would require liquid cooling. Although that might not always be the case.
So, adding CPU, GPU, memory, and TPU figures, that original estimate of 12,760 human brains may be off by a factor of 10, if not more.
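To make the units concrete, here is a rough sketch of how the "brains" figure converts to watts and TPUs. The 20 W per brain is my assumption (a commonly used rough number); the 40 W per TPU and the 12,760 brains are the quoted figures, and the factor of 10 is the guess from this comment, not a measurement.

```python
# Convert the quoted "human brains" figure into watts and TPU counts.
BRAINS = 12_760          # quoted estimate
BRAIN_WATTS = 20         # assumption: rough power draw of one human brain
TPU_WATTS = 40           # quoted first-generation TPU figure

total_watts = BRAINS * BRAIN_WATTS         # power implied by the quoted estimate
implied_tpus = total_watts // TPU_WATTS    # TPUs that would draw that much power

CORRECTION_FACTOR = 10   # the "off by a factor of 10" guess from the comment
corrected_brains = BRAINS * CORRECTION_FACTOR

print(f"implied power:    {total_watts / 1000:.1f} kW")
print(f"implied TPUs:     {implied_tpus:,}")
print(f"corrected brains: {corrected_brains:,}")
```

Swapping TPU_WATTS for a higher per-chip figure shows how sensitive the brain count is to which TPU generation you assume.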
Still pretty impressive, considering we now only get about a 1.8x improvement with each node generation. We would get about 19x by 2030 (assuming the same algorithm). Which means AI is good, but the human brain on its own is still very much magical in its efficiency :)
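The 19x figure looks like simple compounding. As a sketch, assuming five node generations between now and 2030 (the generation count is my guess, it is not stated above):

```python
# Compound the per-node improvement over several process generations.
PER_NODE_GAIN = 1.8   # quoted improvement per node generation
GENERATIONS = 5       # assumption: five generations by 2030

total_gain = PER_NODE_GAIN ** GENERATIONS

print(f"cumulative gain after {GENERATIONS} generations: {total_gain:.1f}x")
```

With four or six generations instead, the result moves a lot, so the 2030 number is only as good as the cadence assumption.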
Correct me if I am wrong on the numbers.
My other question: that was how much energy it used to learn Go, but what about the energy it used during the game?
How would AlphaGo Zero perform if it was limited to 20W?
[1] https://cloud.google.com/blog/products/gcp/an-in-depth-look-...
[2] https://cloud.google.com/blog/products/ai-machine-learning/g...
[3] https://techcrunch.com/2019/05/07/googles-newest-cloud-tpu-p...
thanks