How much did AlphaGo Zero cost? (2018)

(yuzeh.com)

214 points · hamsterbooster · 6y ago · 175 comments

101 comments · 30 top-level

hjnilsson6y ago· 18 in thread

Another way of thinking about how efficient the brain is: By the article’s numbers, about 5.5 million TPU hours were required to train the machine to play as well as a Go champion.

A Go champion might have trained for 8 hours a day, for 15 years (age 5 to 20). That is about 40 000 hours.

In other words, machines required 137 times longer to learn the game, and at twice the power consumption! There is still a lot of room for improvement.
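The arithmetic in this comment can be checked with a short sketch. It uses only the figures quoted above; the one liberty taken is computing the human training hours exactly (8 × 365 × 15) instead of rounding to 40 000, to show how sensitive the ratio is to that rounding.

```python
# Figures quoted in the comment above.
tpu_hours = 5.5e6            # TPU hours, "by the article's numbers"
hours_per_day = 8
years = 15                   # age 5 to 20

human_hours_exact = hours_per_day * 365 * years    # 43,800 hours
human_hours_rounded = 40_000                       # the comment's rounded figure

ratio_rounded = tpu_hours / human_hours_rounded    # the "137 times" claim
ratio_exact = tpu_hours / human_hours_exact        # a little lower

print(f"human hours (exact):  {human_hours_exact:,}")
print(f"ratio using 40,000 h: {ratio_rounded:.1f}x")
print(f"ratio using 43,800 h: {ratio_exact:.1f}x")
```

Either way the ratio stays in the low hundreds, so the rounding does not change the comment's point.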

FartyMcFarter6y ago

Go champions don't learn from zero. They learn from teachers, books, and playing against each other. This knowledge is built over hundreds, or thousands of years.

simonh6y ago

Alphago didn't learn from zero either. It has a pre-processor that identifies sets of patterns with known features, and also:

"AlphaGo was initially trained to mimic human play by attempting to match the moves of expert players from recorded historical games, using a database of around 30 million moves".

hjnilsson6y ago

Yes! So perhaps one way to make the machine more efficient is to use pre-programmed “general” models that can be attuned to a particular problem in a much shorter time?

29athrowaway6y ago

And long collaborative study sessions.

phreeza6y ago

This comparison is not entirely fair because the human brain also benefits from priors baked in over the entire course of evolution.

andbberger6y ago

That's a pretty big claim. One could argue that the topology of the brain is a prior, analogous to the architecture of a neural net. But considering that we really have no idea how learning happens in the brain on a large scale, you really can't say.

darepublic6y ago

Yeah only the last few human layers needed to be trained for the GO expansion pack, all the early layers were frozen during GO training.

ben_w6y ago

OTOH, I expect that avoiding human evolutionary priors is necessary for superhuman performance.

Drakim6y ago

So did the machine, albeit indirectly.

Scarblac6y ago

But there are also many other people spending time studying Go who didn't reach that level. We ran all that studying in parallel and then selected the best person by running a world championship. You can't only count his effort alone.

hjnilsson6y ago

True. But that single brain, in that person was that efficient. And represents the theoretical gap in efficiency to the machine.

There are for example, other NNs also being trained to play Go, should all unsuccessful attempts be counted into the machine total? The comparison is almost impossible then.

superkuh6y ago

https://github.com/lightvector/KataGo

>KataGo's latest run used about 29 GPUs, rather than thousands (like AlphaZero and ELF), first reached superhuman levels on that hardware in perhaps just three to six days, and reached strength similar to ELF in about 14 days. With minor adjustments and a few more GPUs, starting around 40 days it roughly began to match or surpass Leela Zero in some tests with different configurations, time controls, and hardware. And finally after about four months of training time, the current run may be wrapping up fairly soon, but we hope to be able to continue it or begin another run in the future.

visarga6y ago

> In other words, machines required 137 times longer to learn the game, and at twice the power consumption!

This comparison is a bit unfair. Humans are the result of evolution on a grand scale. Human Go is the result of millennia of gameplay. A human does not become grand master in isolation.

AlphaGo is the result of an evolutionary tournament style competition of a much smaller duration and breadth. AG is also a population, not just one agent, and it would be silly to take just one agent and evaluate it on its own as if it could be created without the others.

Should we include the human costs as well in AG, why just the electricity and CPU?

sukilot6y ago

AlphaGo is the result of human evolution too.

elcritch6y ago

It'd be really interesting if a research group could calculate an entropic calculation on how efficient training any given neural network would be. As in what is the thermodynamic limit of the most optimal NN training could be in terms of watts per bit trained. My hunch would be that human brains would operate close to this limit. At least in our standard environmental conditions. Based on how near optimal biomaterials are in terms of strength to weight ratios it wouldn't surprise me much.

simiones6y ago

I think the problem you'd find is that "bit trained" is probably highly non-trivial.

For example, I expect that the training required to go from 7-year-old child to Go grand master requires a completely different number of bits of information than the training required to go from blank-slate NN to NN Go grand master. I also suspect that the difference in what is being learned may well dominate the difference in training efficiency. Both the prior knowledge and the mechanism of learning are so different that I doubt you could get a meaningful comparison based on current understanding.

You should remember that we have no idea basically how human beings actually learn things, and no idea how much prior knowledge we have encoded. Just for an example, I once saw a documentary that claimed chess grandmasters seem to recognize valid chess positions using the parts of the brain that usually recognize faces. Assuming that was true (I'm not claiming it is) perhaps a part of their chess learning consisted in taking a built-in face recognizing NN and training it to recognize chess boards. How much did the built-in knowledge of recognizing faces help? I don't think it would be possible to calculate.

29athrowaway6y ago

Rather than Go champion I would rather use the term Go professional. There is a difference between being a professional and winning professional tournaments.

Now, the bot has many advantages. It never sleeps, never gets distracted, never dies and can be copied to another system to obtain a copy of the bot with the same playing performance.

The bot is also more accessible. Any player now can train with a bot, all day if you want, for almost free. You cannot do that with a professional.

vbezhenar6y ago

A human does not learn Go from scratch on his own. He uses teachers and books, which present compressed knowledge crystallized from many millions of human hours.

If you asked someone to learn Go but only gave him the rules of the game, he would likely be a weak player (although probably with some original strategies).

irjustin6y ago· 12 in thread

Alpha Go Zero inspired the development of an open source version, Leela Go Zero, which Leela Chess Zero is forked from, by the same guy who made Stockfish.

Lots of people contribute what I imagine are amounts of CPU Power/money to the Leela Chess Zero project[1].

Would love to see Alpha Chess vs Leela Chess.

[1] https://training.lczero.org/

[edit] I've caused terrible confusion by melding Leela Go and Leela Chess when Leela Chess was originally forked from Leela Go and that's basically when similarities end.

Edited for a bit more clarity.

glinscott6y ago

The great thing about these community driven efforts is that it is indeed feasible to reproduce these super expensive efforts. I'm a bystander now, as new maintainers have taken over, and they are doing a fantastic job pushing things forward.

This is also how Stockfish got to be the #1 engine. By being open source, and having the testing framework (https://tests.stockfishchess.org) use donated computer time from volunteers, it was able to make fast, continuous progress. It flipped what was previously a disadvantage (if you are open source, everyone can copy your ideas), into an advantage - as you can't easily set up a fishtest like system with an engine that isn't already developed in public.

29athrowaway6y ago

I think KataGo is stronger than Leela Zero.

https://github.com/lightvector/KataGo

thomasahle6y ago

I was suspecting another boring clone, but Kata looks like a cool project with nice new ideas! Thanks for sharing

Intermernet6y ago

I'd love to see some CPU/GPU/TPU donations to @chewxy's "agogo"[1] which is AlphaGo (AlphaZero) re-implemented in Go as a sort of proof of concept / demo of Gorgonia[2].

[1]: https://github.com/gorgonia/agogo

[2]: https://github.com/gorgonia/gorgonia

bonzini6y ago

As far as I know Gian-Carlo Pascutto is not among the original authors of Stockfish, though he did work on chess engines.

Perhaps you were confused because Leela Chess Zero was forked from Leela Zero (neural network Go engine by Pascutto) but it includes Stockfish's move generation logic.

est316y ago

I think OP was referring to Gary Linscott who has made major contributions to both Stockfish and made an adaptation of Leela Zero to Chess, now living under the Leela Chess Zero Github org but apparently a different adaptation is now the officially sanctioned one, at least his commits don't show up in the new lc0 repo.

https://github.com/official-stockfish/Stockfish/graphs/contr...

https://github.com/LeelaChessZero/lczero/graphs/contributors

jacquesm6y ago

He's also well known for his absolutely amazing work on audio fingerprinting.

nABHSZQPQ6y ago

There are several people who wrote Stockfish, including Tord Romstad, whose Glaurung was the initial base for Stockfish.

Glaurung was pretty innovative at the time.

mikorym6y ago

AFAIK Garry Kasparov to this day does computer&human vs. computer&human chess research, and it's far from a solved problem.

nl6y ago

If by "it's far from a solved problem" you mean that chess isn't a solved game that's true.

But Kasparov and others have given up on the idea that a human provides any unique insight into chess anymore. Computers are just better.

blueboo6y ago

He recently spoke quite dismissively of computer-augmented chess on the Lex Fridman podcast. Essentially, the computer knows best... so computer and human isn’t meaningfully different from computer and rubber stamper.

air76y ago

Is human&computer better than computer only?

conistonwater6y ago· 8 in thread

Are they using the on-demand price instead of the preemptible price? It seems like the sort of job that can run on preemptible machines, just because it's a batch job. Also, should the cost really be calculated using public market prices at all, as opposed to the running costs of the TPUs? It is not guaranteed at all that the opportunity cost to Google of using all those TPUs is equal to the price that you or I would pay Google to use them. I understand it cost a lot, but I'm not convinced by the headline figure of $36M.

Certhas6y ago

The article actually addresses that. The precise number is not the point but the ballpark is:

"In terms of actual cost to DeepMind (a subsidiary of Google’s parent company) to run the experiment, there are other factors that need to be taken into account, such as researcher salaries, or that the quoted TPU rate probably includes a healthy amount of margin. But for someone outside Google, this number is a good ballpark estimate of how much it would cost to replicate this experiment."

conistonwater6y ago

KataGo and Leela Zero and all the other AIs certainly didn't cost that much (the people running them wouldn't have had that much money and resources) and are probably stronger than Alpha Go Zero. I don't think it's at all fair to say this number is a good ballpark estimate of how much it would cost to replicate this experiment. It's wrong as a calculation of Google's costs, it's wrong per the title How much did AlphaGo Zero cost, and it's also wrong as an estimate of the cost of replication.

sauercloud6y ago

True, but "giving a ballpark estimate to replicate the experiment" is quite different to "How much did it cost"

londons_explore6y ago

One could imagine an 'ideal' allocation of TPUs, where compute time is allocated to the project that earns the most dollars per FLOP.

Minor improvements to Google Books OCR might not be worth much, whereas better search result scoring would be worth lots. An automated system would decide where it was most efficient to spend the TPUs. Management would set how many dollars a 10% performance improvement was worth.

I'm sure the reality is a bunch of middle managers arguing over why their team deserves them more than another.

visarga6y ago

> One could imagine an 'ideal' allocation of TPUs, where compute time is allocated to the project that earns the most dollars per FLOP.

That's a short sighted, immediate benefit or bust mentality. Not to mention that projects have a ramp-up time where they are not profitable yet, but still very valuable strategically.

tpetry6y ago

You mean better ad revenue, because search results are getting worse and worse. So search results can't really be google's primary focus anymore.

CydeWeys6y ago

Like you, I suspect there are substantial bulk discounts available if you're using tens of millions of dollars worth of these.

I don't know how much less, but if you were to do fully pre-emptible at this scale I wouldn't be surprised if you could get it down to one-tenth the price. I wouldn't suspect the same of other more generic resources like CPUs that have a much lower price point to begin with, but the TPU sticker price seems very high with lots of headroom.

dmarchand906y ago

For me I think it's very interesting just to get a ballpark order-of-magnitude estimate. I'm sure the cost isn't much less than $3.6M and so the underlying story doesn't change that much, i.e., this is not something accessible to hobbyists.
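To put the price discussion in this subthread on one page, here is a rough sketch that combines two figures quoted elsewhere in the thread: the $36M headline and the roughly 5.5 million TPU hours. Pairing them to get an implied hourly rate is an assumption of this sketch, and the discount levels are purely illustrative (the one-tenth case is the guess made above, not a published price).

```python
headline_cost_usd = 36e6     # headline figure quoted in the thread
tpu_hours = 5.5e6            # TPU hours quoted in another comment

# Implied sticker rate if both numbers describe the same run (assumption).
implied_rate = headline_cost_usd / tpu_hours

# Hypothetical fractions of sticker price; only 0.1 appears in the thread.
price_fractions = [1.0, 0.5, 0.3, 0.1]
scenarios = {f: headline_cost_usd * f for f in price_fractions}

print(f"implied rate: ${implied_rate:.2f} per TPU hour")
for fraction, cost in scenarios.items():
    print(f"at {fraction:.0%} of sticker price: ${cost / 1e6:.1f}M")
```

Even the most generous scenario stays in the millions of dollars, which is the order-of-magnitude point made above.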

ipsum26y ago· 8 in thread

Alpha Go Zero*, which was trained from scratch, without human games.

I've also heard rumors that AlphaStar (https://deepmind.com/blog/article/alphastar-mastering-real-t...) was essentially put on hold because it was too expensive to improve/train. The bot wasn't able to beat StarCraft champions and _only_ got to a grandmaster level.

Symmetry6y ago

Alpha Go Zero used a combination of deep learning within a classical AI Monte Carlo simulation framework. Without a similarly effective framework it's not surprising that Alpha Star wasn't able to achieve similar success. I've watched a lot of Alpha Star replays and the lapses in forethought glare through pretty regularly, though other aspects of its judgement (not just reflexes!) seem frankly superhuman.

embrassingstuff6y ago

Why was MCTS (or some search variant) not used in alphastar ?

(Sure, u need to somehow roll forward and rollback the StarCraft world, but for Atari using MCTS was shown to be an order of magnitude more efficient )

I have also seen comments that the search width is too large, or maybe academic purity consideration?

__s6y ago

https://www.youtube.com/watch?v=nbiVbd_CEIA

At the last Blizzcon they had it around. The setup wasn't ideal, so Serral (won world finals in 2018, reached semifinals in 2019) wasn't really happy with how he played, but it won

This was also a version where they'd worked on preventing its ability to micro at quadruple digit apm

ajnin6y ago

That match was still arguably not fair for the human because of the imperfect input and outputs he had to deal with. In the end I'm not sure it makes sense to put artificial handicaps on the machine to leave the human a chance. It's like a swimming race between a fish and some land animal but the fish is not allowed to use its fins. Sure the other guy has a better chance to win, but, what does that measure really?

qayxc6y ago

The main issue is that they need to retrain their bots for each new map, which would mean excessive training every time the map-pool changes (e.g. every 3 months or so IIRC).

I'd even argue that they missed their goal by a long shot if their system isn't able to play arbitrary maps - every human player can do that no problem.

bsaul6y ago

Interesting! I didn't understand why they stopped the AlphaStar project. They claimed they had reached their goal, but they clearly haven't.

NVHacker6y ago

The goal was never to beat all pros. https://deepmind.com/blog/article/AlphaStar-Grandmaster-leve...

empath756y ago

playing at grandmaster level is a pretty astounding achievement.

tinco6y ago· 4 in thread

Our main compute doesn't go towards machine learning, but we do rely heavily on GPU power. I recently had to come up with the figures for us to invest in an expansion of our compute power, and it turned out that buying the machines ourselves would be cheaper than renting them from Google in 3-4 months.

We don't run on those fancy V100 cards though, just regular old gaming cards suffice, and I suppose if we bought the "industrial" nvidia versions it would take a bit longer to recoup, but still definitely within the year.

Anyway what I'm saying is that it's probably possible to do this a lot cheaper than 36M, though maybe not in such a short time. Our startup is extremely cash intensive, and I bet machine learning companies are as well (I suppose machine learning experts aren't cheap ;)), so if we can put in some work and save a big portion of our hardware costs that really goes the distance.
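The buy-versus-rent comparison described in this comment comes down to a break-even calculation. The sketch below shows the shape of it; every number in it (card price, cloud rate, power draw, electricity price) is hypothetical and chosen only for illustration, since the comment gives no figures beyond the 3-4 month result.

```python
def break_even_months(purchase_usd, rent_usd_per_hour,
                      own_usd_per_hour, hours_per_month=730):
    """Months of continuous use after which buying beats renting.

    own_usd_per_hour covers running costs of owned hardware (power etc.).
    Returns infinity if renting is never more expensive than owning.
    """
    monthly_saving = (rent_usd_per_hour - own_usd_per_hour) * hours_per_month
    if monthly_saving <= 0:
        return float("inf")
    return purchase_usd / monthly_saving


# Hypothetical inputs, not taken from the thread.
card_price = 1500.0                  # USD for one gaming GPU plus its share of a host
cloud_rate = 0.60                    # USD per GPU hour rented
power_cost = 0.3 * 0.15              # 0.3 kW draw at 0.15 USD per kWh

months = break_even_months(card_price, cloud_rate, power_cost)
print(f"break-even after about {months:.1f} months")
```

The result scales linearly with the purchase price, which is why more expensive "industrial" cards would take longer to recoup at the same rental rate.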

gridlockd6y ago

How Google started out: Run servers from a shack, using commodity PC hardware

How modern startups start out: Spend $50,000/month to run hundreds of microservices on a managed Kubernetes cluster

est316y ago

Which of the modern startup founders have gotten into the Stanford PhD program as teenagers? Those people are extremely rare and there is far too much VC floating around to focus only on such talent. Most founders, while still outstanding, are not from that class of person. Also, due to the larger sums of money being thrown around, people can just do those things now and spend their time growing their business instead of optimizing it. In the VC funded environment, optimizing businesses lose out to competitors which grow the fastest.

qayxc6y ago

How else do you attract venture capitalists and convince them to throw millions at you? You need all the buzzwords and the latest tech ;)

I remember my last chat with one of such guys. They insisted that the company wasn't up-to-date because we didn't run our app inside containers and didn't develop our own AI/ML systems...

vegesm6y ago

probably the memory requirements mean that you do need (multiple) V100s though

rurban6y ago· 4 in thread

Misleading numbers, and wrong calculations. The TPU and CPU cost them almost nothing as they use and build them anyway, and renting them out for this PR stunt just cost them the missed rental time, if customers would really pay that much. Maybe around 20.000. Energy cost? I don't see much additional costs as those machines run all the time, regardless if improving the model or doing something else.

I bet the much higher cost was the PR team, including the film team, press support, TV team, travels, inviting the expert Go players, building the stage, and such. Estimated 100.000.

Not counting the man hours, they were just doing their normal job.

eeeficus6y ago

The OP talks about how much it would have cost a third party to replicate the experiments. That’s what I get from it anyway.

melbourne_mat6y ago

"renting them out for this PR stunt just cost them the missed rental time"

Because renting them out generates no revenue, right?

"Maybe around 20.000"

At least the article used a formula for the calculation. You just picked a number at random.

rurban6y ago

You need actual customers paying that much, not hypothetical ones. That was my educated guess.

And the title was "How much did it cost", not how much it would cost.

has2k16y ago

I think it would be helpful for discourse if you read the article in full.

tech-historian6y ago· 3 in thread

Achieving a new breakthrough in computing is often very expensive. Deep Blue is estimated to have cost IBM over $100 million over a decade [1].

And in comparison to large tech company R&D budgets, the amount cited in the article is a drop in the bucket. Consider the fact that Google spent $26 billion in R&D budget in 2019 alone [2]. Microsoft spent almost $17 billion [3].

[1] https://www.extremetech.com/computing/76552-project-deep-bli...

[2] https://www.statista.com/statistics/507858/alphabet-google-r...

[3] https://www.statista.com/statistics/267806/expenditure-on-re...

sukilot6y ago

Note that everything in software development is R&D. Building (not operating) Gmail and Android and Office and Azure are R&D.

ViViDboarder6y ago

Did you mean to say “not everything”? Much may be, but certainly not everything. As you said, operating or maintaining software is not generally considered R&D. Development without the Research component is just Development, not Research and Development.

geodel6y ago

Right. That's the Development part of Research and Development.

trashburger6y ago· 3 in thread

The amount was removed from the submission title, which sucks if you're like me and don't like to visit yet another possibly JS-heavy site and drain your battery.

For others: It's $36M.

ramraj076y ago

Are you working off one solar panel on the way to Mars? First time I'm hearing battery drain as a reason to not visit a js site.

qayxc6y ago

Phones. F'in phones man :D

lucb1e6y ago

I was of half a mind to un-upvote because I liked knowing the number but didn't care enough about the article. Now I don't think it's as much worth it for this to be higher up.

Also nobody mentioned the title is inaccurate so I guess it's just pedantic "thou shalt not change zhe title" rather than "title was misleading/clickbait"...

jaekash6y ago· 3 in thread

> This accomplishment is truly remarkable in that it shows that we can develop systems that teach themselves to do non-trivial tasks from a blank slate, and eventually become better than humans at doing the task.

"non-trivial" is a bit of a red herring here. Playing go is pretty trivial compared to something like walking or scratching your face. Winning go may be non-trivial compared to those in some ways but it is very trivial in comparison in other ways.

nl6y ago

This is... wrong?

Scratching a face is a matter of fine motor control. [1] is an example from 2011 which did this, as well as face shaving.

Walking is slightly tricky because it's such a dynamic system, but is now human level[2], and there was never really any question that it would be possible.

On the other hand, the state of the art in Go systems before Alpha Go (the one trained off games, not Alpha Zero) couldn't beat competent amateurs. No one had really considered the learn-from-zero-knowledge approach of Alpha Zero even for easier games like chess.

[1] https://www.engadget.com/2011-07-14-robots-for-humanity-help...

[2] https://www.youtube.com/watch?v=_sBBaNYex3E

PeterisP6y ago

Playing go at that level is non-trivial compared to walking because (a) most humans can walk, but not even the best human go masters can play go at that level; (b) we had algorithms that allow bipedal robots to walk long before we had algorithms for playing go at that level.

simiones6y ago

Do we have algorithms that allow bipedal robots to walk at human level? Or run at, let's say, 10th grade standard student level?

zucker426y ago· 2 in thread

I wonder if the code and network weights will ever see the light of day. I wonder what the eventual value proposition of working on this sort of stuff is. I suppose they are just going to try to apply the algorithms to better things.

I've been interested in the application of AlphaZero to chess. It's sad that this many resources were devoted to something which we can't even use to play chess as of now. Leela (the open source reengineer) is really strong, but the crushing results presented in the AlphaZero paper never materialized. And this article just shows how hard they are to replicate.

RobertoG6y ago

>>" I wonder what the eventual value proposition of working on this sort of stuff is."

It seems to me that, if you only take it as a marketing operation, it has been already very valuable.

hobofan6y ago

I'd really love to know how big the marketing impact on IBM Watson from this was. Somehow they managed to put the idea that "Watson is the best available AI" so well in the heads of the general population that 50%+ of my non-tech friends somehow thought that Watson beat the Go pros when I was talking with them about the topic when it was in the news.

raverbashing6y ago· 2 in thread

Wondering when researchers will switch from "race to the moon" mode to looking at better optimization techniques instead of just throwing money at the problem.

I know some companies are doing that, but I think looking at AlphaGo or AGZ and making it go faster should be an interesting problem in itself.

sanxiyn6y ago

KataGo optimized AlphaZero and achieved 50x compute reduction.

https://arxiv.org/abs/1902.10565

slx266y ago

Can you share the names of those companies, or some of their projects? At least that way people who think like you can follow those efforts and try to give them more visibility.

sorenbouma6y ago· 1 in thread

I might be wrong, but I think this cost calculation is way off:

Their running cost estimate of a single TPU in a machine with 4 "TPUs" is based off the price of a cloud TPU v2-8, but a v2-8 is actually 4 ASICs on 1 board.

Also, because of the date of publication being around the time v2s were announced, and the fact that the TPU is only used for inference and GPU is used for training, I think self play was likely done on TPU v1s, which use 5x less power per ASIC and so are likely much cheaper.

I also think the way they calculated the number of TPUs required is wrong, it looks like they assume 1 machine with 4 TPUs makes 1 move in 0.4 seconds, but since making 1 move only requires a forwards pass through a moderately sized CNN with 19x19(tiny) input, 1 TPU should be able to make thousands of moves in parallel per second.

brilee6y ago

Making one move requires 1600 MCTS playouts to explore the game tree, so it's a 1600-1 correspondence of "forward pass" and "move played".
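A quick sketch of what the two numbers in this exchange imply together. It assumes the 0.4 seconds per move figure holds and that the 1600 playouts for one move run one after another on a single machine; both assumptions are questioned elsewhere in the thread, so treat the output as a consistency check, not a measurement.

```python
seconds_per_move = 0.4       # figure disputed elsewhere in the thread
playouts_per_move = 1600     # MCTS playouts per move, per the reply above

# Time budget for one network forward pass if playouts are sequential.
seconds_per_playout = seconds_per_move / playouts_per_move
playouts_per_second = playouts_per_move / seconds_per_move

print(f"{seconds_per_playout * 1000:.2f} ms per playout")
print(f"{playouts_per_second:.0f} playouts per second")
```

Batching playouts across many games would change the picture, which is essentially the parent comment's objection.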

jonplackett6y ago· 1 in thread

It’s a shame that this ‘Next big thing’ is the complete opposite of the internet. Instead of opening up the world for anyone to create things, letting small companies compete with large, it is only going to concentrate power with the richest companies and leave small companies unable to get involved.

dcolkitt6y ago

“Well sure, AlphaZero looks impressive but I predict that within one hundred years, deep learning systems will be twice as powerful, ten thousand times larger, and so expensive that only the five richest kings of Europe will own them.”

Lucasoato6y ago· 1 in thread

> Each move during self-play uses about 0.4 seconds of computer thinking time.

> Over 72 hours, 4.9 million matches were played.

One of these claims must be incorrect or misinterpreted. I highly doubt they used as many TPUs as the article claims. That would not only be impractical, it would also raise a lot of other issues like networking, disk speed, etc.

My statement is not against this article, if anyone can confirm they used so many TPUs in parallel feel free to post it

MauranKilom6y ago

72 hours are 259200 seconds.

Playing 4.9 million matches of ~100 plies each at 0.4 seconds per ply is 196000000 seconds.

That's < 1000 TPUs. Sounds big but not too-large-for-google big. But other comments here say that the 0.4 second number is also wrong (and in fact significantly lower).
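The estimate in this reply, written out as a sketch. All inputs are the figures quoted in the two comments above; the ~100 plies per match is the reply's own rough guess, and the result counts machines under the assumption that each one plays a single game at a time.

```python
wall_clock_seconds = 72 * 3600       # 72 hours
matches = 4.9e6
plies_per_match = 100                # rough figure from the reply above
seconds_per_ply = 0.4                # disputed elsewhere in the thread

total_compute_seconds = matches * plies_per_match * seconds_per_ply
machines_needed = total_compute_seconds / wall_clock_seconds

print(f"wall clock:     {wall_clock_seconds:,} s")
print(f"total compute:  {total_compute_seconds:,.0f} s")
print(f"machines in parallel: about {machines_needed:.0f}")
```

Longer games would raise the count proportionally, since the total compute scales with plies per match.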

gridlockd6y ago· 1 in thread

It is estimated to be 36 million for someone else to train AlphaGo Zero, assuming they use Google TPU instances and pay the sticker price.

Google isn't operating with that cost, unless we assume that they are prioritizing AlphaGo to the point where they lose such customers 100% of the time.

It's way more likely that AlphaGo is trained on spare time, the cost for the hardware is sunk anyway, so only the cost for upkeep is real.

pixelpoet6y ago

> It's way more likely that AlphaGo is trained on spare time, the cost for the hardware is sunk anyway, so only the cost for upkeep is real.

Not quite, power is quite expensive and basically all modern computers use far less power at idle than going full bore saturated with multiply-add instructions and perfect memory streaming.

Having said that, I agree that there is a substantial cost efficiency gain if they can schedule it during periods of inactivity.

skywhopper6y ago

I'll quibble with a little bit of this.

"AlphaGo Zero showed the world that it is possible to build systems to teach themselves to do complicated tasks."

It didn't do any such thing. The game of go has a huge number of potential moves and outcomes, but the rules themselves are trivial, the board position can be measured in a handful of bytes and gameplay always and only progresses in one direction. And judging a good vs bad outcome is just a matter of comparing two numbers.

Go is challenging and interesting for humans, but it's not remotely as "complicated" as driving a car or translating a language.

NVHacker6y ago

The (synchronous) 0.4s per move number is misleading (and wrong), that's not what the paper is saying. The "footnote 1" of the article is wrong.

vadarvariu6y ago

Now, consider that this is the cost of the final model reported in the paper. This doesn't account for all the iterations of trying out e.g. different model architectures, hyperparameter sweeps, etc. The true cost of the experimentation is likely at least an order of magnitude higher.

ggm6y ago

Does Sarbanes Oxley apply to zero rating ML costs? Alpha go might have unfair kyu ranking, if Google don't have to "pay" to acquire rank. (95% joking)

FartyMcFarter6y ago

> The power consumption of the experiment is equivalent to 12,760 human brains running continuously.

Given the experiment lasts for just days, this actually sounds pretty impressive I think.

Many humans studied the game for a big portion of their lives in order to get Go knowledge where it is.

amelius6y ago

I'd like to see an AI play Monopoly (the board game) against CEOs of large companies.

antris6y ago

I wonder if there's some kind of software that takes advantage of an AI to teach non-beginner players Go. E.g. you could play against the bot and then the AI would translate your mistakes into what you can improve upon.

phonebucket6y ago

This is a lot of money.

However, if you want to reliably make an AI the best in the world at a range of complicated tasks, can you reasonably expect this to be cheap?

ksec6y ago

The interesting part to me, rather than cost, is Energy usage.

>The power consumption of the experiment is equivalent to 12,760 human brains running continuously.

But the problem is that this "brains" figure for AlphaZero doesn't seem to take into account the GPU, CPU, and memory involved. It only counts the TPU numbers.

Then there is another problem.

> a TPU consumes about 40 watts,[1]

The TPU referred to was a first Gen TPU built on 28nm running at 40W, more like a proof of concept. Currently Google is with Cloud TPU v3 [2], The latest-generation Cloud TPU v3 Pods are liquid-cooled for maximum performance. And each TPU v3 is actually a four chip module. [3]. If a single chip is 100W that is 400W per TPU.

Edit: Turns out Wikipedia lists TPU v3 as 250W [4]. Not sure if that is 250W per chip or 250W for 4 chips.

That is on the assumption they are very high powered and hence would require liquid cooling. Although that might not always be the case.

So after adding CPU, GPU, memory, and updated TPU figures, that original estimate of 12,760 human brains may be off by a factor of 10, if not more.

Still pretty impressive. Considering we now only get about a 1.8x improvement with each process node generation, we would get about 19x by 2030 (assuming the same algorithm). Which means AI is good, but the human brain on its own is still very much magical in its efficiency :)

Correct me if I am wrong on the numbers.

My other question: that was how much energy it used to learn Go, but what about the energy it used during the game?

How would AlphaGo Zero perform if it was limited to 20W?

[1] https://cloud.google.com/blog/products/gcp/an-in-depth-look-...

[2] https://cloud.google.com/blog/products/ai-machine-learning/g...

[3] https://techcrunch.com/2019/05/07/googles-newest-cloud-tpu-p...

[4] https://en.wikipedia.org/wiki/Tensor_processing_unit
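
A quick sanity check of the power arithmetic in the comment above, as a minimal Python sketch. It assumes roughly 20 W per human brain (the figure implied by the 20W question) and back-computes an implied TPU count from the quoted 12,760-brain figure at 40 W per TPU. That count, the function names, and the 250 W / 400 W per-TPU scenarios are illustrative assumptions, not numbers taken from the article. The five node generations are likewise inferred from the "about 19x" figure rather than stated anywhere.

```python
# Back-of-the-envelope check of the power figures discussed above.
# All constants are taken from the thread or are labeled assumptions.

BRAIN_WATTS = 20        # assumption: ~20 W per human brain
BRAINS_QUOTED = 12_760  # brain-equivalents quoted from the article
TPU_V1_WATTS = 40       # first-generation TPU power quoted from the article


def brain_equivalents(total_watts, brain_watts=BRAIN_WATTS):
    """Convert a total power draw in watts into 'human brain' units."""
    return total_watts / brain_watts


def implied_tpu_count():
    """TPU count implied by the quoted figures (an inference, not a sourced number)."""
    return BRAINS_QUOTED * BRAIN_WATTS / TPU_V1_WATTS


def node_scaling(generations, gain_per_generation=1.8):
    """Compound efficiency gain after a number of process-node generations."""
    return gain_per_generation ** generations


if __name__ == "__main__":
    tpus = implied_tpu_count()
    # Hypothetical per-TPU power draws: 40 W (v1), 250 W and 400 W (the two v3 guesses).
    for watts in (40, 250, 400):
        print(f"{watts:>3} W per TPU -> {brain_equivalents(tpus * watts):,.0f} brains")
    # Five generations at 1.8x each gives roughly the 19x mentioned above.
    print(f"5 generations at 1.8x -> {node_scaling(5):.1f}x")
```

Under these assumptions, going from 40 W to 400 W per TPU scales the brain-equivalent figure by exactly 10x before any CPU, GPU or memory power is counted, which is in line with the "factor of 10, if not more" guess.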

Kronen6y ago

I would be more interested in how much LCZero cost.

seb3146y ago

the article doesn't seem to consider cost of hyperparameter optimization prior to the final training...

magwa1016y ago

Ahem, "one time cost"

angel_j6y ago

A: $400M—to acquire DeepMind

lihaciudaniel6y ago

You need $35M to beat the best Stockfish engine, which can run on a small computer. Who won?

justplay6y ago

Sorry if this is off topic, but I want to learn AlphaZero from the beginning. I have a little understanding of machine learning and deep learning, including vision recognition. Unfortunately I'm not able to understand how Monte Carlo tree search is used for decision making. Where can I start, and what should I learn so that I can understand AlphaGo (or OpenAI Five, the Dota 2 bot)?

thanks


