DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker (arxiv.org)

102 points | maurycy 9y ago | 47 comments

MikeTV 9y ago

> DeepStack becomes the first computer program to beat professional poker players in heads-up no-limit Texas hold'em

Whether any others have been made before now is anyone's guess. Botting is a known problem in online poker. If there's a golden goose out there, I'm sure it's being kept under wraps.

zitterbewegung 9y ago

You can have multiple bots collude, or use other tricks, so the botting problem in online Texas Hold'em is not equivalent to the achievement presented in the paper.

femto113 9y ago

I believe the primary botting "problem" is not rule breaking activity like collusion but the farming of lower-skill players at lower limits than a professional would be willing to play at. A bot will happily rake in a 1x big-blind/hour advantage that a comparably skilled human would consider a complete waste of time. It's my understanding that the real state of the art here is not in the play algorithms (existing bots are more than good enough to beat weaker players) but in avoiding detection by both human and automated monitors.

bcassedy 9y ago

Correct. When I last played for income in 2012, the site I played on had 1-2 bots at virtually every table from the $10 buy-in cash games all the way up to the $200 buy-in games. Around the time I stopped playing it came out that there was a botting ring that had been winning at $1000 buy-in games for some time.

Most of their income does come from the weaker players at the table, but many of these bots were good enough to break even or do slightly better against the pros at the table too.

pdog 9y ago

The big games are typically heads up to avoid the collusion problem.

gallerdude 9y ago

I always think the same thing about neural nets on the stock market.

osti 9y ago

To be fair, none of the so-called pros are considered big names in today's no limit heads-up games. They should probably challenge ppl like WCGRider, Jungleman etc. next.

On another point, CMU just can't seem to catch a break, their thunder continuously being stolen by UofAlberta in poker research, first in limit, now no limit. UofA clearly tried to publish this before the CMU poker challenge that's supposed to begin soon.

To read more about the CMU challenge http://www.cmu.edu/news/stories/archives/2017/january/poker-...

grizzles 9y ago

Doug Polk is WCGRider.

dsp1234 9y ago

Doug Polk is not one of the professionals who was part of this study. The list is in Table 1 of the paper.

osti 9y ago

I'm aware lol. I'm a fan of his videos.

natecarroll 9y ago

The players they recruited were incentivized by an $8,000 prize pool up for grabs among the 34 of them, so the average $EV is about $235. They have to play 3000 hands to get a shot at that money, which is probably around 10 hours of multitabling. So that's ~$24/hr in expectation.
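A quick check of that arithmetic, as a sketch only. The prize pool, player count, and 10-hour estimate are the figures quoted in the comment; treating every player as equally likely to win is a simplifying assumption.

```python
# Figures quoted in the comment above; the 10-hour figure is the commenter's estimate.
PRIZE_POOL = 8_000          # dollars
PLAYERS = 34
HOURS_FOR_3000_HANDS = 10

# Simplifying assumption: every player has the same chance at the prize money.
ev_per_player = PRIZE_POOL / PLAYERS
ev_per_hour = ev_per_player / HOURS_FOR_3000_HANDS

print(f"EV per player: ${ev_per_player:.2f}")
print(f"EV per hour:   ${ev_per_hour:.2f}")
```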

And then of course you don't get anything unless you're one of the top three winners against the bot, so there's likely nothing to be gained from grinding out a marginal victory. You should just go ahead and play kinda stupid/aggro and hope you win some of the big flips and whatnot. There's literally nothing at stake for you except time value, so you might as well flame out early and then quit or run up a big stake to give yourself a shot at top 3.

Basically, the study design ensures the bot faces off against weak players playing in a way that would be sub-optimal in any other situation. Not surprised the bot won by a decent margin, nor that they are trying to spin this real hard in advance of the CMU poker bot matchup next week, which will be much more rigorous.

lawn 9y ago

I think I can wrap my head around neural nets being superior at games with perfect information like chess or go. But how would you teach bluffing and randomness to a neural net?

bcassedy 9y ago

Former poker pro here.

Top professionals are building their strategy around game theory. They'll attempt to play in such a way that they aren't exploitable and look to deviate when they've spotted a weakness in their opponent's play.

Basically, the game theory optimal strategy is unexploitable. In every situation, the best you can do is break even by also playing the optimal strategy. If you deviate from optimal strategy, the optimal strategy will beat you, but it's possible that a strategy tailored to taking advantage of your specific deviations would beat you more quickly.

Unexploitable play typically means that you bet a size with a range of holdings that would make your opponent indifferent to all of his options (And the converse is true when facing a bet). For humans, this means that they gravitate to a few standard bet sizes, while a computer could, in theory, balance their range with much more granularity.
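To make "indifferent" concrete, here is a minimal sketch of the standard toy model, which is my simplification and not anything from the paper: on the last street the bettor holds either an unbeatable hand or a pure bluff, bets `bet` into `pot`, and the opponent can only call or fold. The function names are made up for illustration.

```python
from fractions import Fraction

def bluff_fraction(pot, bet):
    """Share of the betting range that should be bluffs so a call has zero EV."""
    return Fraction(bet, pot + 2 * bet)

def call_frequency(pot, bet):
    """How often the caller must call so a pure bluff has zero EV."""
    return Fraction(pot, pot + bet)

def call_ev(pot, bet, bluffs):
    # Caller wins pot + bet against a bluff and loses bet against a value hand.
    return bluffs * (pot + bet) - (1 - bluffs) * bet

def bluff_ev(pot, bet, calls):
    # Bluffer wins the pot when the caller folds and loses the bet when called.
    return (1 - calls) * pot - calls * bet

for bet in (50, 100, 200):
    x = bluff_fraction(100, bet)
    c = call_frequency(100, bet)
    print(bet, x, c, call_ev(100, bet, x), bluff_ev(100, bet, c))
```

Both EV columns come out to zero for every bet size, which is the indifference condition. Bigger bets let the bettor bluff with a larger share of the range, which is one reason bet sizing matters so much in no-limit.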

Last I read, to train the neural net it plays billions of hands against versions of itself designed to exploit various weaknesses. It starts out by performing random actions: say it has a 33% chance each to call your bet, raise, or fold. It then starts to see that it does better when it raises your bet with the nuts and also as a bluff. Eventually, it arrives at an equilibrium strategy.

Since computers are much better at randomness than humans are, they're able to more effectively play these types of strategies and with more complexity of bet sizing. There is what's called a mixed strategy, a strategy where given a situation with the same hole cards you will call, raise, or fold to a bet with some non-zero probability. Doing that as a human is very difficult, but it's something computers manage to do quite easily.
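For a sense of how trivial a mixed strategy is for a program, here is a minimal sketch. The probabilities are invented for illustration; a real strategy would have a separate distribution for every hand and situation.

```python
import random

def sample_action(strategy, rng):
    """Pick one action at random according to the strategy's probabilities."""
    actions = list(strategy)
    weights = [strategy[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

# Hypothetical mixed strategy for one hand facing one bet.
strategy = {"fold": 0.2, "call": 0.5, "raise": 0.3}

rng = random.Random(0)  # seeded only so the demo is repeatable
counts = {action: 0 for action in strategy}
for _ in range(10_000):
    counts[sample_action(strategy, rng)] += 1

print(counts)
```

Over many hands the observed frequencies track the target probabilities, with no pattern for an opponent to pick up on. A human trying to raise "30% of the time" tends to drift or fall into a rhythm.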

IgorPartola 9y ago

Since you are here, I have a few questions about your former job.

First, how does one become a pro poker player?

Second, does it work like a sport where you get paid from sponsorships, or do you just directly take home what you win? Or a combination of both?

Third, is this something that you can do part-time, or does it require full time attention?

Fourth, why did you quit?

bcassedy 9y ago

Sure, happy to answer questions.

Note that professional in this context just means made a living through poker. There are a ton of people like myself that earned a good living playing online or in casinos. A small minority received long-term sponsorships.

1. I started playing in high school and college. Like everything else I do I dove in to get better and eventually I was good enough that I was making good money playing online. At that point I started to pursue things full time.

2. For me it was the latter. The very top players in skill and visibility are typically the ones getting sponsorships. Win a big tournament, win at the highest level of online games, or make a tv appearance at a final table and you'll find opportunities for sponsorships. For most players, these sponsorships really just add stability to their income with the bulk still coming from winnings. Though there are the poker personalities who make the bulk of their income from sponsorships and TV deals.

3. You can definitely do it part time at some levels, but to keep up it does require quite a bit of study time to win at meaningful stakes.

4. When the US legal landscape changed in 2011, getting your money out of the sites that were still willing to serve Americans got more challenging. Moving out of the country wasn't an option for me. With the game getting harder all the time due to the proliferation of good strategy material and botting, it seemed like it was time to move on.

jat850 9y ago

Not who you're replying to, but I can give a few answers, having also played professionally for some time.

1. There are lots of ways this can happen. "Professional poker player" is generally taken to mean "derives primary source of income from playing poker" or sometimes "spends the majority of their time playing poker". There's no exact qualification.

2. Not generally sponsorship driven, although there are some modes of sponsorship that do factor in in some ways - the primary means of income is basically by winning money from other players. A secondary component is often rakeback, in online play. In cash games, you can join and leave at any time and your winnings or losses are simply the amount you are up or down in that particular session. In tournament play there is a payout structure based on your placement in the tournament (often something like 25% of the total prize pool to 1st, 15% to second, etc.).

3. It can be done part time. There is nothing to say that your bankroll can't be seeded or supplemented by external means, and you can play cash games for short or long periods. Tournaments are typically long (at least if you remain in them a long time, though you can be eliminated at any time, basically). I used to play approximately 40 hours a week (since I was treating it like a job), but now I play 10-15 and it represents about 30-40% of my yearly income.

4. I personally quit because I found the stress associated with having it as the sole means of providing myself too overwhelming. One can have a stretch of negative earnings that can last hours, days, weeks, even months - and it can be psychologically damaging in some ways. I also found that I preferred to keep it as a hobby than a means of income, since I enjoyed it more that way.

tgb 9y ago

Is it known that there is an optimal strategy?

osti 9y ago

In heads up, i.e. 1 on 1, there is a Nash equilibrium, but in a multiplayer game there isn't because the other players can collude against you.

bcassedy 9y ago

We know that there exists an optimal strategy, but we still aren't close to achieving it. It's a zero sum game where both parties have the same lack of information and the betting order rotates. I think it has to have an optimal strategy.

jspiral 9y ago

At higher levels poker is about game theory, for example, the player bluffs at an optimal frequency in a certain situation so as to be indifferent to whether the opponent calls or folds.

Exploitative strategies, based on understanding opponent weaknesses and tendencies will win $ at a higher rate, but are themselves exploitable.

For example, almost never bluffing and playing only strong cards crushes beginners who play too many hands and call too much.

This strategy is easily beaten though by stealing most pots and then not paying off the infrequent big bets (strong hands don't come often enough).

A "perfect" game theory strategy is like armor, slowly bleeding the opponent every time they deviate from perfection themselves.

not sure if that helps but maybe some seeds to google at least

jdmichal 9y ago

> This strategy is easily beaten though by stealing most pots and then not paying off the infrequent big bets (strong hands don't come often enough).

To dig deeper:

Or you can try actively punishing the big hands by folding out early. Of course, that strategy opens you up to being bled by your opponent bluffing strong hands. Attempting to actively punish the big hands here is a deviation. This is what jspiral means by "deviations from perfection".

falcolas 9y ago

I imagine it's mostly just playing the percentages. Bet when it has a high percentage of winning, fold when it doesn't. It doesn't need to read its opponents if it can play the percentages perfectly.

ska 9y ago

No, this is a terrible idea. If you play like this consistently, you are basically telling your opponents when to play against you and when to get out of the way. They can even pick bet sizes (assuming no limit) to refine your possible hands very accurately.

jdmichal 9y ago

I'm usually surprised in tournaments by the number of people willing to play with me after I've sat quietly folding for the first three rounds... By all means, everyone should fold and give me the blinds, but there always seems to be a player or two who bite!

splike 9y ago

This is a very naive interpretation of the game of poker. Professional human players are already very good at calculating the percentages, and any joe playing from his computer has access to a calculator.

The reason why simply playing the numbers fails is that if I know an opponent is playing this way, I'll just fold every time he decides to play.

reverend_gonzo 9y ago

This is true for Limit Hold'em, but very much not true for No Limit. Limit Hold'em is a solved game, because as long as you're playing the odds, you can play perfectly. No Limit changes things because the bets can vary wildly. If you play a tight game (just play the odds), an opponent will get out whenever you're in, and will bluff just to see if you call or fold.

Bluffing is a major component in No Limit, and there are very different profitable playing strategies.

jdmichal 9y ago

I fell to this when I randomly decided to play some limit hold'em one night. Kept losing to a guy that chased every chance at odds he could, because I couldn't make bets big enough to scare him out. Lesson learned!

osti 9y ago

That's not true at all for limit, the perfect bot's bluff frequency is probably higher than most humans. Play against it yourself here http://poker.srv.ualberta.ca

tomarr 9y ago

I don't think this is true? At least for small blinds. If you know your opponent is playing the percentages you could heighten your threshold.

ChuckMcM 9y ago

I love it, research that pays for itself :-) I think of poker and other card games as imperfect but predictable information. So while you don't know what cards the other players have you can certainly estimate the likelihood of what they have and prune your choices that way. Think single deck card counting in Blackjack.
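A minimal sketch of that kind of estimate, counting which two-card holdings are still possible for one opponent given the cards you can see. The card notation and the function name are assumptions for illustration, and this treats every unseen holding as equally likely, which ignores everything the betting tells you.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "shdc"
DECK = [rank + suit for rank in RANKS for suit in SUITS]

def pocket_pair_combos(rank, known_cards):
    """Count opponent holdings that are a pair of `rank`, out of all unseen holdings."""
    unseen = [card for card in DECK if card not in known_cards]
    holdings = list(combinations(unseen, 2))
    hits = sum(1 for a, b in holdings if a[0] == rank and b[0] == rank)
    return hits, len(holdings)

# Holding no ace, the opponent has six ways to hold pocket aces.
print(pocket_pair_combos("A", ["Ks", "Qd"]))
# Holding one ace yourself cuts that to three.
print(pocket_pair_combos("A", ["As", "Kd"]))
```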

esseti 9y ago

The fact that they used hearts and spades instead of numbers for affiliation is just lovely.

philosopheer 9y ago

most people (including here on HN) are complete n00bs when it comes to understanding how poker is played and how computers can play it, so just to straighten y'all out at the git-go here:

computers are better at bluffing and randomness than humans are. Bluffing is an important optimizing strategy in playing poker well, and it entails tracking the expected value of a pot (which includes cost expectations, don't forget) and it entails randomness, necessary to obfuscate patterns of betting that could give away evidence of your bluffing strategy. Like chess and go, we may not be "there" yet with computers, but n00bs need to understand the theory.

What computers can't do is read "tells", so if you are a master poker player via tells (whether it's unconscious or conscious thinking on your part) then you will beat other humans better than a computer will; but, by the same token, the computer will not give you tells to read nor be fooled by your fake tells. I think the mistake in thinking newbies (even highly experienced ones) make is mixing together "the psychology" of the game with the mathematics of the game.

So to give an oversimplified concrete example of a poker bluffing strategy (inspired by Nesmith Ankeny's book), if odds of you drawing one of the cards you need to win a showdown are 1 out of 4 but the expected payoff is 20x then you not only need to stay in purely on expected value, but it is also an optimal time to bluff if you don't get your card. It is informationally better to have a bluffing strategy that masquerades as an "I have good cards" strategy and gives random information after the showdown rather than "bluffing" being something you do sheerly when you have shit cards. And to enforce a random strategy on yourself, he recommends using a system of the cards in your hand as the random number generator to tell you whether to bluff or not: as you can see, his strategy designed for human players is more perfectly implemented by a computer.
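The "cards as random number generator" idea can be sketched as below. The specific rule (bluff when a designated card's suit falls in a chosen set) is a made-up illustration of the principle, not the rule from the book.

```python
SUITS = "shdc"
RANKS = "23456789TJQKA"

def should_bluff(designated_card, quarters):
    """Bluff when the designated card's suit is among the first `quarters` suits.

    Suits are evenly distributed, so this gives a bluffing frequency of
    quarters/4 without the player having to "feel" random.
    """
    return designated_card[-1] in SUITS[:quarters]

deck = [rank + suit for rank in RANKS for suit in SUITS]
for quarters in (1, 2):
    triggers = sum(should_bluff(card, quarters) for card in deck)
    print(quarters, triggers, len(deck))
```

A program can skip the trick entirely and draw from a proper random number generator at whatever frequency the strategy calls for, which is the point the comment is making.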

feral 9y ago

No - If the only thing computers couldn't beat humans at was reading tells, they'd win online poker.

But they don't yet do that: this paper is about beating humans at heads up, which is a much more limited domain than a full table.

If you want to learn about why to bluff I'd recommend reading about using game theory to solve Kuhn poker.
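For anyone who wants to try it, here is a minimal sketch of counterfactual regret minimization (CFR) on Kuhn poker: three cards, one card each, an ante of 1, and a single bet of 1. This is plain vanilla CFR written for readability, not DeepStack's method, and the class and function names are my own.

```python
from itertools import permutations

ACTIONS = "pb"  # p = pass (check or fold), b = bet (bet or call)

class Node:
    """One information set: the acting player's card plus the betting history."""
    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]
        self.strategy = [0.5, 0.5]  # start out acting uniformly at random

    def update_strategy(self):
        # Regret matching: play each action in proportion to its positive regret.
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        self.strategy = [p / total for p in positive] if total > 0 else [0.5, 0.5]

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]

nodes = {}

def terminal_utility(cards, history):
    """Payoff for the player who would act next, or None if the hand is not over."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    higher = cards[player] > cards[1 - player]
    if history[-2:] == "pp":      # both checked: showdown for the antes
        return 1 if higher else -1
    if history[-2:] == "bb":      # bet and call: showdown for ante plus bet
        return 2 if higher else -2
    if history[-1] == "p":        # opponent folded to a bet
        return 1
    return None

def cfr(cards, history, reach0, reach1):
    """Return the expected utility for the player to act, updating regrets on the way."""
    payoff = terminal_utility(cards, history)
    if payoff is not None:
        return payoff
    player = len(history) % 2
    node = nodes.setdefault(str(cards[player]) + history, Node())
    strat = node.strategy
    own_reach, opp_reach = (reach0, reach1) if player == 0 else (reach1, reach0)
    utils = [0.0, 0.0]
    for i, action in enumerate(ACTIONS):
        if player == 0:
            utils[i] = -cfr(cards, history + action, reach0 * strat[i], reach1)
        else:
            utils[i] = -cfr(cards, history + action, reach0, reach1 * strat[i])
    node_util = sum(strat[i] * utils[i] for i in range(2))
    for i in range(2):
        node.regret_sum[i] += opp_reach * (utils[i] - node_util)
        node.strategy_sum[i] += own_reach * strat[i]
    return node_util

def train(iterations):
    deals = list(permutations([1, 2, 3], 2))
    for _ in range(iterations):
        for cards in deals:
            cfr(cards, "", 1.0, 1.0)
        for node in nodes.values():   # strategies change only between iterations
            node.update_strategy()

def average_value(cards, history=""):
    """Utility for the player to act when both sides play their average strategy."""
    payoff = terminal_utility(cards, history)
    if payoff is not None:
        return payoff
    player = len(history) % 2
    avg = nodes[str(cards[player]) + history].average_strategy()
    return sum(avg[i] * -average_value(cards, history + a) for i, a in enumerate(ACTIONS))

train(10_000)
deals = list(permutations([1, 2, 3], 2))
game_value = sum(average_value(cards) for cards in deals) / len(deals)
print(f"first player's value under the average strategies: {game_value:.4f}")
for key in sorted(nodes):
    print(key, [round(p, 2) for p in nodes[key].average_strategy()])
```

The known value of Kuhn poker for the first player is -1/18, and the printed value should land close to it. The part worth looking at is the strategy table: nobody tells the program to bluff, yet the first player ends up betting the worst card some of the time, because that is what the regrets push it toward.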

philosopheer 9y ago

online poker has the tremendous flaw that collusion between players is the optimal strategy, and there is just noooo way to stop it. Collaborating poker-bots who outsource their peppy poker chatter to Bangalore (your feedback is important to them!) will soon be running all the tables if they aren't already. No, they'll never be the champions, because that suboptimal strategy would lead to discovery, but as a giant grist milling farm grinding out profit, it seems irresistible.

I did a quick google review of Kuhn poker and I don't see how any of that would not benefit from the understanding I was attempting to convey in my initial post.

6nf 9y ago

Collusion is avoided entirely by playing heads up only.

jpolitz 9y ago

Does this imply that a pro may well do better in a multiplayer game with mixed humans and machines (by using "tells" to build up a bigger stack from the humans' inaccuracies), than in heads up against a machine?

jdmichal 9y ago

Players can and will target others as "easier" and selectively get in fights with them. If nothing else, you'll certainly avoid getting into fights with a player that continuously beats you.

geofft 9y ago

As an actual complete n00b, in online poker, how are tells communicated when you can't see someone's facial expressions or body language? My guess is the dollar value of bets, and the timing side channel?

jdmichal 9y ago

Bet amount certainly plays a big part. This is what philosopheer means when he talks about "the mathematics of the game". Bet amount is part of that mathematical part; it's basically a signal of your confidence against the current size of the pot. This is why, when you do reading, many strategies will talk about bet amounts as multipliers of the current pot.

Timing can be informative, but it's actually weaker online than in person. In person, you know whether the person is physically present, and can generally gauge when they're paying attention also. Online, taking a long time could simply mean that they're not paying attention. (I've watched streams on Twitch of pro players working multiple tables online.)

brador 9y ago

Heads-up is solvable by just crunching the known probabilities, so I'm not sure what the achievement is here. Maybe the complexity of work involved to build the program is worthy of merit? Not sure.
