Monday, April 30, 2018

Where Game Theory Optimal (GTO) Fails

I realize this will be a controversial post among poker players, but it's a discussion I haven't seen on this topic, and I believe it would benefit everyone.

Disclaimer: I'm not a game theory expert, nor do I have expertise in most of its related subject matter.

First, what is game theory?

Taken from Wikipedia:

"The study of mathematical models of conflict and cooperation between intelligent and rational decision makers."

And what is game theory optimal, or the commonly used poker expression GTO? Several definitions are listed below to illustrate some of the confusion surrounding this phrase.

This isn't so clear, and the origins of the phrase itself aren't clear either. However, it seems to have originated in poker. Google Trends shows interest in the phrase peaked in December of 2004, during the poker boom that followed Chris Moneymaker's 2003 World Series of Poker Main Event win.

Here's one of the first definitions returned by a "What is GTO poker?" Google search, taken from a PokerNews article:

"GTO stands for 'game theory optimal'.  In poker, this term gets thrown around to signal a few different concepts.  It refers to thoughts about opponent modeling, and thinking about poker situations in terms of ranges and probabilities, as opposed to being strictly results oriented."

Still not clear.

Here's another definition taken from a different article:

"It refers to a decision in some particular situation for which an opponent cannot make a profitable counter."

A little clearer...

The best definition I could find, and also the earliest I came across, comes from a 2003 University of Alberta research paper titled "Approximating Game-Theoretic Optimal Strategies for Full-scale Poker."

http://poker.cs.ualberta.ca/publications/IJCAI03.pdf

It reads, "Of particular interest is the existence of optimal solutions, or Nash equilibria.  An optimal solution provides a randomized strategy, basically a recipe of how to play in each possible situation.  Using this strategy ensures an agent will obtain at the least the game-theoretic value of the game regardless of the opponent's strategy."
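To make the idea of a guaranteed "game-theoretic value" concrete, here is a minimal Python sketch for a made-up two-action zero-sum game (the payoff numbers are hypothetical, not poker results). It assumes the game has no pure-strategy equilibrium, so the standard closed-form mixed solution for a 2x2 game applies. The equilibrium mix then earns exactly the game value no matter how the opponent mixes:

```python
from fractions import Fraction

def solve_2x2_zero_sum(a, b, c, d):
    """Mixed equilibrium for the row player of a 2x2 zero-sum game.

    Row player's payoffs are [[a, b], [c, d]]; columns are the opponent's actions.
    Assumes no saddle point (no pure-strategy equilibrium), so the mix is interior.
    """
    denom = a - b - c + d
    p = Fraction(d - c, denom)               # probability of playing row 1
    value = Fraction(a * d - b * c, denom)   # the game-theoretic value
    return p, value

def expected_payoff(p, q, a, b, c, d):
    """Row's EV when row plays row 1 w.p. p and the opponent plays column 1 w.p. q."""
    return (p * q * a + p * (1 - q) * b
            + (1 - p) * q * c + (1 - p) * (1 - q) * d)

# Hypothetical payoff matrix, chosen only for illustration.
a, b, c, d = 2, -1, -1, 1
p, value = solve_2x2_zero_sum(a, b, c, d)
print("play row 1 with probability", p, "game value", value)

# Whatever mix the opponent picks, the equilibrium mix earns exactly the value.
for q in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
    print("opponent mix", q, "our EV", expected_payoff(p, q, a, b, c, d))
```

Note that the guarantee cuts both ways: the equilibrium never earns less than the value, but in this game it also never earns more, whatever the opponent does.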

Much clearer, and it seems like a good approach to solving poker games, until the next sentence.

"Unfortunately finding exact optimal solutions is limited to relatively small problem sizes, and is not practical for most domains."

Although this was written in 2003 and significant progress has been made in overcoming the lack of computing power, it remains the primary obstacle to solving poker games today.

Still today, Nash equilibrium is the logic powering GTO, as well as two of the most popular GTO solver programs, PioSolver and GTORangeBuilder.

In the developers' words, PioSolver "calculates optimal strategies, exact value and plays for every situation." GTORB goes as far as to say, "It's the holy grail of poker."

But do they?  And is it?

A January 2017 University of Alberta research paper titled "Equilibrium Approximation Quality of Current No-Limit Poker Bots" may cast some doubt on the above claims:

http://webdocs.cs.ualberta.ca/~games/poker/publications/aaai17ws-lisy-lbr.pdf

Here's the last sentence of that paper's abstract. It refers to a method the University of Alberta researchers developed to evaluate the quality of current bots.

"Using this method, we show that existing poker-playing programs, based on solving abstract games, are remarkably poor Nash equilibrium approximations."

The paper looked at bots that competed in the 2016 Annual Computer Poker Competition (ACPC).  In their words, "These bots are developed by top research teams, use principled AI approaches, and the techniques they use are to large extent well documented."

One technique all of these bots use is called abstraction. Because there isn't enough computing power to handle every possible combination of flop, turn, and river cards, betting sequences, stack sizes, bet sizes, etc. (more on this later), similar situations are lumped, or bucketed, together, to put it in the simplest terms.
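Here's a rough sketch of what bucketing means, under a loudly stated assumption: `toy_strength` is a hypothetical score made up for this example (real bots bucket on measures such as hand equity, not this). The sketch sorts the 169 distinct preflop hand classes by that score and lumps them into five equal-sized buckets, so every hand in a bucket is played identically:

```python
RANKS = "23456789TJQKA"

def starting_hand_classes():
    """The 169 strategically distinct preflop hands, e.g. 'AA', 'AKs', 'T6o'."""
    classes = []
    for i, high in enumerate(RANKS):
        for low in RANKS[:i]:                  # every lower rank: suited and offsuit
            classes.append(high + low + "s")
            classes.append(high + low + "o")
        classes.append(high + high)            # the pocket pair
    return classes

def toy_strength(hand):
    """Hypothetical score: high card counts double, pairs and suits get bonuses."""
    hi, lo = RANKS.index(hand[0]), RANKS.index(hand[1])
    score = 2 * hi + lo
    if hi == lo:
        score += 20      # pair bonus
    elif hand.endswith("s"):
        score += 3       # suited bonus
    return score

def bucket_hands(classes, n_buckets):
    """Percentile bucketing: sort by score, then split into n roughly equal groups."""
    ranked = sorted(classes, key=toy_strength)
    size = len(ranked) / n_buckets
    return {hand: min(int(k / size), n_buckets - 1) for k, hand in enumerate(ranked)}

buckets = bucket_hands(starting_hand_classes(), 5)
same_as_t6o = [h for h, b in buckets.items() if b == buckets["T6o"]]
print(len(same_as_t6o), "hand classes are treated exactly like T6o")
```

Every hand in T6o's bucket gets the same strategy, even though they aren't really the same hand. Applied to flops, turns, rivers, and bet sizes, that lumping is where the accuracy loss comes from.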

However, this comes at the cost of accuracy, and that cost is essentially what the paper attempts to measure. The paper's conclusion is, well, a bit shocking:

"Using this method we show that existing poker bots, including the second and the third best performing bots in the ACPC in 2016, all have exploitability substantially larger than folding all hands.  The bots that use card abstraction are losing over 3 big blinds per hand on average against their worst case opponent.  Exploitability can be reduced by not using card abstraction, but that necessarily leads to using a very sparse betting abstraction, which can be heavily exploited as well.  Therefore, we assume that a substantial paradigm shift is necessary to create bots that would closely approximate equilibrium in full no-limit Texas hold'em."

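Exploitability is how much a best-responding (worst-case) opponent wins against a fixed strategy. Here's a minimal sketch that uses rock-paper-scissors as a stand-in for poker, since its game value is 0 and the best response is easy to compute exactly. The always-fold figure assumes heads-up no-limit with 0.5/1 blinds and alternating positions: against a worst-case opponent, folding every hand loses the small blind half the time and the big blind the other half, or 0.75 big blinds per hand. That is the bar the paper's bots fail to clear at over 3 big blinds per hand:

```python
from fractions import Fraction

# Row player's payoff in rock-paper-scissors; rows and columns are
# ordered rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def exploitability(strategy):
    """What a best-responding opponent wins per game against `strategy`.

    Rock-paper-scissors is symmetric with a game value of 0, so this is just
    the opponent's best pure-response payoff (the negative of ours).
    """
    return max(
        -sum(p * PAYOFF[row][col] for row, p in enumerate(strategy))
        for col in range(3)
    )

uniform = [Fraction(1, 3)] * 3                                 # the equilibrium
rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # hypothetical leak
always_rock = [1, 0, 0]                                        # never mixes at all

for name, s in [("uniform", uniform), ("rock-heavy", rock_heavy), ("always rock", always_rock)]:
    print(name, exploitability(s))

# Heads-up no-limit baseline: folding every hand loses the 0.5 BB small blind
# or the 1 BB big blind, alternating, against a worst-case opponent.
ALWAYS_FOLD_LOSS_BB = (Fraction(1, 2) + 1) / 2
print("always fold loses", float(ALWAYS_FOLD_LOSS_BB), "BB per hand")
```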
I should stop here and be clear that commercial GTO programs like PioSolver and GTORB were not included in this study. However, to the best of my understanding, these programs also use abstraction to varying degrees. And as mentioned above, abstraction reduces accuracy.

If I used these programs, the extent of this accuracy loss would concern me. It would be one of many concerns.

Let's look at what Nash equilibrium is, since it's the engine powering the intelligence behind these programs. From Wikipedia's Nash equilibrium page:

"It's a solution concept of a non-cooperative game involving two or more players in which each player is assumed to know equilibrium strategies of the other players, and no player has anything to gain by changing only their own strategy."

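The "no player has anything to gain by changing only their own strategy" condition can be checked mechanically. Here's a minimal sketch for pure strategies only, using the textbook prisoner's dilemma payoffs as a hypothetical example. Mixed equilibria, the kind poker solvers compute, need more machinery than this:

```python
from itertools import product

def is_pure_nash(payoff_a, payoff_b, i, j):
    """True if (row i, column j) is a Nash equilibrium of the bimatrix game."""
    # The row player can't gain by switching rows while the column player stays at j.
    row_ok = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(len(payoff_a)))
    # The column player can't gain by switching columns while the row player stays at i.
    col_ok = all(payoff_b[i][j] >= payoff_b[i][k] for k in range(len(payoff_b[0])))
    return row_ok and col_ok

# Prisoner's dilemma: action 0 = cooperate, 1 = defect.
A = [[3, 0], [5, 1]]  # row player's payoffs
B = [[3, 5], [0, 1]]  # column player's payoffs

equilibria = [(i, j) for i, j in product(range(2), range(2)) if is_pure_nash(A, B, i, j)]
print(equilibria)  # [(1, 1)]: both defect
```

Note that the check only asks whether a player gains by changing strategy while everyone else holds theirs fixed. That is exactly where the assumption about the other players' strategies enters.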
Now that we have some understanding, albeit limited, let's dive into my major issues with GTO itself. We don't have to look further than the definition of Nash equilibrium:

"It's a solution concept of a non-cooperative game involving two or more players in which each player is assumed to know equilibrium strategies of the other players."

The 2003 University of Alberta paper linked above also alludes to this assumption when referring to a GTO player:

"An implicit assumption is that the opponent is also playing optimally, and nothing can be gained by observing that opponent for patterns or weaknesses."

When we human beings play poker, we absolutely cannot assume that the other players know and/or are implementing equilibrium strategies. In fact, we can assume the opposite: that no one is playing GTO. So from the start, we're relying on a strategy that's predicated on a false assumption.

The Nash equilibrium page on Wikipedia continues with a section called "Occurrence":

According to that section, if the following conditions are met, then the Nash equilibrium strategies will be adopted.

Sufficient conditions to guarantee that the Nash equilibrium is played are:

     1.  The players all will do their utmost to maximize their expected payoff as described by the game.
     2.  The players are flawless in execution.
     3.  The players have sufficient intelligence to deduce the solution.
     4.  The players know the planned equilibrium of all other players.
     5.  The players believe that a deviation in their own strategy will not cause deviations by any other players.
     6.  There is common knowledge that all players meet these conditions, including this one. So not only must each player know the other players meet the conditions, but they must know that they all know that they meet them, and know that they know that they meet them, and so on.

Do any of these, much less all of them, ring true in the games you play? These are the conditions under which Nash equilibrium play is expected.

At this point, if you're a GTO proponent, you may be saying to yourself, "But we still have a strategy that can't be exploited by any opponent." I won't argue the validity of this statement except to say I'd be willing to bet it's false due to the abstraction issues alone.

That said, let's assume we do have a strategy that can't be exploited by any opponent. As poker players, our goal is to make as much money as possible. The goal is NOT "to not lose money."

We should set out to maximize the expected value of every decision within the context of all our decisions, not merely ensure we always have some positive expected value. If we only care about having positive expected value, as GTO does, then it's a virtual certainty we'll fail to maximize that expected value.

A recent Twitter poll by Olivier Busquet shows just how much confusion there is around GTO and the statements above.

https://mobile.twitter.com/olivierbusquet/status/966351748028862471

He asks, "If a perfect GTO bot played only live tournaments 25K entry or higher it would be:"

-Far and away the best
-Marginally the best
-Among the elite
-A winner but not elite

At the time of this writing, the poll had 8,071 votes, with the choices receiving 32%, 17%, 23%, and 29%, respectively. That's about as divided a result as we'll ever see from a four-option poll.

To be fair to the participants, there's inherent confusion in the question itself. To judge whether someone, or something in this case, would be the "best" or "elite," we need to define those terms. Does "best" mean the player who makes the most money? Or does it mean the most skilled player, assuming skill could be measured somehow? Does it mean something else?

Personally, I'd define "best" as having the highest positive monetary expectation.

I do not believe the perfect GTO bot would make the most money. It would make some money because it would never make a decision that resulted in negative expected value. But it wouldn't make the most money, because it wouldn't fully exploit its opponents' errors. And I'd argue that even at this level of play, sizable mistakes are made with meaningful frequency.
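Here's a toy model of that argument, offered as a sketch rather than a model of poker. It assumes rock-paper-scissors plus a hypothetical "blunder" action for the opponent that loses to everything, standing in for a sizable mistake. The equilibrium mix profits only when the opponent blunders, while a best response to the opponent's known tendencies earns twice as much. The opponent's mix is made up:

```python
from fractions import Fraction

ACTIONS = ["rock", "paper", "scissors"]
# Our payoff for each of our actions (rows) against the opponent's rock,
# paper, scissors, and a hypothetical "blunder" that loses 1 to anything (columns).
PAYOFF = [
    [0, -1, 1, 1],   # rock
    [1, 0, -1, 1],   # paper
    [-1, 1, 0, 1],   # scissors
]

def ev(our_mix, their_mix):
    """Our expected payoff when both sides play the given mixes."""
    return sum(p * q * PAYOFF[r][c]
               for r, p in enumerate(our_mix) for c, q in enumerate(their_mix))

def best_response(their_mix):
    """The pure action that maximizes our EV against a known opponent mix."""
    evs = [ev([1 if r == k else 0 for r in range(3)], their_mix) for k in range(3)]
    k = max(range(3), key=lambda r: evs[r])
    return ACTIONS[k], evs[k]

equilibrium = [Fraction(1, 3)] * 3                     # unexploitable: uniform mix
perfect_opponent = [Fraction(1, 3)] * 3 + [0]          # never blunders
flawed_opponent = [Fraction(2, 5), Fraction(1, 5), Fraction(1, 5), Fraction(1, 5)]

print("equilibrium EV vs perfect opponent:", ev(equilibrium, perfect_opponent))
print("equilibrium EV vs flawed opponent:", ev(equilibrium, flawed_opponent))
print("best response vs flawed opponent:", best_response(flawed_opponent))
```

The catch is that the best response (always paper) is itself highly exploitable if the opponent notices and adjusts, which is the trade-off the rest of this post is about.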

David Sklansky, the noted poker author and creator of the Twoplustwo poker forums, is on record saying the following about Cepheus, the bot that "solved" heads-up fixed-limit hold'em:

"If the computer is playing a bad player, it will win but it won't win as quickly as a human being playing a bad player."  He then goes on to say, "I will destroy that beginner to a greater degree than this computer program will."

A perfect GTO bot doesn't care about the size of your mistakes. By definition, it assumes it's playing against opponents who are also playing an unexploitable strategy, and it plays its own unexploitable strategy in response. Once it has (incorrectly) established that the other players are playing flawlessly, Nash equilibrium is adopted, and the other players become irrelevant in a sense.

Again, we're back to the issue of the false assumption that our opponents are playing perfectly.

The crux of the debate generated by Olivier's poll question is whether the best exploitive human being extracts more money from his or her opponents than the best unexploitable robot extracts from those same opponents.

There really are two ends of the spectrum here that I think just about everyone would agree on:

1.  A human being will extract more money from a beginner than a robot will
2.  A robot will extract more money from an expert player than a human being will

So if you agree with the above, there's some unknown point between "beginner" and "expert", representing our opponents' collective skill level, that answers Olivier's question.

I have no indisputable proof to answer this question, but I do have a strong opinion after playing over six million hands online. Tens of thousands of those hands were played against robots with GTO aspirations, and hundreds of thousands more against human beings who fell closer to the GTO end of the spectrum than the exploitive end.

I think you can guess my opinion. Perhaps unsurprisingly, though, these GTO-based opponents were the toughest to play against. However, they weren't the big winners in the game, as evidenced by results on PokerTableRatings and, to a lesser extent, the results in my own database.

It's worth noting that we have no idea how many players failed trying to adopt a GTO strategy relative to how many failed trying to adopt an exploitive strategy. In other words, there's inherent selection bias in only looking at the results of players who amassed a high volume of hands. And to some degree, all players are always playing, or at least striving to play, some mixture of approximate GTO (balance) and exploitive poker, since it's impossible for a human being to play "perfect" poker.

Let's put all this aside for a moment to discuss what I believe is the most compelling argument against GTO. Let's assume we don't care that the conditions the strategy was designed for don't hold in the games we actually play. Let's also assume we don't care about maximizing expected value but do care about having positive expected value.

Let's also assume that (insert your favorite GTO program here) *accurately* provides these strategies, and that they meet the Nash equilibrium criteria and have positive expected value.

Note: "Accurately" is starred above because I'm not sure whether there has been third-party testing of the accuracy of the poker GTO programs on the market. If I did use them, this would be another concern in addition to the abstraction issues mentioned above.

Given these assumptions, how do we as human beings plan on implementing these strategies?

Let's hypothetically take something as seemingly simple as whether we should open T6 offsuit from the button in limit hold'em. We launch our program and input the assumed range of our perfectly playing opponent. It recommends T6o as a profitable open, so we dutifully open it from the button from that point forward.

There are a host of problems with this even after setting the aforementioned issues aside.

We don't get dealt T6 offsuit every hand; it's one of many hands we're dealt. We always have to think of this hand in the context of our range of opening hands, as the program does. So maybe you're thinking, "Not a problem, I can memorize the actions the program suggests for every preflop hand."

But can you also memorize the ~9 possible betting sequences on each street for T6o across each of the 17,296 flops, 45 turns, and 44 rivers that went into the program's calculation of that hand's profitability? If my math is right, that's 24,965,392,320 combinations. Yes, roughly 25 billion. And remember, this is only for T6o.
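The arithmetic can be checked in a few lines. This sketch assumes, as the 45 turns and 44 rivers imply, that both players' hole cards are out of the deck, which leaves C(48, 3) = 17,296 possible flops. It also treats the rough ~9 betting sequences as applying independently on the flop, turn, and river (9 × 9 × 9):

```python
from math import comb

SEQUENCES_PER_STREET = 9   # rough estimate, applied on flop, turn, and river
UNSEEN = 52 - 4            # both players' hole cards are out of the deck

flops = comb(UNSEEN, 3)    # 17,296 possible flops
turns = UNSEEN - 3         # 45 possible turn cards
rivers = UNSEEN - 4        # 44 possible river cards

paths = SEQUENCES_PER_STREET ** 3 * flops * turns * rivers
print(f"{flops:,} flops x {turns} turns x {rivers} rivers -> {paths:,} paths for T6o alone")
```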

The paper at https://arxiv.org/pdf/1302.7008.pdf works out the calculation for all hands.

That's 319 trillion.  If every person on earth today memorized 10,000 of these possible permutations, we still wouldn't come close to committing them to our collective consciousness.
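Here's a quick sanity check of that comparison. It assumes a 2018 world population of roughly 7.6 billion, an approximate figure that is not from the post:

```python
WORLD_POPULATION = 7_600_000_000   # assumption: rough 2018 estimate
PER_PERSON = 10_000                # permutations each person memorizes
TOTAL = 319 * 10**12               # the 319 trillion figure

memorized = WORLD_POPULATION * PER_PERSON
share = memorized / TOTAL
print(f"{memorized:,} memorized, about {share:.0%} of {TOTAL:,}")
```

Even with every person on earth taking part, we'd cover only about a quarter of them.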

Essentially, when we look at the output of one of these programs, we're seeing a microscopic view of a single drop of water from a vast ocean. It's like assuming that because we can see, say, the trail of a shooting star, we can see the entire universe. We can't see it, much less understand it.

This short discussion taken from a 2+2 thread I stumbled across a few days ago illustrates the point:

https://forumserver.twoplustwo.com/53/mid-high-stakes-limit/2018-nc-lc-thread-we-ever-going-get-title-1700128/index2.html

The poster states he's "super confused" when playing against Cepheus.  Cepheus defends Js3s against a button open playing heads up.  The flop is Ac 8h 4s and Cepheus check/calls.

The poster "can't see how that's possibly in his range," referring to the flop call.

Another poster chimes in with some good information.  "Looks like the bot continues 100% on this flop, check raises 18% with the backdoor outs as a bluff.  When I switch the 4s to the 4h it moves it to a 100% fold.  If I change the Ace to Kc, it again continues 100%, check raising 15% as a bluff."

The poster then points out that Cepheus check-raises this flop 100% of the time when it holds Ks3s, and is understandably surprised by this.

Another poster responds to the 100% check-raise with Ks3s, saying, "K3s is not a hand you really want to call 3 times with.  Yes better to raise for thin value!"

A simple check with an equity calculator shows this isn't a "value raise," even under the most liberal assumptions supporting the claim.

I could take this haphazard assignment of reasons further and surmise that Cepheus is combining its equity (derived from its high-card value, pair outs, and backdoor draws) with the fold equity it gains by raising. Maybe the 5c comes on the turn and Cepheus gets its opponent to fold 22 or 33, as one example.

The reality is that we have little idea why Cepheus makes these plays, or why it uses these specific frequencies. Sure, we can evaluate the strength of its hand relative to the board and its opponent's range by observing that it has a backdoor flush draw, a backdoor straight draw, and some high-card value.

But why are we check-raising Js3s 18% of the time and Ks3s 100% of the time? Why not 14% and 91%? Why are we check-raising or calling at all? Why aren't we leading? Why didn't we 3-bet preflop? Maybe we did 3-bet preflop 72% of the time and are unknowingly looking at the other 28%. The questions are numerous and unanswerable.

Only Cepheus, which has examined every possible turn and river card along with every conceivable betting sequence, hammered out over trillions of hands against another presumed omniscient opponent, "understands" why it's doing that.

I have to reiterate this point because it's an important one.  This is just one hand in a range of hands.  In fact, it's possible to make plays with negative expected value in a vacuum that increase the overall profitability of all our hands in the same situation.

When I refer to it being "just one hand in a range of hands", I'm talking about this:

That means not only all the hands we hold in this situation ("situation" being defined as our range that calls versus a button open), but also all the different action sequences that can arise on the flop, turn, and river, in addition to all the flop, turn, and river cards themselves.

For example, we'd like to have hands that can bet/3-bet, bet/call, bet/fold, check/call, check raise, check raise/fold, check raise/4-bet, check/fold, etc. etc. on the flop.  And not only the flop, but the turn and river as well.  So we're targeting balance within all the conceivable action sequences on all the conceivable turn and river cards within the context of our entire range.  Even this paragraph is an oversimplification of the immense complexity involved.

You can view a range of hands like the instruments in an orchestra, all working in harmony to produce a beautiful melody. Taking one hand like Js3s out of a range on the flop might be analogous to walking over to the flute player, listening to him or her play a perfect note or two, and concluding you're ready to conduct the symphony.

Why Cepheus does this is unknowable to a human being. This can be shown at the most basic level: Cepheus makes these plays in response to another assumed perfectly playing opponent. To even begin investigating the true reasons behind these plays, we'd have to fully understand how its opponent is playing.

I'd like to end on a bit of a positive note about GTO. I'm not opposed to the idea of GTO itself, and we should all strive to understand it better. In simple terms, we want a firm grasp of balance, particularly against more skilled players and in the more common situations that arise at a poker table. Examining GTO programs can certainly help point us in the right direction, particularly when seeking answers to more theoretical questions, though I think there are simpler and more effective ways.

I look at balance like the volume knob on a stereo. The better my opponent and/or the more common the situation, the higher I'll turn up the balance knob. The worse my opponent or the less common the situation, the lower the volume.
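One simple way to model the volume knob, offered only as a sketch and not as my actual method: blend an equilibrium strategy with a pure exploit using a single weight. This is shown in rock-paper-scissors against a hypothetical rock-heavy opponent, where `blend` and the opponent's mix are both made up for illustration. Turning the knob down raises the expected value against that opponent, but it also raises how much a best-responding opponent could win back:

```python
from fractions import Fraction

# Our payoff in rock-paper-scissors; rows are our action, columns the opponent's.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def ev(ours, theirs):
    """Our expected payoff when both sides play the given mixes."""
    return sum(p * q * PAYOFF[r][c] for r, p in enumerate(ours) for c, q in enumerate(theirs))

def exploitability(ours):
    """What a best-responding opponent wins per game against our mix."""
    return max(-ev(ours, [1 if c == k else 0 for c in range(3)]) for k in range(3))

def blend(knob):
    """Hypothetical knob: 1 = fully balanced (equilibrium), 0 = pure exploit (always paper)."""
    equilibrium = [Fraction(1, 3)] * 3
    exploit = [0, 1, 0]  # the best response to a rock-heavy opponent
    return [knob * e + (1 - knob) * x for e, x in zip(equilibrium, exploit)]

rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # hypothetical opponent
for knob in (Fraction(0), Fraction(1, 2), Fraction(1)):
    mix = blend(knob)
    print("knob", knob, "EV vs rock-heavy:", ev(mix, rock_heavy),
          "exploitability:", exploitability(mix))
```

In this toy game the trade-off is linear: each notch toward balance gives up some profit against the weak opponent in exchange for less exposure to a strong one.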

Granted, the volume on my stereo doesn't go nearly as high as, say, Cepheus', but neither do my opponents'.

-Tony Pirone
TPirahna
----------------------------------------------------------------------------------------------
I will make a post in the near future that goes into a lot more detail on balance and its practical implementation.




Monday, April 23, 2018

A Return to Blogging

I'm not sure if anyone reads this blog anymore, but there will be more posts in the near future. I'm back playing live poker and immersing myself in the game.

The posts from here on out will be strategy and psychology related.

Hope you enjoy.

Thursday, February 16, 2017

Coaching!

It's been a long time since I last posted here. I played Fantasy Sports for about a year and a half, concluded I could make more playing poker, and went back to live poker for about 20 months.

Although both endeavors were successful, it's time to move on to something else. The live poker environment is unhealthy and chaotic, to say the least. Over the years I mentioned here on more than a few occasions that I didn't know how live poker players did it. I'm still not sure how they do it. The drama, the angle-shooting, the general chaos: it's all something I don't want to be a part of if I don't have to. At this point, I'd like to limit my playing to ten to twenty hours a week.

My focus now is on coaching and eventually writing a book, hopefully the best book that's ever been written on LHE. I anticipate this will take 5-10 years to complete, but I've begun the journey.

I started a coaching thread on 2+2 about two weeks ago.  So far I'm up to 5 students.  Ideally, I'd like to coach 10 at any given time.  So there are spots open at this point.

All the information and specifics can be found here:

http://forumserver.twoplustwo.com/164/cash-game-poker-coach-listings/tpirahna-limit-holdem-strategic-coach-amp-life-mentor-1651623/

Feel free to ask questions via the thread or email.

I plan on updating this blog more frequently as well.  It'll probably relate to topics being covered in my book along with a new spiritual journey I'm on in life.  More on the latter to follow.

Friday, October 17, 2014

Fantasy Sports Links

I feel a little guilty even posting these, but at the same time I think it's kind of stupid not to. I'm posting links for the various fantasy sites, so if anyone who reads my blog wants to sign up and does so through these links, I'll get a portion of your rakeback. It's basically the same arrangement as poker affiliate deals.

I actually received several offers over the years to post links, banners, etc. on my blog in return for a fee and always refused. I've never really done the whole affiliate thing, but with all the interest in Fantasy Sports and all the people contacting me with questions, I think it would be dumb not to post them. So if you're interested in Fantasy Sports, have liked my blog, and would like to give something back to me, the links are below.

I also linked the 2+2 "Well" in case you haven't seen it - you can ask me anything there and I'll be happy to respond.  Also I linked a PokerFuse article that was written the other day about my transition to Fantasy Sports.

I had actually decided a few months ago that I wanted to keep my transition to DFS kind of low-key. Oops. Well, that's all out the window at this point. I really wasn't expecting so much attention and interest.

DFS Links:

DraftKings: https://www.draftkings.com/r/SteveAvery

FanDuel: https://www.fanduel.com/?invitedby=ronniegant&cnl=da

DraftDay: https://www.draftday.com/invite/TomGlavine

If anyone has any DFS questions regarding the sites, feel free to email me at tony.pirone@gmail.com

2+2 Well: http://forumserver.twoplustwo.com/18/high-stakes-limit/retirement-poker-tpirahna-well-1479652/

PokerFuse: http://pokerfuse.com/features/in-depth/25931-online-poker-legend-quits-pursue-fantasy-sports/



Friday, October 3, 2014

Retirement

I'm officially retired from poker. I've put off this blog post for a while, trying to be certain that I'm done, and while I'm still not 100% certain, I'm more certain now than at any other point. That said, I will probably play poker in some capacity in the future, even if only for recreational purposes.

A number of factors led me to this decision. First and foremost, I'm sick of playing. Sometime last year I realized I didn't want to keep playing much longer. I'm 39 years old now, and it has become more and more difficult to sit in front of a computer all day, keeping my attention focused as decision after decision is presented hour after hour. Five years ago it was fairly easy for me to play six tables of short-handed LHE for many hours, remain focused on all of them, and have a pretty good idea of what was going on at each. This year, playing four tables felt like torture, and I found it very difficult to maintain that same level of focus for any extended period of time.

Beyond just getting older, poker has become less and less fun and more and more like work, to the point where I've dreaded playing on many days. I've likely logged more online hours and possibly more hands of LHE than anyone alive. I started playing over ten years ago and put in a tremendous amount of volume every one of those years. I don't know how many hours I played, but a rough estimate of hands is 6 million. In the early days, I was eager to play every day and brought my laptop everywhere (literally everywhere), not out of necessity but out of a burning desire to play. That desire is long gone.

Another factor that led to this decision is the current environment in online poker.  Making money in high-stakes online games is more about table selection, seat selection, and quickness to the tables than about actual skill. 

The actual skill of playing is still a prerequisite for making money, but the vast majority of high-stakes regulars are separated in skill by a fraction of a big bet.

So other, non-poker skills carry much greater weight than they used to. I can't complain about these non-poker skills, as I've benefited from them as much as anyone. But now that nearly everyone is aware of them, the landscape has transformed into a race to get to the tables and a contest to see who can get position on the recreational player. What was once an easy advantage has turned into a major headache when multi-tabling. If I were only playing a few hours a day at a few tables, I don't think it would be much trouble to keep all the tables open and hop on one immediately when a recreational player sits. But when multi-tabling for endless hours, also trying to keep every table open while vigilantly watching to see if a seat is about to be taken can be maddening.

Much more than that, it takes away from the fun of the game and the actual game of poker itself. 

Another factor (and there are many more) in the decision to retire was the constant travel after Black Friday. As cool as it is to experience different cultures and live in exotic locations, it also gets old after a while. There are a LOT of things in the US that I miss, most of all playing pool. To give you an idea, I've spent more time playing pool in my life than playing poker, and if faced with a choice between the two (making the same amount of money either way), I'd choose pool in a heartbeat. And I would have said that at any time during my poker career.

Enter the possible solution to all my problems: Daily Fantasy Sports

-Totally legal in the US which solves the travel issue (along with a host of other issues including the ongoing banking nightmare that stems from being an American online poker player)

-Extremely fun as I love sports and competition.   It’s a new challenge that I wake up every day excited to try to take on.

-Makes good use of a lot of my biggest strengths.  Math/odds, knowledge of sports betting markets and lines, psychology, bankroll management, etc. etc.

-A relatively new industry very similar to poker ten years ago in terms of growth and money making potential.  There are a number of other striking similarities but I won’t get into them here. 

It remains to be seen whether I can support myself and my family playing DFS, which is why I'm not 100% certain I'm done with poker. I will say I'm very optimistic about my chances and have done exceptionally well in the nine weeks or so I've been playing. One thing is certain: I'll be doing everything in my power to make it work and not have to come back to poker. It wouldn't bother me a bit if I never played another hand of online poker again (fingers crossed).

So this is likely to be my last blog post, though I may post some personal stuff here from time to time, like if I'm lucky enough to play Efren Reyes in pool again :)

I would like to extend an offer to 2+2; hopefully one of the moderators there will read this. I would be happy to do a "Well" on 2+2 where I answer any and all questions related to poker. I'd be willing to answer pretty much anything and would take the time to thoughtfully answer as many questions as possible. I think I could probably do something on par with the Phil Galfond Well that was done a while back. I actually have some downtime with DFS over the next three weeks, so that would be the best time for me to do it. I can be reached at tony.pirone@gmail.com

That’s hopefully it.  I hope everyone enjoyed following this blog, maybe learned something from it, and possibly gained some inspiration from it.