Poker and the limits of AI


EXCERPTS: John von Neumann [...] viewed poker as the perfect model for human decision making, for finding the balance between skill and chance that accompanies our every choice. He saw poker as the ultimate strategic challenge, combining as it does not just the mathematical elements of a game like chess but the uniquely human, psychological angles that are more difficult to model precisely...

[...] three computer programs designed to test their mettle against human poker players: Claudico, Libratus, and most recently, Pluribus. ... The goal isn’t to solve poker, as such, but to create algorithms whose decision making prowess in poker’s world of imperfect information and stochastic situations — situations that are randomly determined and unable to be predicted — can then be applied to other stochastic realms, like the military, business, government, cybersecurity, even health care.

[...] the first program, Claudico, was summarily beaten by human poker players —“one broke-ass robot,” an observer called it — Libratus has triumphed in a series of one-on-one, or heads-up, matches against some of the best online players in the United States.

Libratus relies on three main modules. The first involves a basic blueprint strategy for the whole game, allowing it to reach a much faster equilibrium than its predecessor. It includes an algorithm called the Monte Carlo Counterfactual Regret Minimization, which evaluates all future actions to figure out which one would cause the least amount of regret. Regret, of course, is a human emotion. Regret for a computer simply means realizing that an action that wasn’t chosen would have yielded a better outcome than one that was. “Intuitively, regret represents how much the AI regrets having not chosen that action in the past,” says Sandholm. The higher the regret, the higher the chance of choosing that action next time.

[...] The second module is a sub-game solver that takes into account the mistakes the opponent has made so far and accounts for every hand she could possibly have. And finally, there is a self-improver. This is the area where data and machine learning come into play. It’s dangerous to try to exploit your opponent — it opens you up to the risk that you’ll get exploited right back, especially if you’re a computer program and your opponent is human. So instead of attempting to do that, the self-improver lets the opponent’s actions inform the areas where the program should focus. “That lets the opponent’s actions tell us where [they] think they’ve found holes in our strategy,” Sandholm explained. This allows the algorithm to develop a blueprint strategy to patch those holes.

[...] There’s one final thing Libratus is able to do: play in situations with unknown probabilities. There’s a concept in game theory known as the trembling hand: There are branches of the game tree that, under an optimal strategy, one should theoretically never get to; but with some probability, your all-too-human opponent’s hand trembles, they take a wrong action, and you’re suddenly in a totally unmapped part of the game. Before, that would spell disaster for the computer: An unmapped part of the tree means the program no longer knows how to respond. Now, there’s a contingency plan.

Of course, no algorithm is perfect. When Libratus is playing poker, it’s essentially working in a zero-sum environment. It wins, the opponent loses. The opponent wins, it loses. But while some real-life interactions really are zero-sum — cyber warfare comes to mind — many others are not nearly as straightforward: My win does not necessarily mean your loss. The pie is not fixed, and our interactions may be more positive-sum than not.

What’s more, real-life applications have to contend with something that a poker algorithm does not: the weights that are assigned to different elements of a decision. In poker, this is a simple value-maximizing process. But what is value in the human realm? [...] “The world will ultimately become a lot safer with the help of algorithms like Libratus,” Tuomas Sandholm, a computer scientist at Carnegie Mellon University, told me. I wasn’t sure what he meant. The last thing that most people would do is call poker, with its competition, its winners and losers, its quest to gain the maximum edge over your opponent, a haven of safety.

“Logic is good, and the AI is much better at strategic reasoning than humans can ever be,” he explained. “It’s taking out irrationality, emotionality. And it’s fairer. If you have an AI on your side, it can lift non-experts to the level of experts. Naïve negotiators will suddenly have a better weapon. We can start to close off the digital divide.”

It was an optimistic note to end on — a zero-sum, competitive game yielding a more ultimately fair and rational world. I wanted to learn more, to see if it was really possible that mathematics and algorithms could ultimately be the future of more human, more psychological interactions. And so, later that day, I accompanied Nick Nystrom, the chief scientist of the Pittsburgh Supercomputing Center — the place that runs all of Sandholm’s poker-AI programs — to the actual processing center that make undertakings like Libratus possible... (MORE - details)

Possibly Related Threads…
Thread Author Replies Views Last Post
  Poker program Cepheus is unbeatable, claim scientists C C 0 546 Jan 15, 2015 11:39 PM
Last Post: C C

Users browsing this thread: 1 Guest(s)