notes-cog-ai-games-go

Difference between revision 1 and current revision

No diff available.

Blog post explaining Monte Carlo Tree Search (MCTS) with Upper Confidence bound applied to Trees (UCT). One UCT strategy, UCB1, uses a formula from the Multiarm Bandit problem to prioritize which nodes to explore. The Multiarm Bandit problem is that you are faced with a row of slot machines, with different payoff probabilities (and reason to believe that some of them may offer positive expected value payoffs, otherwise why would you play at all?). It costs money to play each machine, so you don't want to spend a lot of money getting more data about machines that already appear to have low payoff. But otoh you don't want to completely ignore those machines, because it's possible that by chance the first zillion times you played the best machine, you happened to lose. This is somewhat related to balancing exploration (try the different machines to get data about their payoff probability) with exploitation (play the estimated best machine as much as possible), except that here we're also prioritizing the exploration itself. The UCB1 strategy says that you pretend the estimated value of each machine is x_i +- sqrt(2*log(n)/n_i), where x_i is the observed mean payout for machine i, n_i is the total number of samples from machine i, and n is the total number of samples for all machines. Then you use a heuristic called optimism-in-the-face-of-uncertainty, which means that you choose to play (explore) the machine wih the highest estimated value (eg you select the upper end of the 'confidence interval', that is, choose + in the +- in the formula). After you play what you think is the best machine very many times, "n" will go up for all machines, increasing their estimated value, and n_i will remain unchanged for the other machines, but will go up for the machine you have been playing; so eventually this rule will tell you to give the other machines a few more tries. hackernews discussion: https://news.ycombinator.com/item?id=10209677 ---- awesome paper reviewing the state of the art in Go game ai: [http://www.cs.utexas.edu/~pstone/Courses/394Rspring13/resources/mcrave.pdf Monte-Carlo Tree Search and Rapid Action Value Estimation in Computer Go by Sylvain Gelly and David Silver


places for bots to play Go: