Minimax combined with machine learning to determine if a path should be explored



I have an idea for a new type of AI for two-player games with alternating turns, like chess, checkers, connect four, and so on.

A little background: Traditionally, engines for such games have used the minimax algorithm, combined with a heuristic evaluation function applied once a certain depth is reached, to find the best moves. In recent years, engines based on reinforcement learning (like AlphaZero in chess) have grown in popularity and become as strong as, or stronger than, the traditional minimax engines.

My approach is to combine these ideas, to some extent. A minimax tree with alpha-beta pruning will be used, but instead of considering every move in a position, the moves will be evaluated by a neural net or some other machine learning method, and the moves which seem least promising will not be considered further. The more interesting moves are expanded as in traditional minimax algorithms, and the same evaluation is then applied to those nodes' children.
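To make the idea concrete, here is a minimal sketch of such a search. The "game" and the policy function below are toy stand-ins of my own invention (a real engine would plug in actual move generation and a trained network); only the pruning structure is the point: at every node, a learned score ranks the moves and only the top k are searched.

```python
def legal_moves(state):
    return [-2, -1, 1, 2]           # toy game: four possible moves

def apply_move(state, move):
    return state + move             # toy game: the state is just a number

def evaluate(state):
    return state                    # toy static evaluation at the leaves

def policy_scores(state, moves):
    # Stand-in for the neural net: score how "promising" each move looks.
    return {m: evaluate(apply_move(state, m)) for m in moves}

def pruned_alphabeta(state, depth, alpha, beta, maximizing, k=2):
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)
    scores = policy_scores(state, moves)
    # Keep only the k most promising moves; the rest are never expanded.
    moves = sorted(moves, key=lambda m: scores[m], reverse=maximizing)[:k]
    if maximizing:
        best = float("-inf")
        for m in moves:
            best = max(best, pruned_alphabeta(apply_move(state, m),
                                              depth - 1, alpha, beta, False, k))
            alpha = max(alpha, best)
            if alpha >= beta:
                break               # alpha-beta cutoff
        return best
    else:
        best = float("inf")
        for m in moves:
            best = min(best, pruned_alphabeta(apply_move(state, m),
                                              depth - 1, alpha, beta, True, k))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best
```

A pleasant side effect of ranking moves before searching them is that the most promising moves are searched first, which is exactly the move ordering alpha-beta needs to produce early cutoffs.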

The pros and cons are pretty obvious: by decreasing the breadth (the number of moves considered in a position), the computation time is reduced, which in turn allows a greater search depth. The downside is that good moves may never be considered if the machine learning method used to evaluate moves is not good enough.

One could of course hope that the position evaluation itself (from the neural net, etc.) is good enough to pick the best move directly, so that no minimax is needed. However, combining the two approaches will hopefully produce better results.

A big motivation for this approach is that it resembles how humans play games like chess. One tends to use intuition (which the neural net represents in this approach) to find moves that look interesting, and then examines those moves more thoroughly by calculating ahead. However, one does not do this for all moves, only those which seem interesting. The idea is that a computer engine can play well using the same approach, while of course calculating much faster than a human.

To illustrate the performance gain: the size of a minimax tree is about b^d, where b is the average number of moves possible in each position and d is the search depth. If the neural net can cut the number of considered moves in half, the new complexity becomes (b/2)^d = b^d / 2^d. With d = 20, that reduces the computation by a factor of 2^20, roughly one million.
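The arithmetic behind that claim can be checked directly (the branching factor of 30 is just an illustrative chess-like value, not a measured one):

```python
# Back-of-the-envelope check: halving the branching factor b at depth d
# shrinks the tree by a factor of (b**d) / ((b/2)**d) = 2**d.
b, d = 30, 20                 # assumed chess-like branching, depth 20
full = b ** d                 # nodes in the unpruned tree
pruned = (b // 2) ** d        # nodes when half the moves are discarded
speedup = full // pruned      # = 2**20
print(speedup)                # 1048576, i.e. ~1 million
```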

My questions are:

  1. Does anyone see any obvious flaws in this idea which I might have missed?

  2. Has it been attempted before? I have looked around for information about this but haven't found anything. Please give me some references if you know of any articles on the topic.

  3. Do you think the performance of such a system could compete with that of pure minimax engines, or of engines using deep reinforcement learning?

I have not yet determined exactly how the neural net will be trained, but several options should be possible.

Mr. Eivind

Posted 2019-04-26T12:36:13.383

Reputation: 575

This sounds similar to TD-Gammon, I believe, which was written to play backgammon. You seem to be describing a neural network that bootstraps the values of state-action pairs and then only evaluates some top percentile of them. I think TD-Gammon 2.1 and later really took this depth search further as well. But, as you might already know, TD-Gammon is an RL implementation with basic TD($\lambda$) and an ANN used to estimate state-action values. – Hanzy – 2019-04-28T20:16:06.807



The use of a neural network to steer the search toward promising paths is the same idea described in the AlphaZero paper. In AlphaZero, the search loop uses the network to encourage continued exploration of high-probability moves, which are then evaluated by the value head of that same network. Alpha-beta specifically is not necessary; AlphaZero instead uses a selection rule known as PUCT (Predictor + Upper Confidence bounds applied to Trees).
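For the OP's benefit, the PUCT rule can be sketched in a few lines. This is my paraphrase of the selection formula, not code from the paper: each child is scored by its averaged value estimate Q plus an exploration bonus proportional to the policy prior P and shrinking with its visit count N, and the highest-scoring child is searched next.

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child maximizing Q + c * P * sqrt(N_parent) / (1 + N).

    children: list of dicts with keys 'Q' (mean value estimate),
    'P' (policy-net prior), and 'N' (visit count).  c_puct trades off
    exploitation (Q) against exploration guided by the prior (P).
    """
    n_parent = sum(child["N"] for child in children)

    def score(child):
        return (child["Q"]
                + c_puct * child["P"] * math.sqrt(n_parent) / (1 + child["N"]))

    return max(children, key=score)
```

Note how an unvisited move with a high prior gets a large bonus, so the search keeps returning to moves the policy net likes until their measured Q says otherwise; that is the "soft" analogue of the hard top-k pruning the question proposes.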

Joe Markso

Posted 2019-04-26T12:36:13.383

Reputation: 1