Combining deep reinforcement learning with alpha-beta pruning

1

I will explain my question in relation to chess, but it should be relevant for other games as well:

In short: Is it possible to combine the techniques used by AlphaZero with those used by, say, Stockfish? And if so, has this been attempted?

I have only brief knowledge of how AlphaZero works, but from what I've understood, it basically feeds the board state into a neural net, possibly combined with Monte Carlo methods, and outputs a board evaluation or preferred move. To me, this closely resembles the heuristic function used by traditional chess engines like Stockfish.

So, from this I conclude (correct me if I'm wrong) that AlphaZero evaluates the current position, but uses a very powerful heuristic. Stockfish, on the other hand, first searches through lots of positions from the current one, and then uses a less powerful heuristic when a certain depth is reached.

Is it therefore possible to combine these approaches by first using alpha-beta pruning, and then using AlphaZero as some kind of heuristic once the maximum depth is reached? To me it seems like this would be better than just evaluating the current position, as (I think) AlphaZero does. Would it take too much time to evaluate? Or is there something I have misunderstood? If it is possible, has anyone attempted it?
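To make it concrete, here is a minimal sketch (in Python) of the hybrid I'm imagining. `Board` and `net_evaluate` are hypothetical placeholders, not any real engine's API:

```python
# A minimal sketch of the hybrid described above: ordinary alpha-beta
# (negamax) search that calls a neural-net evaluator at the depth limit
# instead of a hand-written heuristic. `Board` and `net_evaluate` are
# hypothetical placeholders; `net_evaluate` is assumed to return a score
# from the side to move's perspective.
import math

def alpha_beta(board, depth, alpha, beta, net_evaluate):
    if depth == 0 or board.is_game_over():
        return net_evaluate(board)  # network replaces the classic heuristic
    best = -math.inf
    for move in board.legal_moves():
        board.push(move)
        score = -alpha_beta(board, depth - 1, -beta, -alpha, net_evaluate)
        board.pop()
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:  # beta cutoff: the opponent will avoid this line
            break
    return best
```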

Mr. Eivind

Posted 2019-04-03T09:55:20.223

Reputation: 575

Answers

1

Yes, it's possible to combine AlphaZero with Minimax methods (including alpha-beta pruning). AlphaZero itself is a combination of Monte Carlo Tree Search (MCTS) and a deep network, where MCTS is used to generate the data to train the network, and the network is used to evaluate tree leaves (instead of rollouts as in classical MCTS). It's possible to combine the selection-expansion part of AlphaZero's MCTS with Minimax the same way as was done for classical MCTS; see "Monte-Carlo Tree Search and Minimax Hybrids" (PDF).
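A minimal sketch of that evaluation step, where `net.predict(state)` (returning a prior per legal move and a value) and the `state` methods are hypothetical placeholders:

```python
# When the selection phase reaches a leaf, one network call replaces the
# random rollout of classical MCTS.
from dataclasses import dataclass, field

@dataclass
class Node:
    state: object
    prior: float = 1.0
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

def expand_and_evaluate(node, net):
    priors, value = net.predict(node.state)  # policy head and value head
    for move in node.state.legal_moves():
        node.children[move] = Node(node.state.play(move), prior=priors[move])
    return value  # backed up the search path instead of a rollout result
```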

mirror2image

Posted 2019-04-03T09:55:20.223

Reputation: 547

0

So, from this I will conclude (correct me if I'm wrong) that AlphaZero evaluates the current position, but uses a very powerful heuristic. Stockfish on the other hand searches through lots of positions from the current one first, and then uses a less powerful heuristic when a certain depth is reached.

This is wrong. Like Stockfish, AlphaZero also "searches through lots of positions from the current one." You hint at this yourself when you say "possibly combined with Monte Carlo methods", but it seems you don't understand exactly what that means, so let me explain:

Stockfish searches the tree of future moves using an algorithm called Minimax (actually a variant with alpha-beta pruning), whereas AlphaZero searches future moves using a different algorithm called Monte Carlo Tree Search (MCTS). Minimax is well suited to quick evaluation functions, whereas MCTS explores fewer positions and can therefore afford a more expensive evaluation function. Further, MCTS works not with exact values but with statistics, and AlphaZero uses the neural net not just to value positions but to guide which moves to explore next (functionally it is actually two networks, a policy network and a value network).

To be sure, that is somewhat of a simplification. It is not impossible to use a neural net in conjunction with Minimax; it's just that in practice, because of the sheer number of leaf positions Minimax must evaluate, it's simply too expensive.
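For concreteness, here is a rough sketch of a PUCT-style selection rule, showing how the policy prior biases which child is explored next. The node fields (`visits`, `value_sum`, `prior`, `children`) and the constant `c_puct` are illustrative assumptions, not AlphaZero's exact implementation:

```python
# Prior-guided selection: q is the exploitation term (mean backed-up value),
# u is the exploration term scaled by the policy network's prior.
import math

def select_child(node, c_puct=1.5):
    parent_visits = sum(child.visits for child in node.children.values())
    def puct(child):
        q = child.value_sum / child.visits if child.visits else 0.0
        u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
        return q + u
    return max(node.children.values(), key=puct)
```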

chessprogrammer

Posted 2019-04-03T09:55:20.223

Reputation: 392

AlphaZero does not use the NN in the tree policy, only for evaluation of leaves: https://github.com/suragnair/alpha-zero-general/blob/master/MCTS.py

– mirror2image – 2019-07-25T06:24:04.607

@mirror2image That link is to some personal implementation. It may be that I confused it with AlphaGo; I'll check the paper – chessprogrammer – 2019-07-25T13:21:30.127

0

AlphaGo uses MCTS. AlphaZero does not.

Source: Mastering the Game of Go without Human Knowledge

Eddie Rowe

Posted 2019-04-03T09:55:20.223

Reputation: 1

Welcome to SE:AI! This answer would be stronger if you quoted the relevant passages from the paper. – DukeZhou – 2020-05-22T22:56:10.160