Almost any decision making with a specific set of objectives and guidelines can be framed as a game. In mid-2017, it would have been difficult to suggest one AI approach to a winning chess AI that had value outside of chess game play. Historically, game strategy AI approaches varied greatly from game to game and from research lab to research lab.
Tic-tac-toe is the trivial case in AI, since the outcome of every possible sequence of moves can be calculated on a single CPU in less than a second, as the sketch below makes concrete. At the other extreme of game complexity and dynamics, only the technical talent contractually bound to secrecy by the portfolio managers of billionaires knows how to play the game of high speed trading, optimized for maximum gain in position per hour.
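A brute-force minimax over the full tic-tac-toe game tree makes the point concrete. This is an illustrative sketch only, not any particular engine's code; the function names are this sketch's own.

```python
# Exhaustively solve tic-tac-toe with plain minimax and memoization.
from functools import lru_cache

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board, player):
    """Game value for 'X' (+1 win, 0 draw, -1 loss) under perfect play."""
    w = winner(board)
    if w:
        return 1 if w == 'X' else -1
    moves = [i for i, s in enumerate(board) if s == ' ']
    if not moves:
        return 0  # board full: draw
    results = (value(board[:i] + player + board[i+1:],
                     'O' if player == 'X' else 'X') for i in moves)
    return max(results) if player == 'X' else min(results)

print(value(' ' * 9, 'X'))  # 0: perfect play is a draw; finishes well under a second
```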
The international champion in computer-versus-computer chess play is Stockfish, which has produced the most consistent wins in international tournaments thus far. Stockfish performs its best-move search in a highly parallel CPU runtime environment. Sophisticated pruning and depth prioritization involving late move reductions are applied to the alpha-beta search algorithm, in conjunction with a bitboard representation of the game state.
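A generic sketch of alpha-beta pruning with a simple late-move-reduction rule, run on a random toy game tree, illustrates the idea. It is nothing like Stockfish's actual implementation, which adds bitboards, transposition tables, and far more elaborate move ordering and pruning; every name and threshold below is this sketch's own assumption.

```python
import random
random.seed(0)

def make_tree(depth, branching=6):
    """Random toy game tree: every node carries a static heuristic score."""
    node = {"eval": random.uniform(-1, 1)}
    if depth > 0:
        node["children"] = [make_tree(depth - 1, branching)
                            for _ in range(branching)]
    return node

def alphabeta(node, depth, alpha, beta, maximizing):
    if depth == 0 or "children" not in node:
        return node["eval"]
    best = float("-inf") if maximizing else float("inf")
    for i, child in enumerate(node["children"]):
        # Late move reduction: moves ordered late are searched one ply
        # shallower first; only promising ones earn a full-depth re-search.
        reduced = depth - 2 if (i >= 2 and depth >= 3) else depth - 1
        score = alphabeta(child, reduced, alpha, beta, not maximizing)
        if reduced < depth - 1 and ((maximizing and score > alpha) or
                                    (not maximizing and score < beta)):
            score = alphabeta(child, depth - 1, alpha, beta, not maximizing)
        if maximizing:
            best, alpha = max(best, score), max(alpha, score)
        else:
            best, beta = min(best, score), min(beta, score)
        if beta <= alpha:
            break  # alpha-beta cutoff: remaining moves cannot change the result
    return best

root = make_tree(5)
print(alphabeta(root, 5, float("-inf"), float("inf"), True))
```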
In human-computer play, Zor was the international champion in 2017.
Change in AI Approach
In 2017, DeepMind’s AlphaZero defeated Stockfish 28–72–0 (the middle number is the count of draws). That it won the match is not the remarkable advancement from an AI perspective. The same algorithm, configured differently, also plays a winning game of Go and of shogi. The approach and design are described in Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, David Silver et al., 2017.
AlphaZero is configured with the rules of the game to constrain the search. It begins training, prior to tournament play, with no other knowledge of game play. Because no pre-programmed strategies or heuristic rules guide the search before game play is learned, the authors claim the learning is tabula rasa, Latin for blank slate. Reinforcement learning is used to develop a strategy during self-play.
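The shape of that self-play loop can be sketched schematically. Everything below is a deliberately trivial stand-in, a Nim-like toy game, a uniform policy in place of neural-net-guided MCTS, and a no-op trainer; it shows only the loop structure of generating (state, search policy, outcome) triples and retraining, not DeepMind's implementation.

```python
import random

def legal_moves(n):              # toy game state: n stones left; take 1 or 2
    return [m for m in (1, 2) if m <= n]

def search_policy(state, net):   # stand-in for neural-net-guided MCTS
    moves = legal_moves(state)
    return {m: 1.0 / len(moves) for m in moves}

def self_play_game(net):
    state, to_move, history = 7, 0, []
    while legal_moves(state):
        pi = search_policy(state, net)
        history.append((state, to_move, pi))
        move = random.choices(list(pi), weights=list(pi.values()))[0]
        state -= move
        to_move ^= 1
    loser = to_move              # the player left with no move loses
    return [(s, pi, +1 if p != loser else -1) for s, p, pi in history]

def train(net, examples):        # stand-in for gradient updates to the DNN
    return net                   # a real trainer fits the value and policy heads

net = None                       # tabula rasa: nothing beyond the rules above
for iteration in range(3):
    examples = [t for _ in range(100) for t in self_play_game(net)]
    net = train(net, examples)
```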
AlphaZero, "Averages over the position evaluations within a subtree, rather than computing the minimax evaluation of that subtree," as is commonly used in chess players based on the alpha-beta approach. It determines the relative value of states (board positions) based on a DNN trained to produce values associated with all of the state information from outcomes. This is distinct from valuation by summing points assigned to pieces and their locations. AlphaZero uses a Monte-Carlo tree search (MCTS) algorithm, using ordered deepening to minimize computing resource utilization per move.
The MCTS mitigates, via aggregation, the numerical error and DNN convergence artifacts that can accumulate at the root of each sub-tree. Averaging data that contains reasonably symmetric, zero-mean noise causes the deviations to largely cancel.
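A toy numpy illustration of that cancellation: with zero-mean symmetric noise added to a true value, the mean of many noisy evaluations converges to the truth, while a max-style (minimax-like) backup is systematically biased by the noise.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 0.2
noisy = true_value + rng.normal(0.0, 0.3, size=10_000)  # symmetric, zero-mean noise

print(noisy.mean())  # ~0.2: the deviations largely cancel under averaging
print(noisy.max())   # far above 0.2: taking extremes amplifies the noise
```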
The achievement is significant. The abstract of the paper states, "Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case."
Remaining Domain Dependence
Without dismissing the achievement, the claim of no domain knowledge except the game rules conflicts with the five additional items listed in the Domain Knowledge section of the same paper, reproduced here solely for the reader's convenience (the first item is illustrated in the sketch after the list).
- The input features describing the position, and the output features describing the move, are structured as a set of planes; i.e. the neural network architecture is matched to the grid-structure of the board.
- AlphaZero is provided with perfect knowledge of the game rules. These are used during MCTS, to simulate the positions resulting from a sequence of moves, to determine game termination, and to score any simulations that reach a terminal state.
- Knowledge of the rules is also used to encode the input planes (i.e. castling, repetition, no-progress) and output planes (how pieces move, promotions, and piece drops in shogi).
- The typical number of legal moves is used to scale the exploration noise (see below).
- Chess and shogi games exceeding a maximum number of steps (determined by typical game length) were terminated and assigned a drawn outcome; Go games were terminated and scored with Tromp-Taylor rules, similarly to previous work.
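The first item, structuring inputs and outputs as planes matched to the board grid, can be made concrete with a minimal sketch, shrunk here to tic-tac-toe for brevity. AlphaZero's actual chess encoding stacks many more planes (piece types per colour, plus repetition, castling, and move-count planes); the array layout below is this sketch's own assumption.

```python
import numpy as np

board = np.array([['X', ' ', 'O'],
                  [' ', 'X', ' '],
                  ['O', ' ', 'X']])

planes = np.stack([
    (board == 'X').astype(np.float32),        # plane 0: current player's stones
    (board == 'O').astype(np.float32),        # plane 1: opponent's stones
    np.ones_like(board, dtype=np.float32),    # plane 2: constant (e.g. side to move)
])
print(planes.shape)  # (3, 3, 3): planes x board rows x board columns
```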
In addition to the game rules and the five domain dependencies listed above, there are further dependencies. Higher level analysis of the consequences of the board geometry and the game rules is not itself knowledge contained in the fundamental rules.
In Go game play, "During MCTS, board positions were transformed using a randomly selected rotation or reflection before being evaluated by the neural network, so that the Monte-Carlo evaluation is averaged over different biases."
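A minimal sketch of what that transformation might look like, assuming numpy and a hypothetical `net` callable standing in for the policy/value network; the eight symmetries are the dihedral symmetries of the square board.

```python
import random
import numpy as np

def random_symmetry(planes):
    """Apply one of the 8 symmetries of the square to (C, H, W) input planes."""
    k = random.randrange(4)
    out = np.rot90(planes, k, axes=(1, 2))   # random quarter-turn rotation
    if random.random() < 0.5:
        out = np.flip(out, axis=2)           # optional horizontal reflection
    return out

def evaluate(planes, net):
    # `net` is a hypothetical stand-in; the averaging over biases happens
    # implicitly across many MCTS simulations, each transformed anew.
    return net(random_symmetry(planes))
```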
The rules are also analyzed in order to scale the "noise that is added to the prior policy to ensure exploration ... in proportion to the typical number of legal moves for that game type."
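A sketch of that scaling, assuming numpy. The Dirichlet alpha values of 0.3, 0.15, and 0.03 for chess, shogi, and Go, and the 0.25 mixing weight, are reported in the paper, in inverse proportion to the typical number of legal moves; the function and variable names are this sketch's own.

```python
import numpy as np

def add_exploration_noise(priors, alpha, epsilon=0.25):
    """Mix Dirichlet noise into the root move priors to force exploration."""
    noise = np.random.dirichlet([alpha] * len(priors))
    return (1 - epsilon) * np.asarray(priors) + epsilon * noise

chess_priors = np.full(35, 1 / 35)   # roughly 35 legal moves is typical in chess
print(add_exploration_noise(chess_priors, alpha=0.3))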
This totals eight dependencies, and far too few game types were tried to support the claim that AlphaZero is a "domain-independent search." But bending the concept of domain independence in the claims is more easily forgiven when legitimate and significant achievements are made by an enthusiastic team. The approach is sound, and the DeepMind team's use of the craft is world class.
Beyond Removing Remaining Dependencies
Even after the eight dependencies are whittled down by the various teams involved in game play automation, there is additional work that may become the focus of future research.
Board games are a particular kind of game. Games where the rules can mutate, as in markets, law, war, and other domains whose boundaries require knowledge of broader sets of domains, are more challenging, although it is reasonable to expect that dependencies will be reduced and that current approaches that work with board games may become adaptive in terms of game rule acquisition.
Currently, approaches like AlphaZero require that input preparation be designed and executed and that outputs be executed on a virtual board. They do not yet discover game states by vision, execute moves with robotics, or acquire the rules of a game from natural language descriptions or from the analysis of past games.
These limitations do not invalidate the significant advancement of a game player that needs only rules, a small set of domain-specific configurations, and some self-play time to defeat champion-level dedicated artificial game players.