Where does reinforcement learning actually show up in DeepMind's game engines?


From the brief research I've done on the topic, it appears that the way DeepMind's AlphaZero or MuZero makes decisions is through Monte Carlo tree search (MCTS), wherein randomized simulations allow positions to be evaluated more rapidly than with traditional alpha-beta pruning. As the number of simulations increases, the result of this search approaches that of a classical tree search.
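To make my understanding concrete, here is a minimal sketch of the plain UCT-style MCTS I'm describing, with random rollouts, on a toy single-pile Nim game (take 1–3 stones, whoever takes the last stone wins). The `Node` class, `rollout`, and `mcts` functions are my own illustration of the textbook algorithm, not anything from DeepMind's papers — part of my question is precisely which of these pieces they replaced with a neural network.

```python
import math
import random

class Node:
    """One game state in the search tree (illustrative, not DeepMind's code)."""
    def __init__(self, pile, parent=None, move=None):
        self.pile = pile        # stones remaining; a player acts from here
        self.parent = parent
        self.move = move        # the move that led to this node
        self.children = []
        self.visits = 0
        self.wins = 0.0         # wins from the viewpoint of the player who moved INTO this node

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2, 3) if m <= self.pile and m not in tried]

    def uct_child(self, c=1.4):
        # UCB1: exploit average win rate, explore rarely-visited children
        return max(self.children,
                   key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def rollout(pile):
    """Random playout; return 1 if the player to move at `pile` ends up winning."""
    turn = 0
    while pile > 0:
        pile -= random.randint(1, min(3, pile))
        turn ^= 1
    # the last mover took the final stone and won
    return 1 if turn == 1 else 0

def mcts(root_pile, iterations=5000):
    root = Node(root_pile)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCT
        while not node.untried_moves() and node.children:
            node = node.uct_child()
        # 2. Expansion: add one untried child, if any remain
        moves = node.untried_moves()
        if moves:
            m = random.choice(moves)
            child = Node(node.pile - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation: random rollout from the new node
        result = rollout(node.pile)  # value for the player to move at `node`
        # 4. Backpropagation: flip perspective at each level up the tree
        while node:
            node.visits += 1
            node.wins += 1 - result  # credit the player who moved into `node`
            result = 1 - result
            node = node.parent
    # recommend the most-visited move at the root
    return max(root.children, key=lambda ch: ch.visits).move
```

In this vanilla version, step 3 (the random rollout) is what estimates whether a state is good or bad — which is exactly the part I'd expect a learned evaluation to replace.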

Where exactly did DeepMind use neural networks? Was it in the evaluation portion? And if so, how did they determine what makes a "good" or "bad" game state? If they deferred to the evaluations of another chess engine like Stockfish, how do we see AlphaZero absolutely demolish Stockfish in head-to-head matches?

Amar Srivastava

Posted 2020-05-17T17:44:55.300

Reputation: 31

No answers