How powerful are OpenAI's Gym and Universe for board games?


I'm a big fan of computer board games and would like to write chess/go/shogi/mancala programs in Python. Having heard of reinforcement learning, I decided to look at OpenAI Gym.

But first of all, I would like to know: is it possible, using OpenAI Gym/Universe, to create a chess bot that is nearly as strong as Stockfish, or a go bot that plays as well as AlphaGo?

Is it worth learning OpenAI Gym?


Posted 2020-01-26T10:20:55.023




OpenAI's Gym is a standardised API for reinforcement learning, together with a range of interesting environments, many of which you can access for free with little effort. It is very simple to use, and IMO worth learning if you want to practise RL in Python to any depth at all. You could use it to make sure you have a good understanding of basic algorithms such as Q-learning, independently of and before you look at using RL in a board game context.
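As a concrete (and hedged) illustration of the reset()/step() pattern that Gym standardises, here is tabular Q-learning on a hand-rolled toy environment that follows the classic 4-tuple step() contract. ChainEnv and all other names here are illustrative inventions, not part of Gym itself:

```python
import random

class ChainEnv:
    """Hand-rolled 5-state corridor using the Gym-style reset()/step() API.

    An illustrative stand-in, not a real Gym environment, so the sketch
    runs without installing the gym package. The agent starts in state 0;
    action 1 moves right, action 0 moves left, and reaching state 4 ends
    the episode with reward 1.
    """

    N_STATES, N_ACTIONS = 5, 2

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Classic Gym step() contract: (observation, reward, done, info)
        self.state = self.state + 1 if action == 1 else max(0, self.state - 1)
        done = self.state == self.N_STATES - 1
        return self.state, (1.0 if done else 0.0), done, {}

def greedy(qrow):
    # Argmax with random tie-breaking, so early all-zero rows still explore.
    best = max(qrow)
    return random.choice([a for a, v in enumerate(qrow) if v == best])

def q_learning(env, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning driven purely through reset() and step()."""
    q = [[0.0] * env.N_ACTIONS for _ in range(env.N_STATES)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                action = random.randrange(env.N_ACTIONS)
            else:
                action = greedy(q[state])
            next_state, reward, done, _ = env.step(action)
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

random.seed(0)
q = q_learning(ChainEnv())
# The learned greedy policy: move right in every non-terminal state.
policy = [greedy(q[s]) for s in range(4)]
print(policy)
```

The point is that the agent code only ever sees reset() and step(); swapping the toy corridor for a real Gym environment would leave the learning loop unchanged.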

Gym and Universe have limitations when dealing with multiple agents; the API is not really designed with that in mind. For instance, there is no simple way to add two agents to an environment: you would have to write a new environment and attach an opposing agent inside it. This is still possible, and not necessarily a terrible idea (it depends on the training setup you want to investigate).
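To illustrate that pattern, here is a hedged sketch: a tiny two-player game (Nim: take 1 or 2 stones from a pile, taking the last stone wins) wrapped as a single-agent Gym-style environment, with a random opponent replying inside step(). The game choice and all names are my own, not from any Gym environment:

```python
import random

class NimVsRandomEnv:
    """Two-player Nim exposed as a single-agent Gym-style environment.

    The opponent is not a second agent in the API; it lives *inside*
    step(). After the learning agent moves, a (here random) opponent
    replies before control returns, so from the outside this looks like
    an ordinary one-player environment.
    """

    def __init__(self, pile=10):
        self.start_pile = pile

    def reset(self):
        self.pile = self.start_pile
        return self.pile

    def step(self, action):  # action: take 1 or 2 stones
        self.pile -= min(action, self.pile)
        if self.pile == 0:
            return self.pile, 1.0, True, {}   # agent took the last stone: win
        # Opponent's reply happens inside the environment.
        self.pile -= min(random.choice((1, 2)), self.pile)
        if self.pile == 0:
            return self.pile, -1.0, True, {}  # opponent took the last stone: loss
        return self.pile, 0.0, False, {}

env = NimVsRandomEnv()
obs, done = env.reset(), False
while not done:
    obs, reward, done, _ = env.step(random.choice((1, 2)))
print(reward)  # +1.0 if our random agent happened to win, -1.0 otherwise
```

A fixed opponent like this is fine for some training setups, but note it bakes one opponent policy into the environment; self-play schemes need more plumbing than the Gym API gives you.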

If you want to look into classic two-player games, and write bots like AlphaGo and Stockfish, then I would point out that:

  • Game-playing bots often make extensive use of planning that can interrogate potential future game states. OpenAI's Gym doesn't prevent you from doing that, but it doesn't help in any way either.

  • The algorithms behind AlphaGo are public, with many nice tutorials. In most cases it would be quicker to follow one of these and develop your own bot-training code than to try to adapt an OpenAI solution designed for single-agent play.

  • Probably the biggest time-saver you could find for any game is a rules engine that implements the board, pieces and game rules for you. If Gym already has an environment for the game you want your bot to play, it might be worth checking the Gym code to see which library it integrates, then using that library directly yourself rather than the Gym environment.

  • Many decent game-playing algorithms don't use RL at all. You can frame most of them as search (finding the best moves) plus heuristics (rating moves or positions), and you can usually choose the algorithm for each sub-task independently. You can apply RL so that a bot learns game heuristics, then use a more traditional search, e.g. negamax, to make decisions during play. Or you can use any analysis of the game you like to generate heuristics. Very simple games such as tic-tac-toe (noughts and crosses in the UK) can use a heuristic of just +1 if X has won, -1 if O has won and 0 otherwise, and still be quickly solved with a minimax search for perfect play.

  • DeepMind's AlphaGo uses a variant of MCTS as its search algorithm. MCTS can be considered an RL technique, but the definitions are a bit blurry there; it is safer to say that AlphaGo incorporates MCTS as its chosen search technique, both during self-play training and in active play against any other opponent.
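The tic-tac-toe point above can be sketched in a few lines of Python: a negamax search armed with nothing but the terminal +1/-1/0 heuristic already plays perfectly. This is an illustrative sketch (board as a 9-character string, names my own), not code from any particular engine:

```python
from functools import lru_cache

# Winning triples on a 3x3 board, cells indexed 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def negamax(board, player):
    """Value of `board` for `player` (the side to move): +1 win, 0 draw, -1 loss.

    Only the terminal heuristic (+1 / -1 / 0) is used; the search does the rest.
    """
    opponent = "O" if player == "X" else "X"
    if winner(board) == opponent:   # previous move won the game
        return -1
    if "." not in board:            # board full with no winner: draw
        return 0
    best = -1
    for i, cell in enumerate(board):
        if cell == ".":
            child = board[:i] + player + board[i + 1:]
            best = max(best, -negamax(child, opponent))
    return best

print(negamax("." * 9, "X"))  # tic-tac-toe is a draw under perfect play
```

Because the search reaches every terminal position, the value of the empty board comes out as 0: perfect play from both sides is a draw. The same search skeleton carries over to bigger games; only the heuristic (and a depth cutoff) needs to change.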

Neil Slater


Thank you, Neil. I would like to know: does success in creating a strong bot depend mostly on hardware? Is a home laptop enough for this task? – Taissa – 2020-01-26T16:05:13.903

@Taissa: I think that would be a completely different question; feel free to ask it on the site. In short, though: if you intend to train your bot through a self-play approach similar to DeepMind's, then yes, you will need a significant hardware budget to compete with them. Cutting-edge reinforcement learning as practised today uses a lot of CPU/GPU processing. – Neil Slater – 2020-01-26T16:09:04.300