StarCraft II is a real-time strategy game that combines fast-paced micro actions with the need for high-level planning and execution. Because it is a popular game with millions of players, defeating top human players becomes a meaningful and measurable long-term objective for AI research.
Computer games provide a compelling solution to the problem of evaluating and comparing different learning and planning approaches on standardized tasks, and they are an important source of challenges for AI research.
Game-playing AI agents such as DeepMind's Atari-playing DQN and OpenAI's Dota 2 bot mark major technical steps in the quest for general AI; DeepMind described DQN as the first demonstration of a general-purpose agent able to continually adapt its behavior without any human intervention (source: DeepMind blog).
Computer games offer numerous advantages for AI research, e.g.:
- They have clear objective measures of success.
- Computer games typically output rich streams of observational data, which are ideal inputs for deep networks.
- They are designed to be difficult and interesting for humans to play, which makes them an excellent test of intelligence.
- Games run anywhere with the same interface and game dynamics, which makes it possible to run many simulations in parallel while sharing and updating the same model throughout training.
- In some cases pools of superb human players exist, making it possible to benchmark against highly skilled humans.
The StarCraft II challenge introduces a taxing set of problems for reinforcement learning: it is a multi-agent problem with multiple players interacting; information is imperfect because the map is only partially observed; the state space is large; and credit assignment is delayed, requiring long-term strategies.
The SC2LE Environment
DeepMind and Blizzard Entertainment have collaborated to release SC2LE (the StarCraft II Learning Environment), which exposes StarCraft II as a research environment.
SC2LE consists of three sub-components:
- A Linux StarCraft II binary.
- The StarCraft II API, which allows programmatic control of StarCraft II. The API can be used to start the game, get observations, take actions and review replays.
- PySC2, an open-source environment written in Python. It includes some mini-games and visualization tools.
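To make the interaction loop concrete, here is a minimal sketch of a random agent on one of the bundled mini-games. It assumes PySC2 2.x is installed; the exact constructor arguments vary between PySC2 versions, so treat this as illustrative rather than canonical.

```python
# Illustrative only: drive the MoveToBeacon mini-game with random actions.
# Constructor arguments are for PySC2 2.x and may differ in other versions.
import random
from pysc2.env import sc2_env
from pysc2.lib import actions, features

def run_episode():
    with sc2_env.SC2Env(
            map_name="MoveToBeacon",
            players=[sc2_env.Agent(sc2_env.Race.terran)],
            agent_interface_format=features.AgentInterfaceFormat(
                feature_dimensions=features.Dimensions(screen=84, minimap=64)),
            step_mul=8) as env:
        timestep = env.reset()[0]
        total_reward = 0
        while not timestep.last():
            # Pick uniformly among the actions the game currently allows.
            fn_id = int(random.choice(timestep.observation.available_actions))
            args = [[random.randint(0, size - 1) for size in arg.sizes]
                    for arg in actions.FUNCTIONS[fn_id].args]
            timestep = env.step([actions.FunctionCall(fn_id, args)])[0]
            total_reward += timestep.reward
        return total_reward
```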
Open-source OpenAI RL environments
Universe - Universe is a software platform from OpenAI for measuring and training an AI's general intelligence across games, websites and other applications.
Gym - OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. It makes no assumptions about the structure of your agent and is compatible with any numerical computation library, such as TensorFlow or Theano.
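As a quick illustration, the classic Gym interaction loop looks like this. The sketch uses the pre-0.26 Gym API, where step returns four values; newer versions split done into terminated/truncated.

```python
# Illustrative Gym loop: a random agent on CartPole (pre-0.26 Gym API).
import gym

env = gym.make("CartPole-v1")
obs = env.reset()
done = False
total_reward = 0
while not done:
    action = env.action_space.sample()  # random policy for illustration
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
```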
Supervised Classification Approach
Consider this: we could screen-capture game sessions from expert players and use the frames as input to a model, with the direction in which the agent should move as the output. This would be a supervised classification approach.
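A hedged sketch of that idea follows, using Keras (an assumption; the text describes no concrete architecture, and the four-direction label set here is hypothetical):

```python
# Illustrative frames-to-directions classifier; the architecture and labels
# are assumptions, not taken from any real StarCraft II dataset.
import tensorflow as tf

NUM_DIRECTIONS = 4  # hypothetical label set: up, down, left, right

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu",
                           input_shape=(84, 84, 3)),
    tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(NUM_DIRECTIONS, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(expert_frames, expert_directions)  # trained on recorded gameplay
```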
However, this is not an elegant solution, because we would be training the model on a dynamic environment rather than a static dataset. The states a game environment produces are stochastic and continuous: any number of events can occur, including many never seen in the expert recordings.
Furthermore, humans learn most effectively by interacting with the environment, not by watching others interact with it.
Markov Decision Process
Markov Decision Processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker, e.g. game environments. An MDP is defined by a set of states, a set of actions, transition probabilities between states, and rewards.
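To make this concrete, here is a toy tabular MDP sketch; the state and action names are invented for illustration and are not from any real game model:

```python
# A toy MDP: P[state][action] is a list of (probability, next_state, reward)
# triples. States and actions are illustrative only.
P = {
    "behind": {"attack": [(0.3, "ahead", 1.0), (0.7, "behind", -0.1)],
               "defend": [(0.9, "behind", 0.0), (0.1, "ahead", 1.0)]},
    "ahead":  {"attack": [(0.8, "ahead", 1.0), (0.2, "behind", -1.0)],
               "defend": [(1.0, "ahead", 0.1)]},
}

def expected_reward(state, action):
    """Expected immediate reward of taking `action` in `state`."""
    return sum(p * r for p, _, r in P[state][action])

print(expected_reward("behind", "attack"))  # 0.3*1.0 + 0.7*-0.1 = 0.23
```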
Reinforcement Learning with Deep Q-Learning
Q-Learning is a strategy that has been proven to find an optimal action-selection policy for any finite Markov Decision Process (MDP). In Q-Learning we choose the action that maximizes expected future reward. The further into the future we look, the more uncertain rewards become, so we apply a discount to future rewards.
Unlike policy gradient methods, which attempt to learn functions that directly map an observation to an action, Q-Learning attempts to learn the value of being in a given state and taking a specific action there. (Arthur Juliani, 2016)
The Q-Learning update rule is:

Q(s, a) ← Q(s, a) + α [ R + γ · max over a' of Q(s', a') - Q(s, a) ]

where:
- s = state, a = action, R = reward
- s' = the state reached after taking action a in state s
- α = the learning rate, γ = the discount factor

Experience during learning is based on (s, a) pairs: one maintains an array Q and uses experience to update it directly.
(Source: Wikipedia, https://en.wikipedia.org/wiki/Markov_decision_process)
One of the strengths of Q-learning is that it is able to compare the expected utility of the available actions without requiring a model of the environment.
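Here is a minimal tabular Q-learning sketch implementing the update above. FrozenLake and the hyperparameter values are illustrative choices, and the code assumes the pre-0.26 Gym API:

```python
# Tabular Q-learning on FrozenLake (illustrative; pre-0.26 Gym API).
import numpy as np
import gym

env = gym.make("FrozenLake-v1")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration

for episode in range(5000):
    s = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: explore occasionally, otherwise act greedily.
        if np.random.rand() < epsilon:
            a = env.action_space.sample()
        else:
            a = int(np.argmax(Q[s]))
        s_next, r, done, _ = env.step(a)
        # The Q-learning update from the formula above.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
```

Note that the update needs no model of the environment's transition probabilities; it learns purely from sampled (s, a, r, s') experience, which is exactly the model-free strength described above.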
For further reference, I recommend Siraj Raval's tutorial on Deep Q-Learning (https://www.youtube.com/watch?v=79pmNdyxEGo); the accompanying source code is available at https://github.com/llSourcell/deep_q_learning.
Additionally, I recommend the following references for more information on game-playing AI agents:
StarCraft II: A New Challenge for Reinforcement Learning https://arxiv.org/abs/1708.04782
Playing Atari with Deep Reinforcement Learning https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
Human-level control through deep reinforcement learning https://web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf