What is the next state for a two-player board game?



I'm using Q-learning to train an agent to play a board game (e.g. chess, draughts or go).

The agent takes an action while in state $S$, but then what is the next state (that is, $S'$)? Is $S'$ now the board with the piece moved as a result of taking the action, or is $S'$ the state the agent encounters after the other player has performed his action (i.e. it's this agent's turn again)?


Posted 2019-02-09T23:56:24.377




If your opponent follows a fixed policy (i.e. it doesn't learn), then the next state $S'$ is the state your agent observes when its turn comes around again. The opponent's move is simply folded into the environment: from your agent's perspective, the other player's reply is part of the environment's reaction to your action.

But if your opponent can also learn, you are in a multi-agent reinforcement learning setting instead.
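To make this concrete, here is a minimal sketch (all names here are illustrative, not from any particular library): a tiny turn-based game where the opponent's reply is folded into the environment's `step()` function, so the state returned to the agent is exactly the board it sees on its next turn, and that is the $S'$ used in the Q-learning update.

```python
import random

class TinyGame:
    """Toy two-player game: players alternately add 1 or 2 to a counter;
    whoever reaches 10 first wins. The opponent plays a fixed (random)
    policy, so its move is part of the environment's step()."""

    def reset(self):
        self.total = 0
        return self.total

    def step(self, action):
        """Apply the agent's action AND the opponent's reply.
        Returns (next_state, reward, done), where next_state is the
        board on the agent's next turn -- i.e. S' in the update rule."""
        self.total += action
        if self.total >= 10:
            return self.total, 1.0, True        # agent reached 10: win
        self.total += random.choice([1, 2])     # opponent's fixed-policy move
        if self.total >= 10:
            return self.total, -1.0, True       # opponent reached 10: loss
        return self.total, 0.0, False           # S' = board, agent to move


def train(episodes=5000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning against the environment above."""
    random.seed(seed)
    env = TinyGame()
    actions = [1, 2]
    Q = {}
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q.get((s, x), 0.0))
            s2, r, done = env.step(a)
            # standard update: Q(S,A) += alpha * (R + gamma * max_a Q(S',a) - Q(S,A))
            best_next = 0.0 if done else max(Q.get((s2, x), 0.0) for x in actions)
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s = s2
    return Q
```

For example, from state 8 the action 2 wins immediately, so after training `Q[(8, 2)]` should be close to 1 and clearly larger than `Q[(8, 1)]` (which lets the opponent win). The key design point is that the agent never sees the intermediate board between its own move and the opponent's reply; only the state at the start of its next turn exists in its experience.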




Thank you. But can I ask what difference it makes whether the opponent can learn or not? I'm not sure why that changes anything from this agent's perspective. The reason I ask is the Q-learning update rule, which uses $\max_a Q(S', a)$; I just had to be sure $S'$ is the state the agent encounters on its next turn, not the state immediately after performing its action. Thank you for the link to the paper, I will read it with interest. – BigBadMe – 2019-02-10T09:26:25.487

In multi-agent RL, your opponent is not like an opponent in a real game: your agent can also learn from the opponent's experience. In a match, your agent is effectively trained on both sides (this is used to improve the training process). But in ordinary single-agent RL games, $S'$ is the state your agent receives on its own turn. – malioboro – 2019-02-10T09:54:52.010

@BigBadMe I would not say that the distinguishing factor is whether or not the opponent can learn. I'd say the distinguishing factor is whether or not your learning algorithm is aware of the opponent's ability to learn. If you're trying to apply a single-agent learning algorithm like vanilla $Q$-learning, that algorithm is not even aware of the opponent's existence, so you have to treat the opponent's actions as a part of the environment. Some dedicated multi-agent RL algorithms may be "opponent aware". – Dennis Soemers – 2019-02-10T10:21:11.367