## Number of Neurons in Q-Learning for Chess


So I just read about deep Q-learning, which uses a neural network as a function approximator instead of a Q-table.

I saw the example here: https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html, where the author uses a CNN to estimate the Q-values.

My confusion is about the last layer of his neural net. Each neuron in the output layer represents one action (flap or don't flap). I also see other projects where the output layer likewise represents all available actions (move-left, stop, etc.).

How would you represent all available actions of a chess game? Every piece has its own set of legal moves, and we also need to choose how far it moves (a rook can move more than one square). I've read the Giraffe chess engine's paper and can't find how it represents the output layer (I'll read it once more).

I hope somebody here can give a nice explanation of how to design the NN architecture in Q-learning; I'm new to reinforcement learning. Thank you.


To model chess as a Markov decision process (MDP), you can refer to the AlphaZero paper (Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm). The exact details start at the bottom of page 13.

Briefly, an action is described by picking a piece and then picking a move for it. The board is 8x8, so there are 8x8 = 64 possibilities for picking a piece. Then we can either pick a linear movement (in 8 directions, with up to 7 steps in that direction) or a knight movement (8 possibilities). So far that is 8x7 + 8. Furthermore, we also need to consider underpromotions (promoting a pawn to a non-queen piece): there are 3 types of pawn movements (forward, left-diagonal capture, right-diagonal capture) and 3 promotion pieces (rook, knight, bishop), which makes 9. So the total dimension of the action space is 8x8x(8x7 + 8 + 9) = 4672, and this is the number of output neurons you will need.
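The arithmetic above can be written out as a small sketch. The constant names and the flat-index layout below are illustrative assumptions, not the paper's exact encoding:

```python
# Sketch of the AlphaZero-style chess action space described above.
# Names and the (from_square, move_type) index layout are assumptions
# for illustration, not the paper's exact encoding.

N_SQUARES = 8 * 8          # from-square for the chosen piece
N_QUEEN_MOVES = 8 * 7      # 8 directions x up to 7 steps
N_KNIGHT_MOVES = 8
N_UNDERPROMOTIONS = 3 * 3  # 3 pawn-move types x 3 promotion pieces
N_MOVE_TYPES = N_QUEEN_MOVES + N_KNIGHT_MOVES + N_UNDERPROMOTIONS  # 73

ACTION_SPACE = N_SQUARES * N_MOVE_TYPES  # 4672 output neurons

def decode_action(index):
    """Split a flat action index into (from_square, move_type)."""
    from_square, move_type = divmod(index, N_MOVE_TYPES)
    return from_square, move_type

print(ACTION_SPACE)       # 4672
print(decode_action(73))  # (1, 0): second square, first queen move
```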

Note that this action space representation covers every possible scenario, including impossible ones. For example, at the start of the game, picking tile E4 and promoting to a bishop makes no sense (there is no piece on E4 at the beginning of the game); likewise, if we pick a tile holding a rook we cannot make a knight movement with it. Therefore you will also need to implement a function that returns the set of legal actions in a given state and ignore all neural network outputs not contained in that set.
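Such a filter is often implemented as a mask over the network's output. Here is a minimal sketch, assuming a hypothetical `legal_actions(state)` helper (a real one would query the chess rules) and a `q_values` vector holding the network's raw output for all 4672 actions:

```python
# Minimal sketch of masking illegal actions. legal_actions() is a
# hypothetical placeholder; a real implementation would query the
# chess rules for the given state.
import numpy as np

ACTION_SPACE = 4672

def legal_actions(state):
    # Placeholder: pretend only these three action indices are legal.
    return [100, 250, 3000]

def best_legal_action(q_values, state):
    """Pick the highest-Q action among the legal ones only."""
    masked = np.full(ACTION_SPACE, -np.inf)   # illegal moves can never win
    legal = legal_actions(state)
    masked[legal] = q_values[legal]           # keep Q-values for legal moves
    return int(np.argmax(masked))

q = np.random.default_rng(0).normal(size=ACTION_SPACE)
print(best_legal_action(q, state=None) in legal_actions(None))  # True
```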

Obviously this action representation is not set in stone, so if you can come up with something better or more compact, you can use that too. You can also restrict your game, for example by not allowing underpromotions.

I just read this question: https://stackoverflow.com/questions/27340967/questions-about-q-learning-using-neural-networks where the action is used as an input and there is only one output value (Q(s,a)). I think that's more efficient, isn't it?

– malioboro – 2018-06-28T10:12:51.523

@malioboro: It might be more efficient in some cases. However, you need to run that network forward once per allowed action, whereas an "all actions" network runs forward once and is then filtered to the allowed actions. The extra cost of computing unused moves with the larger output may be less than the cost of running the smaller network many times. Especially with deep networks, one-shot-then-filter can be more efficient than filter-then-many-shot. – Neil Slater – 2018-06-28T10:34:16.633
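The trade-off in that comment can be sketched with the two network shapes involved. The sizes are hypothetical and a single matrix stands in for a real deep network; the point is only the number of forward passes each design needs:

```python
# Sketch of the two Q-network designs discussed above. Sizes are
# hypothetical and a single weight matrix stands in for a deep net.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, N_ACTIONS = 64, 73, 4672

# Design A: Q(s) -> one output per action.
# One forward pass, then filter to the legal actions.
W_a = rng.normal(size=(STATE_DIM, N_ACTIONS))
def q_all(state):
    return state @ W_a                       # shape (N_ACTIONS,)

# Design B: Q(s, a) -> one scalar.
# One forward pass per legal action.
W_b = rng.normal(size=(STATE_DIM + ACTION_DIM,))
def q_single(state, action):
    return float(np.concatenate([state, action]) @ W_b)

state = rng.normal(size=STATE_DIM)
legal = [rng.normal(size=ACTION_DIM) for _ in range(30)]

passes_a = 1            # one big pass covers all 4672 outputs
passes_b = len(legal)   # 30 smaller passes, one per legal move
print(passes_a, passes_b)  # 1 30
```

Whether A or B wins depends on how the cost of one large pass compares with many small ones, which is Neil Slater's point.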

Thank you @NeilSlater and HaiNguyen it is clear to me now :) – malioboro – 2018-06-28T22:00:36.277