So I just read about deep Q-learning, which uses a neural network to approximate the Q-function instead of a Q-table.
I saw the example here: https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html, where the author uses a CNN to estimate the Q-values.
My confusion is about the last layer of his network. Each neuron in the output layer represents one action (flap or don't flap). I've also seen other projects where the output layer likewise has one neuron per available action (move left, stop, etc.).
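Just to check my understanding of that setup, here is a minimal sketch (plain numpy, with made-up Q-values rather than the linked project's actual network): the output layer yields one Q-value per action, and the greedy policy simply picks the argmax.

```python
import numpy as np

# Hypothetical Q-values produced by the network's output layer,
# one neuron per action: index 0 = "no-flap", index 1 = "flap".
q_values = np.array([0.12, 0.87])

ACTIONS = ["no-flap", "flap"]

# Greedy policy: pick the action whose output neuron has the highest Q-value.
best_action = ACTIONS[int(np.argmax(q_values))]
print(best_action)  # flap
```

So for Flappy Bird the action space is tiny and this enumeration is trivial.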
How would you represent all available actions for a game like chess? Every piece has its own set of legal moves, and we also need to choose how far a piece moves (a rook can move more than one square). I've read the Giraffe chess engine's paper and couldn't find how it represents the output layer (I'll read it once again).
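One naive encoding I've been considering (this is my own guess, not something from the Giraffe paper, and it ignores promotions and castling): give each ordered (from-square, to-square) pair its own output neuron. A rook sliding several squares is then just a different "to" square, so move distance is implicit. The problem is the resulting output size:

```python
# One output neuron per ordered pair of distinct squares on an 8x8 board.
# This is a hypothetical encoding, not the Giraffe paper's representation.
NUM_SQUARES = 8 * 8

actions = [(src, dst)
           for src in range(NUM_SQUARES)
           for dst in range(NUM_SQUARES)
           if src != dst]

print(len(actions))  # 4032 output neurons, most of them illegal in any given position
```

That already gives thousands of outputs, almost all masked out as illegal in any position, which is what makes me unsure this is the right way to design the layer.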
I hope somebody here can give a clear explanation of how to design the NN architecture for Q-learning; I'm new to reinforcement learning. Thank you.