**Problem**

My problem is the following: given 1000 wins, losses, and ties from a chess simulation I am using, what shape should each game take (i.e., the sequence of moves leading to the win/loss/tie) in order to build a deep neural network on it?

**Literature Review**

**Current Situation**

I currently have a sample of 1000 wins, 1000 losses, and 1000 ties from the `python-chess` API. If `i` is the index of a game, then this is the structure of the current dataset I am working with:

```
game_i -> (num_moves_i,8,8,16)
```

Each `game_i`, where `i in {1..3000}`, has a variable `num_moves_i` depending on the game (e.g., 14 moves for a quick winning game, or 765 moves for a tie game). The 16 represents a `one_hot_encoding` for one of the 16 unique board pieces. The dataset also stores alternating board states, so:

```
game_i[0] == board state after White's first move
game_i[1] == board state after Black's first move
```
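To make the shape concrete, here is a minimal sketch of encoding a board into one `(8, 8, 16)` slice. It assumes each square holds a hypothetical integer piece id in `0..15` (the actual mapping from pieces to ids is up to you, e.g. `0` = empty square); a game is then the stack of one such tensor per half-move.

```python
# Hypothetical sketch: one-hot encode an 8x8 grid of piece ids into (8, 8, 16).
# Assumes each square holds an integer piece id in 0..15.

NUM_CLASSES = 16  # the 16 unique piece codes described above

def one_hot_board(board_ids):
    """board_ids: 8x8 nested list of ints in 0..15 -> (8, 8, 16) nested list."""
    return [
        [[1 if piece == k else 0 for k in range(NUM_CLASSES)] for piece in row]
        for row in board_ids
    ]

# A game is then a list of these tensors, one per half-move:
# game_i = [one_hot_board(state) for state in board_states]  # (num_moves_i, 8, 8, 16)
```

With `numpy`, the same stack becomes a single array of shape `(num_moves_i, 8, 8, 16)` ready for a network's input layer.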

Furthermore, I also have alpha-beta pruning and minimax working, so each move has an intrinsic value computed by recursion three levels deep. My current approach is therefore a regression on the value of a given move, which leads me to believe the AI would simply learn the heuristic and predict the value the heuristic would give.
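For reference, the depth-limited alpha-beta search described above can be sketched as follows. `children` and `heuristic` are hypothetical stand-ins: with `python-chess` you would generate successors from `board.legal_moves` and score leaves with your own evaluation function.

```python
# Minimal depth-limited minimax with alpha-beta pruning on a generic game tree.
# `children(node)` returns successor nodes; `heuristic(node)` scores a leaf.

def alphabeta(node, depth, alpha, beta, maximizing, children, heuristic):
    kids = children(node)
    if depth == 0 or not kids:
        return heuristic(node)
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, heuristic))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: opponent will never allow this branch
        return value
    else:
        value = float("inf")
        for child in kids:
            value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                         True, children, heuristic))
            beta = min(beta, value)
            if alpha >= beta:
                break  # alpha cutoff
        return value
```

Calling this with `depth=3` from the current position reproduces the "three levels deep" intrinsic value per move.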

**Summary**

Clearly my proof-of-concept dataset of 1000 games per outcome is not enough to make a meaningful AI, but that isn't my goal. I want to learn the techniques, not produce an enterprise-scale chess AI.

- Does the tensor shape make sense?
- Is this a reinforcement learning problem? If so, how can I shape my current framing into that type of thinking? Theory in this area would be greatly appreciated as I am less familiar with it.
- Is this an RNN/LSTM problem? (E.g., predict the next board state.)
- Is this a regression problem?
- Is this a sequence mining problem?

What is the standard approach to framing this problem once data is flowing through the pipeline?

Your support is more than appreciated.

*** UPDATE (STILL IN RESEARCH) ***

With further research, a candidate label for the training data is the pair of the board tensor and the move from the state space that was selected in that board state. Then keep only games containing sequences of moves that obtain a cumulative value >= `epsilon`. This would require a `one_hot_encoding` of all moves played in all games we wish to train on, as labels, e.g. `(game_i, board_ij, e2e6)`.
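A minimal sketch of that labeling scheme, assuming `games` is a list of move lists in UCI notation (with `python-chess` you would collect these via `move.uci()`); the move strings and variable names here are illustrative:

```python
# Build a move vocabulary over all games, then one-hot encode each move as a label.

def build_move_vocab(games):
    """games: list of games, each a list of UCI move strings -> {move: class_id}."""
    moves = sorted({m for game in games for m in game})
    return {m: i for i, m in enumerate(moves)}

def one_hot_move(move, vocab):
    """One-hot label of length len(vocab) for a single move."""
    label = [0] * len(vocab)
    label[vocab[move]] = 1
    return label

games = [["e2e4", "e7e5", "g1f3"], ["d2d4", "d7d5"]]  # toy example
vocab = build_move_vocab(games)
labels = [one_hot_move(m, vocab) for m in games[0]]
```

Each training example is then `(board_tensor, one_hot_move)`, with the board as input and the chosen move as the classification target.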

This makes sense. If I am tracking your logic, you essentially only feed it winning games, so the model we build gives the probability of a move leading to a win. Follow-on: does transforming the board in a winning game (8,8,16) into a feature vector imply building a `board_to_id` lookup table for each of the boards? I am also concerned with how to handle the alternating nature of this problem, as I will be feeding a dataset with alternating moves, which at a glance seems good only for learning context, not for predicting the quality of my move. – bmc – 2018-11-10T21:52:37.430