AlphaZero before move 8


The AlphaZero paper says that "the first set of features are repeated for each position in a T = 8-step history." So what happens before the first 8 moves have been played? Do they just repeat the starting position?

Bojidar Ivanov

Posted 2018-11-23T14:24:55.930

Reputation: 112



On page 13, right under Table S1 in the linked paper, this is explained (emphasis in bold at the end mine):

Each set of planes represents the board position at a time-step $t - T + 1, \dots, t$, **and is set to zero for time-steps less than $1$**.

I suspect the solution they describe there would indeed work better than just repeating the starting position up to 8 times. Intuitively, you want the neural network to learn to focus primarily on the current game state. If the starting position is repeated in all of those planes, the network cannot distinguish between them during the first few steps of a game, and may start relying on all of them equally. Only at later time steps will it "figure out" that they are sometimes not equal, and that the last one is probably the most informative. If the "useless planes" during the first few steps are all zero, they can be ignored much more easily early in the learning process.
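To make the two options concrete, here is a minimal sketch in Python with NumPy. It is not the paper's code: the function name `stack_history`, the `pad` argument, the `(C, H, W)` plane layout, and the oldest-first slot ordering are all assumptions made for illustration. With `pad="zero"` it follows the rule quoted above (planes for time-steps less than 1 are all zero); with `pad="repeat"` it implements the alternative from the question.

```python
import numpy as np


def stack_history(positions, T=8, pad="zero"):
    """Stack the last T positions into one input tensor.

    positions: non-empty list of arrays of shape (C, H, W), oldest first,
               so positions[-1] is the current position (assumed layout).
    pad:       "zero"   -> missing time-steps become all-zero planes
               "repeat" -> missing time-steps repeat the oldest known position
    Returns an array of shape (T * C, H, W), oldest slot first.
    """
    if not positions:
        raise ValueError("need at least the current position")

    recent = list(positions[-T:])   # at most T most recent positions
    missing = T - len(recent)       # slots for time-steps less than 1

    if pad == "zero":
        filler = np.zeros_like(recent[0])
    elif pad == "repeat":
        filler = recent[0]
    else:
        raise ValueError(f"unknown pad mode: {pad!r}")

    slots = [filler] * missing + recent
    return np.concatenate(slots, axis=0)


# Toy example: only the starting position is known, with 2 planes on a 3x3 board.
start = np.ones((2, 3, 3))
zero_padded = stack_history([start], T=8, pad="zero")
repeated = stack_history([start], T=8, pad="repeat")
```

In this toy example, `zero_padded` has 14 all-zero planes followed by the 2 planes of the starting position, while `repeated` contains 8 identical copies of it, which is exactly the case where the network has nothing to tell the history slots apart.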

Note that I suspect the difference won't matter much: based on the intuition described above, I'd expect only a tiny difference in learning speed.

Dennis Soemers

Posted 2018-11-23T14:24:55.930

Reputation: 7 644