## TD-Leaf struggles at learning chess

I am currently implementing the Giraffe chess engine. Following this paper, I designed a neural network similar to the one proposed by the author, which I trained using TD-Leaf(lambda). The procedure I followed is described in the extract of the paper shown below. The corresponding code that I wrote (using PyTorch) is the following (you can find it here).
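For reference, the TD-Leaf(λ) update I am trying to implement is (in my notation, with $J(x_t, w)$ the network's evaluation of the leaf of the principal variation at move $t$):

$$\Delta w = \alpha \sum_{t=1}^{N-1} \nabla_w J(x_t, w) \sum_{j=t}^{N-1} \lambda^{\,j-t}\, d_j,
\qquad d_j = J(x_{j+1}, w) - J(x_j, w)$$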

```python
def self_play(batch, net, device, n_moves):
    '''Self-play n_moves of each game in the batch.'''
    boards = [chess.Board(b) for b in batch['board']]
    boards = list(map(push_random_move, boards))
    giraffe_move = partial(find_best_move, max_depth=0,
                           evaluator=partial(giraffe_evaluation, net=net, device=device))
    scores = []
    for _ in range(n_moves):
        moves_, scores_ = zip(*map(giraffe_move, boards))
        scores.append(torch.stack(scores_))
        boards = [push_move(board, move) for (board, move) in zip(boards, moves_)]
    # stack along dim=1 so scores has shape (batch_size, n_moves),
    # matching the (L, N) layout expected by td_loss
    scores = torch.stack(scores, dim=1)
    return scores
```

```python
def td_loss(scores, td_lambda):
    '''TD(lambda) loss; scores has shape (batch_size, n_moves).'''
    L, N = scores.size()
    err = torch.zeros((L, N))
    for t in range(N - 1):
        discount = 1
        err_t = torch.zeros(L)
        for j in range(t, N - 1):
            dj = scores[:, j + 1] - scores[:, j]
            err_t += discount * dj      # lambda^(j-t) * d_j
            discount *= td_lambda       # multiply after, so the first term is lambda^0 * d_t
        err[:, t] = err_t
    # we include a minus sign because torch performs gradient descent
    # by default, but we want to impose a custom update rule for the weights
    loss = torch.mean(torch.sum(-scores * err.detach(), dim=1))
    return loss
```
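As a quick sanity check on the discounting, using the definition $e_t = \sum_{j=t}^{N-2} \lambda^{j-t} d_j$ and a toy score sequence of my own, the TD errors can be computed directly:

```python
import torch

def td_errors(scores, lam):
    '''Reference implementation: e_t = sum_{j=t}^{N-2} lam^(j-t) * (s_{j+1} - s_j).'''
    L, N = scores.size()
    err = torch.zeros((L, N))
    d = scores[:, 1:] - scores[:, :-1]  # one-step TD differences d_j
    for t in range(N - 1):
        for j in range(t, N - 1):
            err[:, t] += lam ** (j - t) * d[:, j]
    return err

# one game of three positions: d_0 = 1.0, d_1 = -0.5
scores = torch.tensor([[0.0, 1.0, 0.5]])
err = td_errors(scores, 0.7)
# e_0 = d_0 + 0.7 * d_1 = 1.0 - 0.35 = 0.65 ; e_1 = d_1 = -0.5 ; e_2 = 0
```

Note that the first term of $e_t$ carries $\lambda^0 = 1$, so the discount must be applied *after* accumulating $d_j$, not before.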

```python
def self_learn(batch, net, device, n_moves, optimizer):
    scores = self_play(batch, net, device, n_moves)
    loss = td_loss(scores, 0.7)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
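To spell out why minimizing this loss should reproduce the TD-Leaf update: since `err` is detached, the gradient only flows through `scores`, so

$$\frac{\partial \mathcal{L}}{\partial w} = -\frac{1}{L}\sum_{b=1}^{L}\sum_{t} e_t \,\nabla_w s_t
\quad\Longrightarrow\quad
w \leftarrow w - \alpha \frac{\partial \mathcal{L}}{\partial w} = w + \frac{\alpha}{L}\sum_{b=1}^{L}\sum_{t} e_t \,\nabla_w s_t,$$

i.e. one gradient-descent step on $\mathcal{L}$ performs a gradient *ascent* step along $\sum_t e_t \nabla_w s_t$, which is the TD-Leaf(λ) update averaged over the batch.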