Neural Network that Predicts Game State Based on Actions



I am trying to find literature on a network architecture that takes the following as input:

  • Action (like 'Up', 'Down', etc.)
  • Image of current state

and outputs:

  • Image of next state

I already have a lot of training data for these inputs. What I am missing is relevant literature, or an existing architecture, for this problem.

Sandeep Silwal

Posted 2017-12-17T23:00:00.030

Reputation: 121

What kind of images are you looking at? – Brian O'Donnell – 2017-12-18T01:39:49.953



You'll probably want to start with "Action-Conditional Video Prediction using Deep Networks in Atari Games" (available on arXiv). That paper is from 2015, so I'm sure there have been many interesting developments since then. It is still a good starting point, though, and it will give you the right terminology to plug into Google or Google Scholar to find more recent papers that build on it. Google Scholar can also list the papers that cite this one ("Cited by"), and interesting recent papers will probably cite it.
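The central idea of that paper is to condition the prediction on the action through a multiplicative interaction: the encoded frame features and a one-hot action vector are each projected into a shared factor space, multiplied element-wise, and projected back before decoding. Below is a minimal NumPy sketch of just that transformation, assuming dense layers in place of the paper's convolutional encoder and deconvolutional decoder. All sizes and weight names (`W_enc`, `W_act`, `W_dec`) are illustrative choices, not taken from the paper's code, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not values from the paper).
H_ENC = 64      # size of the encoded frame features
N_FACTORS = 32  # number of factors in the multiplicative interaction
N_ACTIONS = 4   # e.g. Up, Down, Left, Right

# Random weights for illustration; in a real model these are learned end to end
# together with the (convolutional) encoder and (deconvolutional) decoder.
W_enc = rng.normal(scale=0.1, size=(N_FACTORS, H_ENC))
W_act = rng.normal(scale=0.1, size=(N_FACTORS, N_ACTIONS))
W_dec = rng.normal(scale=0.1, size=(H_ENC, N_FACTORS))
b = np.zeros(H_ENC)


def one_hot(action, n=N_ACTIONS):
    """Encode an action index as a one-hot vector."""
    v = np.zeros(n)
    v[action] = 1.0
    return v


def action_transform(h_enc, action):
    """Combine frame features and action multiplicatively:
    h_dec = W_dec @ ((W_enc @ h_enc) * (W_act @ a)) + b
    """
    a = one_hot(action)
    return W_dec @ ((W_enc @ h_enc) * (W_act @ a)) + b


# Stand-in for the encoder output of the current frame.
h = rng.normal(size=H_ENC)
h_up = action_transform(h, 0)
h_down = action_transform(h, 1)
print(h_up.shape)  # (64,)
```

The result would then be fed to the decoder to produce the predicted next frame. Because the action scales every factor rather than just being concatenated onto the features, each action can effectively apply a different transformation to the same encoded frame.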

As an additional point, you may want to reconsider your desired output. Given the current state image and an action, it may be easier to train a network to predict only the change in the image (i.e., NEW_IMAGE - OLD_IMAGE) rather than the full new image. You can still reconstruct the predicted new image afterwards by simply adding that output back to the old image. I'm fairly sure I've seen this done in a more recent paper too, but I don't remember the exact title or authors.
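A minimal sketch of the target construction and reconstruction for this "predict the difference" approach, assuming 8-bit grayscale frames stored as NumPy arrays; the model that predicts the difference is left out, and the function names are illustrative:

```python
import numpy as np


def make_delta_target(old_frame, new_frame):
    """Training target when predicting the change instead of the full frame."""
    # Cast to a signed float type first: subtracting uint8 arrays would wrap
    # around instead of producing negative differences.
    return new_frame.astype(np.float32) - old_frame.astype(np.float32)


def reconstruct(old_frame, predicted_delta):
    """Add the predicted change back onto the old frame."""
    frame = old_frame.astype(np.float32) + predicted_delta
    # Keep pixel values inside the valid 8-bit range.
    return np.clip(frame, 0, 255).astype(np.uint8)


old = np.array([[10, 200], [50, 50]], dtype=np.uint8)
new = np.array([[10, 180], [60, 50]], dtype=np.uint8)

delta = make_delta_target(old, new)
print(np.array_equal(reconstruct(old, delta), new))  # True
```

In many games only a small part of the screen changes between consecutive frames, so this target is often mostly zeros, which is one plausible reason it can be easier to learn than the full frame.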

Dennis Soemers

Posted 2017-12-17T23:00:00.030

Reputation: 7 644


I tried something similar before for the game 2048. I used the board state as x and the move as y, and simply trained a neural network on this dataset. The architecture was a couple of ReLU layers followed by a softmax output layer. The main thing is not to include bad moves in the dataset; otherwise the network learns the bad moves as well, which makes it play worse.
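A minimal NumPy sketch of that kind of setup: the board is the input, the move is a class label, and two ReLU hidden layers feed a softmax over the four moves, trained with cross-entropy and plain gradient descent. This is not the code from the repository mentioned below; the flattened 16-value board encoding (log2 tile values scaled to [0, 1]), the layer sizes, the learning rate and the random toy data are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HIDDEN, N_MOVES = 16, 64, 4  # 4x4 board flattened; Up/Down/Left/Right

params = {
    "W1": rng.normal(scale=0.1, size=(N_IN, N_HIDDEN)), "b1": np.zeros(N_HIDDEN),
    "W2": rng.normal(scale=0.1, size=(N_HIDDEN, N_HIDDEN)), "b2": np.zeros(N_HIDDEN),
    "W3": rng.normal(scale=0.1, size=(N_HIDDEN, N_MOVES)), "b3": np.zeros(N_MOVES),
}


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(p, x):
    """Two ReLU hidden layers, then a softmax over the moves."""
    h1 = relu(x @ p["W1"] + p["b1"])
    h2 = relu(h1 @ p["W2"] + p["b2"])
    return h1, h2, softmax(h2 @ p["W3"] + p["b3"])


def train_step(p, x, y, lr=0.1):
    """One gradient-descent step on mean cross-entropy; y holds move indices.
    Returns the loss measured before the update."""
    h1, h2, probs = forward(p, x)
    n = x.shape[0]
    loss = -np.log(probs[np.arange(n), y]).mean()
    # Gradient of softmax + cross-entropy with respect to the logits.
    d3 = probs.copy()
    d3[np.arange(n), y] -= 1.0
    d3 /= n
    # Backpropagate through the ReLU layers (gradient is zero where unit is off).
    d2 = (d3 @ p["W3"].T) * (h2 > 0)
    d1 = (d2 @ p["W2"].T) * (h1 > 0)
    grads = {
        "W3": h2.T @ d3, "b3": d3.sum(axis=0),
        "W2": h1.T @ d2, "b2": d2.sum(axis=0),
        "W1": x.T @ d1, "b1": d1.sum(axis=0),
    }
    for k in p:
        p[k] -= lr * grads[k]
    return loss


# Toy stand-in data: random boards and random move labels. Real labels would
# come from a strong player (e.g. minimax) after filtering out bad moves.
X = rng.integers(0, 12, size=(32, N_IN)) / 11.0
Y = rng.integers(0, N_MOVES, size=32)
first_loss = train_step(params, X, Y)
for _ in range(300):
    last_loss = train_step(params, X, Y)
print(first_loss, last_loss)
```

Note that this is a move classifier (imitation of a teacher), which is a different task from the next-frame prediction asked about in the question.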

I gathered my dataset by running minimax on 2048, assigning a reward to each move, and then discarding the bad moves.
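The filtering step can be sketched as follows, assuming the minimax search (not shown) has already produced a score for every legal move in each position. The board representation, the `MOVES` ordering and the `margin` parameter (how far below the best score a move may be and still count as good) are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

Board = Tuple[int, ...]  # hypothetical: flattened 4x4 board of tile values
MOVES = ("Up", "Down", "Left", "Right")


def build_dataset(scored_positions: List[Tuple[Board, Dict[str, float]]],
                  margin: float = 0.0) -> List[Tuple[Board, int]]:
    """Turn (board, {move: score}) pairs into (board, move_index) examples.

    Only moves whose score is within `margin` of the best score for that board
    are kept; every other move is treated as a bad move and dropped.
    """
    dataset = []
    for board, scores in scored_positions:
        if not scores:  # no legal moves (game over): nothing to learn from
            continue
        best = max(scores.values())
        for move, score in scores.items():
            if score >= best - margin:
                dataset.append((board, MOVES.index(move)))
    return dataset


# Made-up scores for a single position, for illustration only.
positions = [
    ((2, 2, 0, 0) + (0,) * 12, {"Left": 10.0, "Right": 8.0, "Up": -5.0}),
]
print(build_dataset(positions))  # keeps only "Left" (index 2)
```

With `margin=0.0` only the single best move survives; a small positive margin keeps near-equivalent moves, which gives more training examples at the cost of slightly noisier labels.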

The above process also depends on how you build the feature vector: if your feature vector is an image, it makes sense to use a CNN.
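Even for 2048, where the raw input is a 4x4 grid rather than a screenshot, the board can be turned into image-like input for a CNN. One common encoding (an assumption here; the answer does not say how the board was represented) is one binary plane per tile value, giving a channels-first array:

```python
import numpy as np


def encode_board(board, n_planes=16):
    """Encode a 4x4 2048 board as one-hot planes of shape (n_planes, 4, 4).

    Plane 0 marks empty cells; plane k (k >= 1) marks tiles with value 2**k.
    Tiles of 2**n_planes or larger would not fit and raise an IndexError.
    """
    board = np.asarray(board)
    planes = np.zeros((n_planes, *board.shape), dtype=np.float32)
    exponents = np.zeros(board.shape, dtype=int)
    nonzero = board > 0
    # Round before casting so a float result like 10.9999 cannot become 10.
    exponents[nonzero] = np.rint(np.log2(board[nonzero])).astype(int)
    rows, cols = np.indices(board.shape)
    planes[exponents, rows, cols] = 1.0  # exactly one active plane per cell
    return planes


board = [[0, 2, 4, 0],
         [0, 0, 0, 0],
         [8, 0, 0, 2048],
         [0, 0, 0, 0]]
planes = encode_board(board)
print(planes.shape)  # (16, 4, 4)
```

For a plain fully connected network, `np.asarray(board).flatten()` (possibly with log2 scaling) is the simpler alternative; the plane encoding mainly pays off when convolutions can exploit the spatial layout.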

DQN is also a good option, but do check out the link above; it helped me too.
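Unlike the supervised approach above, DQN learns action values Q(s, a) from rewards instead of imitating a teacher's moves. Its core is the Bellman target y = r + γ · max over a' of Q_target(s', a'), with no bootstrapping at terminal states. A minimal NumPy sketch of the target and a plain squared-error loss, assuming the Q-values have already been computed by some network; the replay buffer, target-network updates and exploration policy are omitted, and the function names are illustrative:

```python
import numpy as np


def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Bellman targets: r + gamma * max_a' Q_target(s', a'), no bootstrap if done.

    rewards:       shape (batch,)
    next_q_values: shape (batch, n_actions), target-network Q-values of next states
    dones:         shape (batch,), 1.0 if the episode ended after this transition
    """
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)


def td_loss(q_values, actions, targets):
    """Mean squared error between Q(s, a) for the taken actions and the targets."""
    q_taken = q_values[np.arange(len(actions)), actions]
    return np.mean((q_taken - targets) ** 2)


rewards = np.array([1.0, 0.0])
next_q = np.array([[0.5, 2.0, 1.0, 0.0],
                   [3.0, 1.0, 0.0, 0.0]])
dones = np.array([0.0, 1.0])
y = dqn_targets(rewards, next_q, dones, gamma=0.9)
print(y)  # approximately [2.8, 0.0]; the second transition is terminal

q = np.array([[y[0], 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0]])
print(td_loss(q, np.array([0, 0]), y))  # approximately 0.5
```

The targets are treated as constants when differentiating the loss; that, together with a separate slowly updated target network, is what keeps the bootstrapped training reasonably stable.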

My repo:

My results are not actually that great.

William Scott

Posted 2017-12-17T23:00:00.030

Reputation: 51