I am training an RL agent using deep Q-learning with experience replay. At each frame, I sample 32 random transitions from a replay buffer that stores at most 20,000 transitions, and train as described in the "Playing Atari with Deep Reinforcement Learning" paper. Everything works, but I was wondering whether there is any principled way to choose the batch size for training, or whether a simple grid search is the best option. At the moment I use 32, since it's small enough that I can still render the gameplay throughout training at a stunning 0.5 fps. However, I'm wondering how much of an effect batch size actually has, and whether there are any selection criteria that generalize across deep Q-learning tasks.
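For concreteness, my replay setup is essentially equivalent to the following sketch (the buffer size, batch size, and function names are just my own illustration, not from any particular library):

```python
import random
from collections import deque

BUFFER_SIZE = 20_000  # maximum transitions kept, as in my setup
BATCH_SIZE = 32       # the minibatch size in question

# deque with maxlen evicts the oldest transition once full
replay_buffer = deque(maxlen=BUFFER_SIZE)

def store(transition):
    """Append one (state, action, reward, next_state, done) tuple."""
    replay_buffer.append(transition)

def sample_batch():
    """Uniformly sample a minibatch without replacement."""
    return random.sample(replay_buffer, min(BATCH_SIZE, len(replay_buffer)))

# Fill with dummy transitions, then draw one training batch.
for t in range(100):
    store((t, 0, 0.0, t + 1, False))
batch = sample_batch()
```

So the question is really just how to pick `BATCH_SIZE` here.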