What are state-of-the-art ways of using greedy heuristics to initially set the weights of a Deep Q-Network in Reinforcement Learning?


I am interested in the current state-of-the-art ways of using quick, greedy heuristics to speed up learning in a Deep Q-Network in Reinforcement Learning. In classical RL, I initially set the Q-value for a state-action pair (S,a) based on the result of such a greedy heuristic run from state S with action a. Is this still a good idea when a neural network approximates the Q-function, and if so, what are the best ways of doing it? What other ways are there to aid the DQN with the knowledge from the greedy heuristics?
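To make the classical-RL part concrete, here is a minimal tabular sketch of what I mean by seeding Q-values from heuristic rollouts. The MDP and the `heuristic_rollout` function are purely illustrative assumptions (a small chain where the heuristic estimates a discounted distance-to-goal), not taken from any library:

```python
import numpy as np

# Hypothetical toy chain MDP: reward 1 at the last state; action 1 moves
# toward the goal, action 0 detours. All names here are illustrative.
n_states, n_actions = 5, 2
gamma = 0.9

def heuristic_rollout(s, a):
    # Stand-in for a quick greedy heuristic run from (s, a): here it just
    # estimates the discounted return from the number of steps to the goal.
    steps_to_goal = (n_states - 1 - s) if a == 1 else (n_states - 1 - s) + 2
    return gamma ** max(steps_to_goal, 0)

# Classical-RL initialization: seed each Q(s, a) with the heuristic's estimate
# instead of zeros, so early greedy action selection is already informed.
Q = np.array([[heuristic_rollout(s, a) for a in range(n_actions)]
              for s in range(n_states)])
```

With a DQN there is no table to fill in directly, which is exactly why I am asking how (or whether) to transfer this idea to the function-approximation setting.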

References to state-of-the-art papers would be highly appreciated.


Posted 2017-05-31T07:35:04.387




You can check out Bootstrapped DQN, which has a demonstration video. From a skim of the paper, it seems the authors use a different sampling strategy and an action guide for specific instances.

Another way to initialize the network's weights is to create a dataset of moves (correct, incorrect, etc., as long as they are relevant) and have the network learn that dataset first. This also helps with debugging, since you can check whether the network is able to learn the policy used in the dataset at all. After pre-training on the dataset, continue with DQN from the same learned network, starting with a smaller exploration rate (e.g. 0.5 instead of 1.0).
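As a rough sketch of that pre-training step: below, a tiny linear Q-network is trained by supervised cross-entropy to imitate the actions a heuristic chose, before any DQN training begins. The dataset, network size, and the rule generating the heuristic's actions are all made-up assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset of heuristic moves: one-hot states paired with the
# action the greedy heuristic picked in each state (an arbitrary fixed rule).
n_states, n_actions = 4, 2
states = np.eye(n_states)[rng.integers(0, n_states, size=64)]
actions = states.argmax(axis=1) % 2          # the heuristic's choices

# Tiny linear "Q-network": Q(s, .) = s @ W
W = rng.normal(scale=0.1, size=(n_states, n_actions))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

lr = 0.5
for _ in range(200):
    # Cross-entropy imitation loss; gradient w.r.t. logits is (p - one_hot)
    p = softmax(states @ W)
    p[np.arange(len(actions)), actions] -= 1.0
    W -= lr * states.T @ p / len(actions)

# Fraction of states where the greedy action now matches the heuristic.
agreement = ((states @ W).argmax(axis=1) == actions).mean()
```

After this, `W` would be handed to the DQN as its initial weights and training would continue with the usual TD targets, but with a reduced starting epsilon, since the policy already encodes the heuristic's behavior.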

