
I am interested in current state-of-the-art ways to use quick, greedy heuristics to speed up learning in a Deep Q-Network (DQN) in reinforcement learning. In classical (tabular) RL, I initialize the Q-value of a state-action pair (S, a) with the return of a greedy heuristic rollout that starts in state S with action a. Is this still a good idea when a neural network approximates the Q-function, and if so, what are the best ways of doing it? What other ways are there to aid the DQN with the knowledge encoded in the greedy heuristics?
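
To make the question concrete, here is a minimal sketch of the tabular idea transplanted to a network: a supervised "warm start" that regresses Q(s, a) onto the discounted return of a rollout that takes action a and then follows the heuristic. Everything here (`toy_env_step`, `heuristic_policy`, the dimensions) is a toy placeholder for my actual setup, not any established method:

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 8, 4, 0.99

# Q-network: the same architecture that would later be trained with regular DQN.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)


def heuristic_policy(state: torch.Tensor) -> int:
    """Stand-in for the quick greedy heuristic (here: a fixed toy rule)."""
    return int(state.sum() > 0)


def toy_env_step(state, action):
    """Toy transition function so the sketch runs end to end;
    replace with the real environment."""
    next_state = state + 0.1 * torch.randn(STATE_DIM)
    reward = float(action == int(state.sum() > 0))
    done = bool(torch.rand(()) < 0.05)
    return next_state, reward, done


def rollout_return(env_step, state, action, max_steps=50):
    """Discounted return of taking `action` in `state`, then
    following the greedy heuristic until termination."""
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        state, reward, done = env_step(state, action)
        total += discount * reward
        if done:
            break
        discount *= GAMMA
        action = heuristic_policy(state)
    return total


# Supervised warm start: regress Q(s, a) onto the heuristic rollout
# returns, in place of the tabular initialization described above.
for _ in range(200):
    states = torch.randn(32, STATE_DIM)
    actions = torch.randint(N_ACTIONS, (32,))
    targets = torch.tensor(
        [rollout_return(toy_env_step, s, int(a)) for s, a in zip(states, actions)]
    )
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# After this warm start, regular DQN training would proceed as usual.
```

Is this kind of pretraining sensible for a function approximator, or does it get washed out (or actively hurt) once bootstrapped TD targets take over?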

References to state-of-the-art papers would be highly appreciated.