
Why is it hard to prove the convergence of the DQN algorithm? We know that tabular Q-learning converges to the optimal Q-values, and convergence has also been proved for Q-learning with a linear function approximator.
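For reference, by tabular Q-learning I mean the standard update rule (stated here only to fix notation):

$$Q(s,a) \leftarrow Q(s,a) + \alpha \Big( r + \gamma \max_{a'} Q(s',a') - Q(s,a) \Big),$$

which converges to $Q^*$ under the usual step-size and sufficient-exploration conditions; the linear-approximation case replaces the table with $Q_\theta(s,a) = \theta^\top \phi(s,a)$.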

The main differences between DQN and Q-learning with a linear approximator are the use of a deep neural network, the experience replay memory, and the target network. Which of these components causes the convergence issue, and why?
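To make the three components concrete, here is a minimal DQN-style sketch (PyTorch-flavoured Python; the network size, hyperparameters, and variable names are illustrative placeholders, not the exact setup of the original DQN paper):

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    """Component 1: a deep (nonlinear) network approximating Q(s, .)."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, s):
        return self.net(s)

state_dim, n_actions, gamma = 4, 2, 0.99      # toy sizes, for illustration only
q_net = QNet(state_dim, n_actions)            # online network being trained
target_net = QNet(state_dim, n_actions)       # Component 3: target network
target_net.load_state_dict(q_net.state_dict())  # synced to q_net only every C steps
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

replay = deque(maxlen=10_000)                 # Component 2: experience replay memory

def store(s, a, r, s2, done):
    """Store one transition (s, a, r, s', done) as tensors."""
    replay.append((torch.as_tensor(s, dtype=torch.float32),
                   torch.tensor(a),
                   torch.tensor(float(r)),
                   torch.as_tensor(s2, dtype=torch.float32),
                   torch.tensor(float(done))))

def train_step(batch_size=32):
    """One gradient step on a minibatch sampled from the replay memory."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)         # break temporal correlations
    s, a, r, s2, done = map(torch.stack, zip(*batch))
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                             # bootstrap target from the frozen copy
        target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The sketch is only meant to show where each component enters: the nonlinear `q_net`, the replay buffer that minibatches are sampled from, and the periodically synced `target_net` that supplies the bootstrap target.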


See "Why doesn't Q-learning converge when using function approximation?".

– nbro – 2020-05-10T16:06:14.183

Thanks for the link. I carefully read the post, but it does not actually answer my question. – Afshin Oroojlooy – 2020-05-10T23:25:53.340