Are Q values estimated from a DQN different from a duelling DQN with the same number of layers and filters?


I am confused about the Q values of a duelling deep Q network (DQN). As far as I know, duelling DQNs have 2 outputs

  1. Advantage: how much better it is to choose a particular action $a$ than the other actions in state $s$

  2. Value: how good it is to be in a particular state $s$

We can make these two outputs into Q values (the expected return for choosing a particular action $a$ when in state $s$) by adding them together.

However, in a DQN, we get Q values from the single output layer of the network.

Now, suppose I take the same DQN model, keep the very same weights in the input and hidden layers, and only replace the output layer that gives Q values with separate advantage and value outputs. Then, during training, if I add these two outputs together, will that give me the same Q values for a particular state, assuming all other parameters of both algorithms are identical except for the output layers?
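To make my setup concrete, here is a minimal sketch of the two output heads on the same shared trunk features (the weights are random placeholders, not trained ones, so the numbers themselves mean nothing):

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_actions = 8, 4
features = rng.standard_normal(n_features)  # shared trunk output for one state

# Vanilla DQN head: a single linear layer mapping features -> Q-values
W_q = rng.standard_normal((n_actions, n_features))
b_q = rng.standard_normal(n_actions)
q_vanilla = W_q @ features + b_q

# Dueling head: separate value and advantage streams on the SAME features
W_v = rng.standard_normal((1, n_features))
b_v = rng.standard_normal(1)
W_a = rng.standard_normal((n_actions, n_features))
b_a = rng.standard_normal(n_actions)

value = W_v @ features + b_v       # scalar V(s)
advantage = W_a @ features + b_a   # A(s, a), one entry per action
q_dueling = value + advantage      # the naive sum I describe above

print(q_vanilla)
print(q_dueling)
```

Both heads produce one Q value per action, but they parameterize that mapping differently, which is exactly what my question is about.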


Posted 2020-04-13T03:46:47.127




The dueling DQN has a different network architecture compared to the vanilla DQN, so I don't think your version will work as well as the dueling architecture does.

From Wang et al., 2016, Dueling Network Architectures for Deep Reinforcement Learning

On the other hand, since we only have the target Q-value, separating it into a state value and an advantage results in an identifiability issue: the network might simply learn $V(s)=0$ and $A(s,a)=Q(s,a)$ for every state.

To tackle this issue, we impose an additional constraint on the advantage estimate. As in the paper, we normalize the advantages across actions before combining them with the state value:

$$Q(s,a) = V(s) + \left( A(s,a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s,a') \right)$$
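As a concrete sketch of this aggregation step (plain NumPy on made-up numbers, not any particular framework's implementation):

```python
import numpy as np

def dueling_q(value, advantage):
    """Combine V(s) and A(s, a) as in Wang et al. (2016):
    subtract the mean advantage across actions so the
    value/advantage decomposition is identifiable."""
    return value + (advantage - advantage.mean())

# Illustrative numbers for one state with four actions
advantage = np.array([1.0, 3.0, -2.0, 0.0])
value = 5.0
q = dueling_q(value, advantage)
print(q)  # [5.5, 7.5, 2.5, 4.5]
```

With the mean subtracted, the average of the Q values equals $V(s)$, so the network can no longer shift arbitrary constants between the two streams.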


