I assume OP understands the basics of DQN - temporal difference (TD) learning, the target network, and the replay buffer. What follows is just my own observations; it does not necessarily generalize.
Some of these variants:
Double DQN: a small change in how the TD target is computed. Instead of evaluating the target network at its own maximizing action, the Q-value is taken from the target network at the action that maximizes the *current* network. This is claimed to reduce the overestimation bias that the max operator introduces into the TD target. My experience: not really helping.
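To make the difference concrete, here is a minimal numpy sketch of the Double DQN target computation for a batch of transitions; the function name and array shapes are my own choices, not from any particular library:

```python
import numpy as np

def double_dqn_target(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN TD targets for a batch of transitions.

    q_online_next: (B, A) Q-values of next states from the *current* network
    q_target_next: (B, A) Q-values of next states from the *target* network
    """
    # Action selection by the online network...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...but action evaluation by the target network.
    q_next = q_target_next[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * q_next
```

Vanilla DQN would instead use `q_target_next.max(axis=1)`, i.e. the same network both selects and evaluates the action, which is where the overestimation comes from.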
Dueling DQN: a difference in network architecture - Q is decomposed into a Value head V(s) and an Advantage head A(s,a) = Q(s,a) - V(s), which are computed separately and summed; in practice the advantage is mean- (or max-) subtracted so the decomposition is identifiable. My experience: not really helping.
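The combining step is the only unusual part of the architecture; a minimal numpy sketch (mean-subtracted variant, function name my own):

```python
import numpy as np

def dueling_q(value, advantage):
    """Combine dueling heads into Q-values.

    value: (B, 1) output of the state-value head
    advantage: (B, A) output of the advantage head
    """
    # Subtracting the mean advantage pins down the V/A decomposition,
    # which is otherwise only identifiable up to a constant.
    return value + advantage - advantage.mean(axis=1, keepdims=True)
```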
Prioritized DQN: instead of uniformly sampling the replay buffer, transitions are sampled with probability proportional to their TD error. My experience: really helps, a must-have for any implementation.
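A minimal numpy sketch of proportional prioritized sampling, including the importance-sampling weights that correct for the non-uniform sampling; `alpha`/`beta` defaults and the function name are my own (a real implementation would use a sum-tree for efficiency):

```python
import numpy as np

def sample_prioritized(td_errors, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Sample replay indices with probability proportional to |TD error|^alpha."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Small epsilon so zero-error transitions can still be sampled.
    priorities = (np.abs(td_errors) + 1e-6) ** alpha
    probs = priorities / priorities.sum()
    idx = rng.choice(len(td_errors), size=batch_size, p=probs)
    # Importance-sampling weights correct the bias of non-uniform sampling,
    # normalized so the largest weight is 1.
    weights = (len(td_errors) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights
```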
Episodic memory DQN: a really ambitious method. It combines a memory-augmented network (a differentiable dictionary, similar in spirit to the Neural Turing Machine) with DQN. Difficult to implement correctly. My experience is ambiguous - I tried it in a much simplified form, and it was not clear to me whether it helped. I intend to return to it later. It also incurs a considerable performance cost.
C51 architecture - this could be an important advance. It is an attempt to get rid of regression in DQN (regression is hard for NNs) and replace it with classification: the scalar Q-value is replaced by a categorical distribution over 51 fixed return atoms, trained with cross-entropy. It can be costly from a performance point of view. Some reports claim it is no worse than the best-performing deep RL methods (with the exception of tree-search-based methods like AlphaZero).
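The tricky part of C51 is the projection step: after the Bellman update shifts and shrinks the return atoms, the resulting distribution no longer sits on the fixed support and has to be projected back onto it. A minimal numpy sketch of that projection, assuming the standard 51-atom support on [-10, 10]; the function name is my own, and a real implementation would vectorize the loops:

```python
import numpy as np

def c51_project(next_probs, rewards, dones,
                v_min=-10.0, v_max=10.0, n_atoms=51, gamma=0.99):
    """Project the Bellman-updated return distribution onto the fixed support.

    next_probs: (B, n_atoms) categorical probabilities for the next state
    Returns the (B, n_atoms) target distribution for the cross-entropy loss.
    """
    atoms = np.linspace(v_min, v_max, n_atoms)
    delta = atoms[1] - atoms[0]
    target = np.zeros((len(rewards), n_atoms))
    for i in range(len(rewards)):
        # Shift/shrink each atom by the Bellman update, clip to the support.
        tz = np.clip(rewards[i] + gamma * (1.0 - dones[i]) * atoms, v_min, v_max)
        b = (tz - v_min) / delta          # fractional index on the support
        lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
        for j in range(n_atoms):
            if lo[j] == hi[j]:
                # Atom lands exactly on a support point.
                target[i, lo[j]] += next_probs[i, j]
            else:
                # Split the probability mass between the two nearest atoms.
                target[i, lo[j]] += next_probs[i, j] * (hi[j] - b[j])
                target[i, hi[j]] += next_probs[i, j] * (b[j] - lo[j])
    return target
```

The network is then trained to minimize cross-entropy between its predicted distribution and this target, and a scalar Q-value for action selection is recovered as the expectation over the atoms.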