If you're interested in the theory behind Double Q-learning (*not deep!*), the reference paper would be Double Q-learning by Hado van Hasselt (2010).

As for Double *deep* Q-learning (also called DDQN, short for Double Deep Q-networks), the reference paper would be Deep Reinforcement Learning with Double Q-learning by Van Hasselt et al. (2016), as pointed out in
ddaedalus's answer.

As for how the loss is calculated, it is not explicitly written in that paper. However, you can find it in the Dueling DQN paper (Wang et al., 2016), a subsequent paper on which Van Hasselt is a coauthor. In the appendix, the authors provide the pseudocode for Double DQN. The relevant part for you would be:

$y_{j}=\left\{\begin{array}{ll}r, & \text { if } s^{\prime} \text { is terminal } \\ r+\gamma Q\left(s^{\prime}, a^{\max }\left(s^{\prime} ; \theta\right) ; \theta^{-}\right), & \text {otherwise}\end{array}\right.$

where $a^{\max}\left(s^{\prime} ; \theta\right)=\arg\max_{a^{\prime}} Q\left(s^{\prime}, a^{\prime} ; \theta\right)$.

Do a gradient descent step with loss $ \left\|y_{j}-Q(s, a ; \theta)\right\|^{2}$

Here, $y_j$ is the target, $\theta$ are the parameters of the regular network and $\theta^{-}$ are the target network parameters.
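The target above can be sketched in a few lines of Python. This is a minimal illustration assuming tabular Q-values stored as arrays indexed by (state, action); in a real DDQN, `q_online` and `q_target` would be the outputs of the online network $Q(\cdot;\theta)$ and the target network $Q(\cdot;\theta^{-})$.

```python
import numpy as np

def double_dqn_target(r, s_next, terminal, q_online, q_target, gamma=0.99):
    """y_j = r if s' is terminal, else r + gamma * Q(s', argmax_a Q(s', a; theta); theta^-)."""
    if terminal:
        return r
    # Action selection uses the ONLINE parameters theta ...
    a_max = np.argmax(q_online[s_next])
    # ... but evaluation of that action uses the TARGET parameters theta^-.
    return r + gamma * q_target[s_next, a_max]
```

For example, with `q_online = np.array([[1.0, 3.0]])` and `q_target = np.array([[0.5, 2.0]])`, a non-terminal transition with `r = 1.0` and `gamma = 0.9` selects action 1 (the online argmax) but evaluates it with the target network, giving `1.0 + 0.9 * 2.0 = 2.8`.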

The most important thing to note here is the difference with the DQN target:
$y_{i}^{D Q N}=r+\gamma \max _{a^{\prime}} Q\left(s^{\prime}, a^{\prime} ; \theta^{-}\right)$.

In DQN, we evaluate the Q-values based on parameters $\theta^{-}$ and we take the max over actions based on these same Q-values, parametrized with the **same** $\theta^{-}$. The problem with this is that it leads to an overestimation bias, especially at the beginning of the training process, when the Q-value estimates are noisy.

To address this issue, Double DQN instead selects the maximizing action based on Q-values computed with $\theta$, and then evaluates the Q-value of $a^{\max}$ with a different set of parameters, i.e. $\theta^{-}$.
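The difference between the two targets can be made concrete with a small numerical example (the Q-values below are assumed purely for illustration):

```python
import numpy as np

# Hypothetical Q-values for the next state s' under the online parameters
# theta and the target parameters theta^- (made-up numbers for illustration).
q_online_next = np.array([1.0, 4.0, 2.0])   # Q(s', . ; theta)
q_target_next = np.array([3.0, 1.5, 2.5])   # Q(s', . ; theta^-)
r, gamma = 0.0, 1.0

# DQN: both action selection and evaluation use theta^-.
y_dqn = r + gamma * np.max(q_target_next)    # picks 3.0

# Double DQN: select with theta, evaluate with theta^-.
a_max = np.argmax(q_online_next)             # action 1
y_ddqn = r + gamma * q_target_next[a_max]    # evaluates to 1.5
```

Note how decoupling selection from evaluation yields a lower target here: the action that looks best to one set of parameters is no longer guaranteed to also receive the largest (possibly over-optimistic) value estimate.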

If you want to learn more about this by watching a video lecture instead of reading a paper, I'd suggest you take a look at this lecture from UC Berkeley's DRL course, where the professor (Sergey Levine) discusses it in detail with examples.