In Section 4.3 of the paper *Learning by Playing — Solving Sparse Reward Tasks from Scratch*, the authors define the Retrace target as $$ Q^{ret}=\sum_{j=i}^\infty\left(\gamma^{j-i}\prod_{k=i}^jc_k\right)[r(s_j,a_j)+\delta_Q(s_i,s_j)],\\ \delta_Q(s_i,s_j)=\mathbb E_{\pi_{\theta'}(a|s)}[Q^\pi(s_i,\cdot;\phi')]-Q^\pi(s_j,a_j;\phi'),\\ c_k=\min\left(1,{\pi_{\theta'}(a_k|s_k)\over b(a_k|s_k)}\right), $$ where I omit the task index $\mathcal T$ for simplicity. I'm quite confused by this definition of $Q^{ret}$, which does not seem consistent with the Retrace operator defined in *Safe and efficient off-policy reinforcement learning*:

$$ \mathcal RQ(x,a):=Q(x,a)+\mathbb E_\mu\left[\sum_{t\ge0}\gamma^t\left(\prod_{s=1}^tc_s\right)\big(r_t+\gamma\mathbb E_\pi Q(x_{t+1},\cdot)-Q(x_t,a_t)\big)\right] $$
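For concreteness, here is how I read the Munos et al. operator as a computation over a finite trajectory. This is only my sketch, not code from either paper; the argument names (`q_exp` for $\mathbb E_\pi Q(x_{t+1},\cdot)$, etc.) are my own:

```python
def retrace_target(q, q_exp, rewards, pi_probs, b_probs, gamma=0.99):
    """Retrace target R Q(x_0, a_0) over a length-T trajectory.

    q        : Q(x_t, a_t) for the actions actually taken, length T
    q_exp    : E_pi[Q(x_{t+1}, .)] under the target policy, length T
    rewards  : r_t, length T
    pi_probs : pi(a_t | x_t) target-policy probabilities, length T
    b_probs  : behaviour-policy probabilities mu(a_t | x_t), length T
    """
    target = q[0]
    c = 1.0  # running product c_1 ... c_t (empty product = 1 at t = 0)
    for t in range(len(rewards)):
        if t > 0:
            # truncated importance weight c_t = min(1, pi/mu)
            c *= min(1.0, pi_probs[t] / b_probs[t])
        td = rewards[t] + gamma * q_exp[t] - q[t]  # off-policy TD error
        target += (gamma ** t) * c * td
    return target
```

Under this reading, the correction at $t=0$ uses an empty importance-weight product, so the first term is just the one-step TD error on $(x_0,a_0)$.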

What should I make of $Q^{ret}$ in the first paper?