What does the notation "for t=T to 1,−1 do" mean in terms of time steps, in a deep recurrent Q-network?


I am looking at an algorithm in the paper Learning to Communicate with Deep Multi-Agent Reinforcement Learning.

Here is the full algorithm:

[Image: the full algorithm from the paper]

What does the notation "for $t = T$ to $1$, $-1$ do:" refer to in terms of time steps?

The network structure is a deep recurrent Q-network (DRQN).

Secondly, why do the gradients need to be reset to zero?


Posted 2020-07-03T12:17:31.510


That notation should mean to go from time step $T$ to time step $1$ by a negative step $-1$, i.e. backward, so $T$, then $T-1$, then $T-2$, and so on until $1$. If you know Python, this should be familiar. However, note that this is just a guess because I am not familiar with this algorithm. – nbro – 2020-07-03T13:23:38.600
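
For readers who want to see the Python analogy spelled out, here is a minimal sketch of that reading of the notation. It assumes the loop means "start at $T$, stop at $1$ inclusive, step by $-1$"; the function name `backward_time_steps` and the value `T = 5` are made up for illustration and do not come from the paper.

```python
def backward_time_steps(T):
    """Time steps visited by 'for t = T to 1, -1 do' under the reading above.

    Hypothetical helper for illustration only, not taken from the paper.
    """
    # range(start, stop, step) excludes `stop`, so stop=0 makes 1 the last value.
    return list(range(T, 0, -1))


T = 5  # assumed episode length, chosen only for this example
for t in backward_time_steps(T):
    print(t)  # prints 5, 4, 3, 2, 1 on separate lines
```

Under this interpretation the loop walks the episode from its last time step back to its first.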

Yes, this makes sense, and fits the problem. Thank you! – kikram – 2020-07-03T17:41:10.037

No answers