5 Why are lambda returns so rarely used in policy gradients? 2019-01-17T19:27:47.763

4 Is there any difference between reward and return in reinforcement learning? 2020-06-04T03:35:09.387

4 Why is $G_{t+1}$ replaced with $v_*(S_{t+1})$ in the Bellman optimality equation? 2020-06-04T19:27:43.360

3 Is my interpretation of the return correct? 2018-08-24T19:05:08.083

3 Given specific rewards, how can I calculate the returns for each time step? 2019-03-07T16:59:53.727

3 What is the difference between return and expected return? 2019-06-30T15:12:14.500

3 Shouldn't expected return be calculated for some faraway time in the future $t+n$ instead of current time $t$? 2020-05-03T06:11:06.840

2 Is my understanding of the value function, Q function, policy, reward and return correct? 2020-04-16T01:53:36.553

2 Why is the expected return in Reinforcement Learning (RL) computed as a sum of cumulative rewards? 2020-07-17T01:17:09.320

1 How can the $\lambda$-return be defined recursively? 2019-04-12T08:16:52.143

1 Why does the n-step return being zero result in high variance in off-policy n-step TD? 2020-06-13T06:29:21.597

1 How do I calculate the return given the discount factor and a sequence of rewards? 2020-06-29T14:15:25.977