13 Why does the discount rate in the REINFORCE algorithm appear twice? 2018-08-22T18:06:49.643

9 How do we prove the n-step return error reduction property? 2018-12-08T05:24:56.380

7 What is the difference between reinforcement learning and optimal control? 2019-03-22T10:57:57.820

6 Counterexamples to the reward hypothesis 2019-11-19T02:30:52.573

5 How can the importance-sampling ratio be different than zero? 2019-01-09T17:28:06.877

4 Expected SARSA vs SARSA in "RL: An Introduction" 2019-02-21T19:55:43.213

4 How is the policy gradient calculated in REINFORCE? 2019-04-21T19:23:33.580

3 When does backward propagation occur in n-step SARSA? 2017-02-15T15:39:15.997

3 Is my interpretation of the return correct? 2018-08-24T19:05:08.083

3 What are the value functions used in reinforcement learning? 2019-02-14T17:03:57.140

3 Difference in continuing and episodic cases in Sutton and Barto - Introduction to RL, exercise 3.5 2019-03-07T15:52:59.950

3 Understanding the n-step off-policy SARSA update 2019-04-05T14:23:21.970

3 Why is an average of all returns used to update the value in the first-visit MC control? 2019-06-06T16:45:36.437

3 Sutton & Barto's notation $V_{t+n}$ in Chapter 7: $n$-step Bootstrapping 2019-11-07T01:02:53.640

2 Understanding the notation in the definition of the expected reward 2018-10-18T10:38:39.577

2 What is the meaning of Model(s, a) in the prioritized sweeping algorithm? 2019-01-09T09:12:34.027

2 Possible inconsistency in the Policy Improvement equation 2019-05-11T06:38:06.143

2 Hashed Tile Coding vs Regular Tile Coding 2019-05-22T07:36:30.033

2 How do we get the true value in the prediction objective in reinforcement learning? 2019-05-28T16:40:29.937

2 On-policy state distribution for episodic tasks on Sutton & Barto, page 199 2019-11-19T03:32:49.120

2 What does the figure "Blackjack Value Function..." from Sutton represent? 2020-02-03T06:20:26.880

2 Doubt regarding the proof of convergence of $\epsilon$-soft policies without exploring starts 2020-05-04T12:31:03.803

1 How do I apply the value iteration algorithm when there are two goal states? 2019-02-19T19:46:09.030

1 How can the $\lambda$-return be defined recursively? 2019-04-12T08:16:52.143

1 Doubt regarding improvement of State Value by n-step returns 2019-07-04T19:54:36.840

1 Value Iteration failing to converge to optimal value function in Sutton-Barto's Gambler's problem 2020-09-05T07:56:55.750