45 What is the Q function and what is the V function in reinforcement learning? 2016-01-18T13:51:25.520

34 What is "experience replay" and what are its benefits? 2017-07-19T04:15:22.443

27 What exactly is bootstrapping in reinforcement learning? 2018-01-22T23:18:32.323

25 Difference between AlphaGo's policy network and value network 2016-03-28T16:40:25.020

17 Formal proof of vanilla policy gradient convergence 2019-06-15T16:58:34.553

15 Why do we normalize the discounted rewards when doing policy gradient reinforcement learning? 2017-07-01T13:59:39.613

13 AlphaGo (and other game programs using reinforcement-learning) without human database 2016-04-10T05:06:57.810

13 Supervised learning vs reinforcement learning for a simple self driving rc car 2016-04-10T17:28:10.910

11 Books on Reinforcement Learning 2015-08-05T05:58:44.543

11 Prioritized Replay, what does Importance Sampling really do? 2018-06-09T12:10:25.467

11 Can Reinforcement learning be applied for time series forecasting? 2018-08-30T07:46:23.740

10 Implementing temporal difference in chess 2014-08-23T13:56:43.813

10 Cooperative Reinforcement Learning 2015-07-11T18:04:05.710

9 Why does Q Learning diverge? 2017-08-11T01:11:51.120

9 Is this a Q-learning algorithm or just brute force? 2018-03-10T11:03:06.680

9 What is the difference between active learning and reinforcement learning? 2020-11-13T12:54:30.807

8 How to teach neural network a policy for a board game using reinforcement learning? 2016-01-05T13:28:18.940

8 What knowledge do I need in order to write a simple AI program to play a game? 2017-01-04T13:15:42.680

8 What is the difference between "expected return" and "expected reward" in the context of RL? 2017-12-15T20:35:00.220

8 Q-Learning: Target Network vs Double DQN 2018-05-28T07:22:33.077

8 How does generalised advantage estimation work? 2018-06-01T02:33:57.877

8 Text extraction from documents using NLP or Deep Learning 2018-06-19T16:09:57.667

8 How does Implicit Quantile-Regression Network (IQN) differ from QR-DQN? 2018-11-07T14:57:17.860

8 Which ML approach to choose for the game AI when rewards are delayed? 2020-05-17T11:43:24.780

7 What is the novelty in AlphaGo, Google Deepmind's Go playing system? 2016-01-30T18:41:55.730

7 Why are policy gradient methods preferred over value function approximation in continuous action domains? 2017-11-29T10:47:33.030

7 What is a policy in machine learning? 2018-01-25T01:27:00.413

7 Representing similar states in reinforcement learning? 2018-08-05T21:01:27.617

7 Reinforcement learning: decreasing loss without increasing reward 2018-09-04T12:06:23.757

6 Does reinforcement learning require the help of other learning algorithms? 2015-09-07T08:29:25.507

6 What is Reinforcement Learning? 2016-10-04T08:50:03.107

6 Understanding advantage functions 2016-11-29T12:08:58.323

6 Reward dependent on (state, action) versus (state, action, successor state) 2017-03-25T08:39:13.420

6 Simple Q-Table Learning: Understanding Example Code 2017-09-13T12:44:58.100

6 What is Compatible Function Approximation theorem in reinforcement learning? 2017-11-30T15:42:04.850

6 RL Policy Gradient: How to deal with rewards that are strictly positive? 2018-04-13T17:18:21.183

6 RL Advantage function why A = Q-V instead of A=V-Q? 2018-09-01T03:08:20.420

6 How to choose between discounted reward and average reward? 2019-02-18T12:29:23.290

6 Off-policy n-step learning with DQN 2019-02-26T08:51:33.333

6 Large action space for deep reinforcement learning 2019-04-16T00:32:17.973

6 Why could my DDQN get significantly worse after beating the game repeatedly? 2019-07-20T08:06:54.100

6 Actor-critic architecture: How is the policy updated? 2019-12-02T21:13:47.307

6 Reinforcement Learning: Policy Gradient derivation question 2020-02-17T15:00:23.910

5 What are the advantages / disadvantages of off-policy RL vs on-policy RL? 2016-07-27T14:35:15.043

5 What is "Policy Collapse" and what are the causes? 2017-08-13T01:05:18.227

5 Is feature scaling necessary in reinforcement learning for the agent to learn successfully? 2017-08-22T12:52:49.833

5 Why random sample from replay for DQN? 2017-11-19T15:25:01.673

5 RL's policy gradient (REINFORCE) pipeline clarification 2018-09-19T17:18:30.447

5 What is the difference between DDQN and DQN? 2018-09-22T05:19:54.870

5 Why a Random Reward in One-step Dynamics MDP? 2019-03-16T21:59:49.053

4 Parallel Q-learning 2016-01-14T20:18:07.897

4 Information extraction with reinforcement learning, feasible? 2016-03-12T20:43:03.863

4 Reinforcement learning for continuous (rather than discrete) actions 2017-07-18T18:36:17.073

4 Has the Random Forest algorithm ever been used in Reinforcement Learning applications? 2017-08-14T22:02:23.303

4 Reinforcement Learning different patients 2017-11-01T09:31:01.957

4 What is a minimal setup to solve the CartPole-v0 with DQN? 2017-11-09T08:14:57.000

4 How can RL agents be monitored? 2017-12-06T08:58:46.353

4 How does differential semi-gradient Sarsa update the estimated average reward? 2017-12-15T06:09:44.720

4 Is reseating passengers a reinforcement learning problem? 2017-12-19T07:42:15.533

4 Prioritized Experience Replay - why to approximate the Density Function? 2018-05-30T23:32:24.470

4 Selecting a number of neurons specifically for RL 2018-06-10T23:38:33.563

4 Hindsight Experience Replay, how to define a partially-known End-Goal 2018-07-02T13:55:18.877

4 Confusion about neural network architecture for the actor critic reinforcement learning algorithm 2018-07-20T20:23:19.883

4 Reinforcement learning: Discounting rewards in the REINFORCE algorithm 2018-09-13T12:27:58.977

4 Can Reinforcement learning be applied in image classification? 2018-12-19T04:56:58.217

4 Reinforcement learning for continuous state and action space 2019-01-05T15:18:23.477

4 Reinforcement learning with sparse acting agent 2019-12-30T20:08:04.913

4 Q learning for blackjack, reward function? 2020-01-31T00:36:06.083

4 Hindsight Experience Replay (HER) results obtained 50 times faster than original paper? 2020-02-16T18:14:21.767

4 How many features are needed for reinforcement learning? 2020-05-17T00:23:05.450

3 How can I model open environment in reinforcement learning? 2014-08-21T10:13:54.130

3 Learning rate in reinforcement learning 2015-06-25T19:12:04.583

3 When to stop calculating values of each cell in the grid in Reinforcement Learning (dynamic programming) applied to gridworld 2015-08-05T10:27:51.370

3 Value update in dynamic programming reinforcement learning 2015-08-07T04:31:06.123

3 Is there some model-based variation of the Q-Learning algorithm which learns on a 3D SxAxS' table instead of a 2D SxA table? 2016-04-20T18:14:21.247

3 Reinforcement learning: understanding this derivation of n-step Tree Backup algorithm 2016-12-21T19:43:06.103

3 What is the significance of the Colour-digit MNIST game in the paper "Learning to Communicate with Deep Multi-Agent Reinforcement Learning"? 2017-03-05T23:45:40.880

3 Understanding the training phase of the tutorial "Using Keras and Deep Deterministic Policy Gradient to play TORCS" 2017-05-01T02:28:06.313

3 Neural Network Learning Rate vs Q-Learning Learning Rate 2017-08-11T13:41:33.153

3 Catastrophic forgetting in linear semi-gradient RL agent? 2017-08-16T17:05:53.190

3 How to transition between offline and online learning? 2017-09-20T01:56:33.853

3 Can Reinforcement Learning work for Dutch auctions? 2017-11-06T18:25:24.187

3 How to design two different neural nets for actor and critic RL? 2017-12-05T11:39:15.193

3 How is it possible that a reward function depends on both the next state and an action from the current state? 2017-12-16T00:18:41.067

3 Is my understanding of On-Policy and Off-Policy TD algorithms correct? 2018-01-10T11:03:39.813

3 Graphical results of Q-Learning: is improvement possible by parameter tweaking? 2018-01-12T15:18:33.850

3 Reinforcement Learning on data only (NO emulators) 2018-01-31T20:28:47.240

3 Why does deep reinforcement learning fail to learn how to play Asteroids? 2018-02-16T13:08:43.983

3 What is the optimal value of a Markov Decision process with Single actions at each state? 2018-04-13T20:44:33.227

3 Choosing a right algorithm for template-based text generation 2018-05-22T10:40:11.760

3 Supervised Learning could be biased if we use obsolete data 2018-05-23T15:27:37.527

3 Defining State Representation in Deep Q-Learning 2018-05-24T15:22:19.610

3 Deep advantage learning: how to predict the value 2018-06-07T23:05:04.387

3 Rainbow vs A3C ...too unfair? 2018-06-18T22:58:09.063

3 Dueling DQN - can't understand its mechanism 2018-07-06T01:36:54.790

3 Policy Gradients vs Value function, when implemented via DQN 2018-07-18T07:09:18.330

3 Handling actions with delayed effect (Reinforcement learning) 2018-07-18T07:19:27.840

3 Auto-Encoder to condense (pre-process) large one-hot input vectors? 2018-08-06T23:38:53.977

3 Difference between advantages of Experience Replay in DQN2013 paper 2018-08-14T05:42:40.960