43 What's the difference between model-free and model-based reinforcement learning? 2017-11-07T14:10:09.410

26 How to handle invalid moves in reinforcement learning? 2017-03-14T14:26:07.403

25 What is the relation between Q-learning and policy gradient methods? 2018-04-28T03:11:16.087

18 What is sample efficiency, and how can importance sampling be used to achieve it? 2018-02-07T11:20:37.190

15 Why doesn't Q-learning converge when using function approximation? 2019-04-05T18:23:46.233

14 How does LSTM in deep reinforcement learning differ from experience replay? 2018-08-27T01:58:20.250

14 How to define states in reinforcement learning? 2018-08-30T23:45:20.763

14 Inconsistent action space in Reinforcement Learning 2018-12-12T13:27:58.903

13 Are there any applications of reinforcement learning other than games? 2017-06-16T04:10:16.480

13 Why does DQN require two different networks? 2018-07-02T07:47:23.303

13 Why does the discount rate in the REINFORCE algorithm appear twice? 2018-08-22T18:06:49.643

13 How to implement a constrained action space in reinforcement learning? 2018-08-29T16:04:16.113

13 Why do you not see dropout layers on reinforcement learning examples? 2018-10-07T09:55:50.223

13 When should I use Reinforcement Learning vs PID Control? 2019-05-22T15:24:37.517

12 What does "stationary" mean in the context of reinforcement learning? 2018-08-20T10:09:40.053

12 What is the Bellman operator in reinforcement learning? 2019-03-06T14:07:16.067

11 How can policy gradients be applied in the case of multiple continuous actions? 2017-09-21T08:27:28.160

11 What is the difference between actor-critic and advantage actor-critic? 2018-08-02T14:59:08.493

11 Is the optimal policy always stochastic if the environment is also stochastic? 2019-02-15T13:20:50.677

11 How to stay an up-to-date researcher in the ML/RL community? 2019-07-18T11:54:50.357

10 Can a neural network work out the concept of distance? 2018-04-18T12:01:49.907

10 What is a "trajectory" in reinforcement learning? 2018-07-31T14:34:31.353

10 Why does the policy network in AlphaZero work? 2018-09-14T20:21:27.440

9 A few doubts regarding the application of reinforcement learning to games like chess 2017-11-10T15:54:48.073

9 Do off-policy policy gradient methods exist? 2017-12-23T18:41:29.570

9 What is the difference between expected return and value function? 2018-03-17T17:00:51.140

9 What is the difference between an observation and a state in reinforcement learning? 2018-04-09T21:02:23.180

9 Does Monte Carlo tree search qualify as machine learning? 2018-08-16T02:13:15.483

9 Why is baseline conditional on state at some timestep unbiased? 2018-09-09T20:31:07.373

9 How do we prove the n-step return error reduction property? 2018-12-08T05:24:56.380

9 Are there reinforcement learning algorithms that scale to large problems? 2019-04-14T13:04:26.620

9 What is the credit assignment problem? 2019-06-18T00:35:55.820

8 What are some resources on continuous state and action spaces MDPs for reinforcement learning? 2016-08-24T10:00:44.473

8 What are the different actions in the action space of the 'Pong-v0' environment from OpenAI Gym? 2016-12-10T13:44:06.017

8 Are there any other machine learning models apart from Reinforcement Learning and Q Learning to play video games? 2017-01-25T18:12:43.350

8 What's a good resource for getting familiar with reinforcement learning? 2018-07-03T13:43:24.843

8 Reinforcement Learning with asynchronous feedback 2018-07-30T05:33:15.530

8 What does the agent in reinforcement learning exactly do? 2018-10-17T07:42:21.090

8 Why does it make sense to normalize rewards per episode in reinforcement learning? 2019-01-24T13:56:08.333

8 Huge action space size in Reinforcement Learning 2019-03-06T13:23:32.007

8 How can AlphaZero learn if the tree search stops and restarts before finishing a game? 2019-04-12T11:42:10.900

8 Why is reinforcement learning not the answer to AGI? 2019-12-13T18:53:53.337

8 Are Q-learning and SARSA the same when action selection is greedy? 2020-05-10T10:52:49.730

7 Negative reward (penalty) in policy gradient reinforcement learning 2016-11-29T06:10:54.993

7 What are different approaches used in Machine Learning? 2017-12-10T19:29:25.527

7 How do I create an AI for a two-players board game? 2017-12-14T12:39:33.200

7 How does Q-learning work in stochastic environments? 2018-03-29T09:57:12.370

7 Reinforcement Learning in Commercial Strategy Games 2018-06-14T02:44:04.580

7 What is the relationship between these two taxonomies for machine learning with neural networks? 2018-07-24T16:31:48.617

7 Is there a difference in the architecture of deep reinforcement learning when multiple actions are performed instead of a single action? 2018-08-24T21:11:06.363

7 Is Experience Replay like dreaming? 2018-09-09T19:07:09.927

7 What is the difference between First-Visit Monte-Carlo and Every-Visit Monte-Carlo Policy Evaluation? 2019-02-22T09:28:24.537

7 Why exactly do neural networks require i.i.d. data? 2019-02-23T13:30:07.443

7 2 Player Games in OpenAI Retro 2019-03-12T16:30:55.220

7 What is the difference between reinforcement learning and optimal control? 2019-03-22T10:57:57.820

7 Is Q-Learning suitable for continuous (state or action) spaces? 2019-05-11T11:11:43.090

7 What is the purpose of the actor in actor-critic algorithms? 2019-05-18T23:07:43.073

7 Are there any online competitions for Reinforcement Learning? 2019-11-18T16:03:26.940

7 Why does reinforcement learning using a non-linear function approximator diverge when using strongly correlated data as input? 2020-01-29T08:47:11.317

7 Why does the state-action value function, as an expected value of the return and state value function, not need to follow the policy? 2020-06-06T08:55:32.493

6 Is it possible to implement reinforcement learning using a neural network? 2016-08-02T16:19:30.337

6 Board/Card Game AI - Questions concerning state/action space - Deep Reinforcement Learning 2016-10-26T08:11:31.073

6 A solution for a famous problem in RL 2017-02-11T11:22:59.897

6 OpenAI Baselines DQN - handling of invalid actions 2017-05-29T09:02:28.270

6 How should I handle action selection in the terminal state when implementing SARSA? 2017-08-04T13:45:54.123

6 Why do Bellman equations indirectly create a policy? 2017-12-18T13:27:20.397

6 How to implement exploration function and learning rate in Q Learning 2018-02-16T05:12:24.747

6 What is experience replay in layman's terms? 2018-05-30T19:09:05.100

6 Issue with simple game AI 2018-06-11T19:55:35.770

6 Why is the log probability replaced with the importance sampling in the loss function? 2018-08-23T07:17:42.697

6 Learning Rate Decay and Exploration Rate Decay 2018-09-11T10:47:29.857

6 Q-Learning the generic maze solution 2018-10-17T12:00:49.070

6 How do I compute the variance of the return of an evaluation policy using two behaviour policies? 2019-01-17T19:45:15.953

6 Is reinforcement learning using shallow neural networks still deep reinforcement learning? 2019-03-30T05:31:04.133

6 Can Q-learning be used in a POMDP? 2019-04-03T02:40:29.227

6 What algorithms are considered reinforcement learning algorithms? 2019-04-30T14:26:46.283

6 Does AlphaZero use Q-Learning? 2019-07-01T17:02:00.180

6 What does the symbol $\mathbb E$ mean in these equations? 2019-08-27T14:46:48.670

6 Benchmarks for reinforcement learning in discrete MDPs 2019-09-01T18:11:42.990

6 Counterexamples to the reward hypothesis 2019-11-19T02:30:52.573

6 Why cannot an AI agent adjust the reward function directly? 2019-12-14T08:07:26.960

6 Is there any programming language designed by deep learning? 2020-03-29T04:58:07.263

6 Is this proof of $\epsilon$-greedy policy improvement correct? 2020-05-27T12:44:33.367

6 How to measure sample efficiency of a reinforcement learning algorithm? 2020-06-18T09:29:59.673

5 What is the current state-of-the-art in Reinforcement Learning regarding data efficiency? 2016-08-06T17:35:02.143

5 Is reinforcement learning needed to create Strong AI? 2016-08-08T23:46:12.853

5 Can reinforcement learning algorithms be applied to computer vision problems? 2017-07-30T10:33:49.783

5 Did the Facebook robots both want everything but the balls? 2017-08-07T08:40:35.017

5 Inconsistency in TD-Leaf algorithm in KnightCap chess engine 2017-09-28T17:19:47.250

5 Where to publish a reasonable article on Deep Reinforcement Learning? 2017-11-07T09:02:44.327

5 Why is the target $r + \gamma \max_{a'} Q(s', a'; \theta_i^-)$ in the loss function of the DQN architecture? 2017-12-13T19:25:59.450

5 Neural network for data visualization 2018-01-12T13:05:47.300

5 What is a weighted average in a non-stationary k-armed bandit problem? 2018-01-18T18:32:34.333

5 Move blocks to create a designed surface 2018-01-23T05:02:16.247

5 Reinforcement Learning (Fitted Q): Qn on Concept & Implementation 2018-08-07T20:00:59.637

5 Is the discount not needed in a deterministic environment for Reinforcement Learning? 2018-08-15T16:58:56.453

5 How can a reinforcement learning agent generalize if it is trained against only one opponent? 2018-09-14T03:38:46.363

5 What is the appropriate approach to playing a game with incomplete state information? 2018-12-23T10:44:26.103

5 How can the importance-sampling ratio be different than zero? 2019-01-09T17:28:06.877