Tag: deep-rl

15 Why doesn't Q-learning converge when using function approximation? 2019-04-05T18:23:46.233

14 How does LSTM in deep reinforcement learning differ from experience replay? 2018-08-27T01:58:20.250

7 Why does reinforcement learning using a non-linear function approximator diverge when using strongly correlated data as input? 2020-01-29T08:47:11.317

6 Is it possible to implement reinforcement learning using a neural network? 2016-08-02T16:19:30.337

6 What is experience replay in layman's terms? 2018-05-30T19:09:05.100

6 Why is the log probability replaced with the importance sampling ratio in the loss function? 2018-08-23T07:17:42.697

6 Is reinforcement learning using shallow neural networks still deep reinforcement learning? 2019-03-30T05:31:04.133

5 Can TD($\lambda$) be used with deep reinforcement learning? 2019-02-02T17:30:53.470

5 What is the difference between DQN and AlphaGo Zero? 2019-02-27T06:17:02.450

5 Is there an alternative to the use of a target network? 2019-02-27T09:25:53.430

5 Why don't people use projected Bellman error with deep neural networks? 2019-04-12T05:02:52.887

5 What are some online courses for deep reinforcement learning? 2020-03-25T14:46:24.230

5 Is it possible to guide a reinforcement learning algorithm? 2020-04-18T12:42:55.083

4 Do we have to use a CNN for deep Q-learning? 2019-03-14T05:49:49.220

4 How did OpenAI Five for Dota concatenate units? 2019-03-25T09:19:38.147

4 What could be causing the drastic performance drop of the DQN model on the Pong environment? 2019-05-31T20:09:58.620

4 How does policy evaluation work for continuous state space model-free approaches? 2020-02-19T02:26:03.630

4 What is the target Q-value in DQNs? 2020-04-19T03:25:51.150

4 Why didn't AlphaGo use deep Q-learning? 2020-04-24T01:56:01.473

4 How can a single sample represent the expectation in gradient temporal difference learning? 2020-04-26T09:37:48.353

4 How does the repetition of features across states at different time steps affect learning? 2020-05-18T21:19:27.980

4 Is there any good reference for double deep Q-learning? 2020-05-28T15:55:49.123

4 How does the Ornstein-Uhlenbeck process work, and how is it used in DDPG? 2020-08-21T20:00:04.873

3 Understanding the multi-iteration model update in the PPO policy gradient algorithm 2018-03-04T10:38:58.867

3 Maximizing or Minimizing in Trust Region Policy Optimization? 2018-07-15T08:43:31.690

3 My DQN is stuck and I can't see where the problem is 2019-02-22T20:55:03.887

3 What are the differences between the DQN variants? 2019-03-23T12:38:31.063

3 Why is Q2 a more or less independent estimate in Twin Delayed DDPG (TD3)? 2019-03-24T05:26:49.420

3 How large should the replay buffer be? 2019-04-04T14:40:34.553

3 What can be considered a deep recurrent neural network? 2019-04-08T10:04:21.293

3 Deep Q-learning agent performing poor actions; need help optimizing 2019-04-11T23:18:09.190

3 DQN Agent not learning anymore - what can I do to fix this? 2019-04-22T09:00:45.757

3 Beautify an image with reinforcement learning 2019-05-27T11:49:55.327

3 Training a reinforcement learning model with multiple images 2019-05-28T14:44:04.090

3 Why do the authors track $\gamma_t$ in the Prioritized Experience Replay paper? 2019-05-31T02:47:46.293

3 Deep Q-Network (DQN) to learn the game 2048 2019-06-12T21:17:30.437

3 How is the gradient of the loss function in DQN derived? 2019-09-07T14:18:13.677

3 What could be the cause of the drop in the reward in A3C? 2019-10-28T07:47:59.513

3 Purpose of using actor-critic algorithms under deterministic MDP dynamics? 2019-11-12T14:25:44.363

3 How to deal with nonstationary rewards in asymmetric self-play reinforcement learning? 2019-12-24T19:26:02.923

3 Optimal RL function approximation for TicTacToe game 2020-01-09T20:57:16.390

3 In the policy gradient equation, is $\pi(a_{t} | s_{t}, \theta)$ a distribution or a function? 2020-02-21T16:23:15.443

3 Can experience replay be used for training after completing every single epoch? 2020-03-06T06:58:21.853

3 How does normalization of the inputs work in the context of PPO? 2020-04-11T08:54:32.807

3 Are Q-values estimated from a DQN different from those of a dueling DQN with the same number of layers and filters? 2020-04-13T03:46:47.127

3 Understanding the TensorFlow implementation of the policy gradient method 2020-04-30T14:45:32.223

3 If the agent chooses an action that the environment can't execute, how should I handle this situation? 2020-05-19T03:39:50.190

3 How do I take actions at each episode and within each step of the episode in deep Q-learning? 2020-06-05T20:32:22.853

3 Can AlphaZero be considered multi-agent deep reinforcement learning? 2020-08-02T13:02:45.957

3 Is there a logical method of deducing an optimal batch size when training a Deep Q-learning agent with experience replay? 2020-08-25T22:16:33.010

3 How can I fix jerky movement in a continuous action space? 2020-08-29T14:09:13.053

2 Why use semi-gradient instead of full gradient in RL problems, when using function approximation? 2018-04-24T23:11:25.637

2 Understanding lemma 2 of the "Trust Region Policy Optimization" paper 2018-11-27T16:52:12.273

2 Is there a relation between the size of the neural networks and speed of convergence in deep reinforcement learning? 2019-01-28T10:16:21.447

2 Why does a deep Q-network output multiple Q-values? 2019-02-19T09:14:56.147

2 In DQN, is it better to update the target network every N steps or to update it slowly every step? 2019-02-28T03:56:42.890

2 Is there a way to train an RL agent without any environment? 2019-03-06T10:41:05.043

2 Regarding the output layer's activation function for continuous action space problems 2019-03-25T11:22:41.433

2 Why use experience replay memory in DQN instead of an RNN memory? 2019-04-22T12:00:35.690

2 Why is overfitting bad in DQN? 2019-04-30T15:40:31.133

2 How does the TRPO surrogate loss account for the error in the policy? 2019-05-02T15:31:08.017

2 New transition priorities in Prioritized Experience Replay? 2019-06-01T02:26:03.443

2 Reward does not increase for a maze escaping problem with DQN 2019-06-01T14:53:15.223

2 If deep Q learning involves adjusting the value function for a specific policy, then how do I choose the right policy? 2019-07-13T21:48:24.443

2 Will the target network, which is less trained than the normal network, output inferior estimates? 2019-07-20T03:17:34.007

2 Torch CNN not training 2019-08-24T16:35:56.727

2 When does AlphaZero play suboptimal moves? 2019-08-27T17:54:19.253

2 Doubt about deep Q-learning with sparse rewards 2019-10-15T16:04:31.937

2 Immediate reward received in Atari game using DQN 2020-02-11T16:20:54.547

2 Can recovering a reward function using IRL lead to better policies compared to reward shaping? 2020-02-13T06:04:06.380

2 How does adding noise to the action in DDPG help in learning? 2020-02-23T13:25:19.680

2 Monte Carlo updates on policy gradient with no terminal state 2020-02-27T00:31:05.983

2 Unexpected results when comparing a greedy policy to a DQN policy 2020-03-04T12:39:59.857

2 How to correctly implement self-play with DQN? 2020-03-17T12:49:49.053

2 Which deep reinforcement learning algorithm is appropriate for my problem? 2020-03-30T08:17:39.780

2 How much time does it take to train DQN on Atari environment? 2020-04-01T09:23:18.910

2 How was the DQN trained to play many games? 2020-04-04T13:54:05.900

2 What are the most common deep reinforcement learning algorithms and models apart from DQN? 2020-04-10T23:22:04.970

2 Why is DDPG an off-policy RL algorithm? 2020-04-29T15:27:42.730

2 What should the target be when the neural network outputs multiple Q values in deep Q-learning? 2020-05-04T02:41:33.243

2 Does the concept of validation loss apply to training deep Q networks? 2020-05-18T12:54:58.817

2 How should I decay $\epsilon$ in Q-learning? 2020-05-28T11:18:14.543

2 Are the final states not being updated in this $n$-step Q-Learning algorithm? 2020-06-02T14:10:10.190

2 How to train a reinforcement learning agent from raw pixels? 2020-06-05T19:10:42.870

2 If the minimum Q value is decreasing and the maximum Q value increasing, is this a sign that dueling double DQN is diverging? 2020-06-07T16:24:40.417

2 Two DQNs on two different time scales 2020-06-20T03:37:41.063

2 What is the bias-variance trade-off in reinforcement learning? 2020-06-23T16:41:36.270

2 Should illegal moves be excluded from loss calculation in DQN algorithm? 2020-06-27T19:02:10.683

2 Why do some DQN implementations not require random exploration but instead emulate all actions? 2020-07-05T09:25:12.933

2 How does the target network in double DQNs find the maximum Q* value for each action? 2020-07-21T14:20:27.200

2 How do we calculate the average reward $r(\pi)$ if the policy changes over time? 2020-08-27T20:29:35.593

1 How do I calculate $\max_{a'} Q(s', a', w^-)$ when it is represented as a neural network? 2019-01-05T11:08:11.157

1 Deep Q-learning is not performing well when there are several enemies 2019-02-10T10:22:22.583

1 Comparison and understanding of different versions of DDQN 2019-03-14T12:52:07.787

1 What are some board game environments for RL practice? 2019-05-02T02:14:29.780

1 Do we need to use the experience replay buffer with the A3C algorithm? 2019-05-02T10:25:44.270

1 Can the next state and action be the same in Deep Deterministic Policy Gradient? 2019-06-11T08:18:25.203

1 Why are we using all hyperparameters in RL? 2019-06-13T08:10:58.897

1 Reinforcement Learning State Definition 2019-06-22T05:53:22.077