Tag: q-learning

34 What is "experience replay" and what are its benefits? 2017-07-19T04:15:22.443

9 Understanding Reinforcement Learning with Neural Net (Q-learning) 2016-02-18T10:11:23.997

9 Why does Q Learning diverge? 2017-08-11T01:11:51.120

9 Is this a Q-learning algorithm or just brute force? 2018-03-10T11:03:06.680

8 How to teach neural network a policy for a board game using reinforcement learning? 2016-01-05T13:28:18.940

7 Representing similar states in reinforcement learning? 2018-08-05T21:01:27.617

7 Reinforcement learning: decreasing loss without increasing reward 2018-09-04T12:06:23.757

6 Understanding advantage functions 2016-11-29T12:08:58.323

6 Keras input dimension bug? 2017-08-01T16:54:59.527

6 Simple Q-Table Learning: Understanding Example Code 2017-09-13T12:44:58.100

6 RL Advantage function why A = Q-V instead of A=V-Q? 2018-09-01T03:08:20.420

6 Why could my DDQN get significantly worse after beating the game repeatedly? 2019-07-20T08:06:54.100

5 What are the advantages / disadvantages of off-policy RL vs on-policy RL? 2016-07-27T14:35:15.043

5 Why random sample from replay for DQN? 2017-11-19T15:25:01.673

5 Q learning - how to use experience replay, when playing against other agent? 2018-02-01T15:58:12.597

4 Parallel Q-learning 2016-01-14T20:18:07.897

4 Multiple Output Layers in Neural Networks in Deep Q Learning 2017-03-16T15:06:55.983

4 Prioritized Experience Replay - why to approximate the Density Function? 2018-05-30T23:32:24.470

3 Is there some model-based variation of the Q-Learning algorithm which learns on a 3D SxAxS' table instead of a 2D SxA table? 2016-04-20T18:14:21.247

3 Neural Network Learning Rate vs Q-Learning Learning Rate 2017-08-11T13:41:33.153

3 Clamping Q function to its theoretical maximum, yes or no? 2017-11-11T15:12:32.687

3 Is my understanding of On-Policy and Off-Policy TD algorithms correct? 2018-01-10T11:03:39.813

3 Graphical results of Q-Learning: is improvement possible by parameter tweaking? 2018-01-12T15:18:33.850

3 Neural network q learning for tic tac toe - how to use the threshold 2018-01-13T16:25:10.343

3 Reinforcement Learning on data only (NO emulators) 2018-01-31T20:28:47.240

3 What is the optimal value of a Markov Decision process with Single actions at each state? 2018-04-13T20:44:33.227

3 Why does Q-learning use an actor model and critic model? 2018-05-10T10:14:04.247

3 DQN fails to find optimal policy 2019-04-01T01:23:54.043

2 Initial Q-values in Q-Learning 2016-10-06T04:06:41.703

2 Q learning and Neural Network for Tic Tac Toe 2017-11-09T04:05:19.887

2 Negative Rewards and Activation Functions 2017-12-26T14:37:21.760

2 Q-learning why do we subtract the Q(s, a) term during update? 2018-01-29T03:02:05.057

2 Experience Replay, must return minibatch back to Memory Bank? 2018-04-25T02:13:26.050

2 Dueling DQN what does a' mean? 2018-06-04T09:22:52.780

2 Calculate Q parameter for Deep Q-Learning applied to videogames 2018-07-22T13:46:49.623

2 How we can have RF-QLearning or SVR-QLearning (Combine these algorithm with a Q-Learning ) 2018-08-31T18:01:04.000

2 What's going wrong with my Tic Tac Toe Q-Learning Algorithm? 2018-10-12T20:11:55.310

2 IndexError: index 804 is out of bounds for axis 0 with size 800 2018-11-05T15:05:34.713

2 How does Q-Learning deal with mixed strategies? 2018-12-20T17:48:32.353

2 What is the difference between dynamic programming and Q-learning? 2019-01-21T05:52:55.270

2 Representing state in Q-Learning 2019-05-04T09:09:16.683

2 If the set of all possible states changes each time, how can Q-learning "learn" anything? 2019-05-04T21:40:19.283

2 Reinforcement Learning using PPO2 in openai gym retro, mario not learning the clear the easy episode 2019-05-11T10:46:12.277

2 Difference between Dueling DQN and Double DQN? 2019-05-31T17:46:24.383

2 Incentivizing curiosity in a sparse reward environment 2019-12-14T19:07:28.870

2 Would Deep Q Learning work for a finite horizon problem? 2019-12-26T21:32:42.887

2 Deep Q Network gives same Q values and doesn't improve 2019-12-30T22:10:01.297

2 Markov Decision Process representation 2020-02-09T15:07:37.237

2 Reference implementation of q-learning in Python 2020-09-05T16:15:17.653

2 Offline/Batch Reinforcement Learning: Doubly Robust Off-policy Estimator takes huge values 2020-12-04T14:41:00.917

2 Does convergence equal learning in Deep Q-learning? 2021-02-21T16:00:58.187

1 Keras not converging to optimum while TensorFlow does 2017-08-04T20:54:57.027

1 Isn't the optimizer network in deepminds learning to learn a DRQN? 2017-12-02T17:34:44.223

1 Q Learning Neural network for tic tac toe Input implementation problem 2018-01-12T11:55:52.810

1 Q learning Neural network Tic tac toe - When to train net 2018-01-13T23:04:34.337

1 Simple Q-learning neural network using numpy 2018-01-29T11:14:10.953

1 How does a Q algorithm consider future rewards? 2018-04-05T12:14:41.737

1 Reinforcement Learning with static state 2018-04-05T14:30:42.107

1 Policy gradient on data only, without emulators 2018-04-13T10:48:43.583

1 Experience Replay Explain 2018-04-24T17:22:03.303

1 Adding a bias makes Q-learning algorithm ineffective 2018-05-23T02:12:16.930

1 Tflearn "nan" weight matrices 2018-06-30T17:12:09.613

1 Deep Q-Learning with large number of actions 2018-08-16T06:04:06.070

1 What is the immediate reward in value iteration? 2018-10-08T14:38:01.043

1 Dueling DQN - Calculation of Q-value 2018-10-19T15:53:34.447

1 Will reinforcement learning work if states won't get repeated again? 2018-10-24T11:57:27.373

1 Reinforcement learning: negative reward (punish) illegal actions? 2018-12-02T10:04:56.070

1 Why not use max(returns) instead of average(returns) in off-policy Monte Carlo control? 2018-12-20T10:52:59.883

1 How to represent an image as state in a Q-table 2018-12-31T18:22:21.137

1 What is the meaning of the Variant Q-learning and To what INPUT and OUTPUT refer? in Abstract of DeepMind DQN paper 2013 2019-01-22T10:36:39.523

1 Q learning transition matrix trouble 2019-01-30T01:14:19.670

1 Is reward accumulated during a play iteration when performing SARSA? 2019-03-29T01:44:39.513

1 Intuition behind the loss function in Deep Q learning? 2019-05-05T12:34:36.813

1 Alternative approach for Q-Learning 2019-06-22T08:23:30.753

1 Q table creation and update for dynamic action space 2019-07-16T14:41:40.783

1 DQN - target values vs action values? 2019-07-21T17:42:25.177

1 Why can't Policy Gradient Algorithm be seen as an Actor-Critic Method? 2019-07-22T09:54:45.860

1 Q-learning when minimising a total cost instead of maximising a total reward 2019-07-30T12:14:59.653

1 Keras high loss and high accuracy in gk bot with reinforcement learning? 2019-09-05T17:40:12.937

1 How to formulate reward of an rl agent with two objectives 2019-09-17T08:36:45.210

1 Reducing the training time of an RL agent 2019-09-19T08:34:33.857

1 Deep Q Learning - training slows down significantly 2019-11-29T02:51:24.390

1 help understanding deep Q learning algorithm from deep mind paper 2020-02-03T15:03:28.370

1 Q-learning, state transition, immediate rewards (trading logic) 2020-03-16T05:53:50.867

1 Reward(t) vs. Reward(t+1) ? Reinforcement Learning, Q-learning 2020-03-19T01:59:30.927

1 Which activation function for DQL 2020-06-17T12:26:04.577

1 Definition of the Q* function in reinforcement learning 2020-10-10T22:37:49.277

0 Choosing the right parameters for SARSA and Q-Learning & Comparing Models 2017-04-20T16:08:32.963

0 Q-learning with a state-action-state reward structure and a Q-matrix with states as rows and actions as columns 2017-07-29T14:23:37.060

0 Can you interpolate with QLearning or Reinforcement learning in general? 2018-04-18T08:42:01.380

0 Why is "next state" kept in RL experience replay? 2019-01-23T18:03:18.010

0 Openai Spaces for a modified environment 2019-03-17T16:29:23.113

0 Q-Learning experience replay: how to feed the neural network? 2019-04-11T14:13:00.583

0 Deep Q-Learning for physical quantity: q-values distribution not as expected 2019-07-29T09:24:56.840

0 How is the target_f updated in the Keras solution to the Deep Q-learning Cartpole/Gym algorithm? 2020-02-01T13:17:18.950

0 find the parameter of model with Q learning 2020-03-02T17:44:29.373

0 Index tensor must have same dimensions as input tensor 2020-03-17T20:29:11.783

0 How are n-dimensional state vectors represented in Q Learning? 2020-04-14T19:46:55.360

0 Reward engineering to replace single terminal reward (exponential utility of terminal wealth) 2020-04-21T11:06:45.083