Tag: reinforce

13 Why does the discount rate in the REINFORCE algorithm appear twice? 2018-08-22T18:06:49.643

8 Why does it make sense to normalize rewards per episode in reinforcement learning? 2019-01-24T13:56:08.333

5 Why are lambda returns so rarely used in policy gradients? 2019-01-17T19:27:47.763

4 How is the policy gradient calculated in REINFORCE? 2019-04-21T19:23:33.580

4 REINFORCE algorithm for portfolio optimization - problem while training 2019-10-22T12:24:55.860

4 How to calculate the advantage in policy gradient functions? 2020-03-17T08:49:36.980

3 How is equation 8 derived in the paper "Self-critical sequence training for image captioning"? 2019-02-21T11:22:08.810

3 How is REINFORCE used instead of Backpropagation? 2019-08-04T22:33:07.360

3 Policy gradient methods for continuous action space 2019-08-07T16:23:08.983

3 Confusion about temporal difference learning 2019-10-21T17:48:08.497

3 Update in REINFORCE algorithm - step-wise or episode-wise? 2019-10-29T07:12:03.703

3 Purpose of using actor-critic algorithms under deterministic MDP dynamics? 2019-11-12T14:25:44.363

3 Understanding the TensorFlow implementation of the policy gradient method 2020-04-30T14:45:32.223

3 Why does REINFORCE work at all? 2020-08-15T12:30:38.393

2 What is the difference between Sutton's and Levine's REINFORCE algorithm? 2020-01-07T22:47:17.180

2 How can I sample the output distribution multiple times when pruning the filters with reinforcement learning? 2020-04-26T01:16:07.933

2 Can a typical supervised learning problem be solved with reinforcement learning methods? 2020-05-05T08:16:11.400

2 How long should the state-dependent baseline for policy gradient methods be trained at each iteration? 2020-05-08T11:15:34.553

1 What is the use of the seed function in the gym Environment 'Pendulum-v0'? 2019-02-26T08:07:03.373

1 Policy gradient loss for neural network training 2019-04-20T20:36:44.257

1 How is the gradient with respect to each output node computed from a loss value? 2019-08-05T19:00:56.817

1 How does the policy gradient's derivative work? 2019-11-08T02:31:39.923

1 Is there a good and easy paper to code policy gradient algorithms (REINFORCE) from scratch? 2020-04-19T01:34:40.647

1 Can I apply DQN or policy gradient algorithms in the contextual bandit setting? 2020-06-06T17:48:11.417

1 Why is the "reward to go" replaced by Q instead of V, when transitioning from PG to actor-critic methods? 2020-06-15T09:41:14.660

1 What is the proof that the variance of the gradient estimate in Actor-Critic is smaller than in REINFORCE? 2020-06-28T22:51:35.317

0 Patient PPO: how to handle imbalanced discrete action space? 2020-06-19T20:54:55.427

0 How to deal with GAE ineffectiveness because of critic value adaptation? 2020-06-21T05:24:02.143