17 Formal proof of vanilla policy gradient convergence 2019-06-15T16:58:34.553

6 Reinforcement Learning: Policy Gradient derivation question 2020-02-17T15:00:23.910

5 Policy Gradients - gradient Log probabilities favor less likely actions? 2018-09-11T07:46:42.180

5 RL's policy gradient (REINFORCE) pipeline clarification 2018-09-19T17:18:30.447

4 How does action get selected in a Policy Gradient Method? 2018-08-20T22:19:53.003

4 Reinforcement learning: Discounting rewards in the REINFORCE algorithm 2018-09-13T12:27:58.977

3 Policy Gradients vs Value function, when implemented via DQN 2018-07-18T07:09:18.330

3 Policy-based RL method - how do continuous actions look like? 2018-08-26T00:20:50.400

3 Time horizon T in policy gradients (actor-critic) 2018-08-28T14:56:34.993

2 Policy gradient - and auto-differentiation (Pytorch/Tensorflow) 2018-11-17T17:02:24.337

2 Policy gradient/REINFORCE algorithm with RNN: why does this converge with SGM but not Adam? 2019-04-10T20:40:52.490

2 Policy Gradient methods not converging to useful mean values 2019-07-01T13:21:59.433

2 Guidelines to debug REINFORCE-type algorithms? 2019-08-05T05:51:23.173

2 reinforcement learning: Decompose a policy gradient 2019-12-10T23:07:24.533

2 Maximum Entropy Policy Gradient Derivation 2019-12-11T14:03:34.933

1 Deep RL: Proximal policy optimization gradient calculation 2018-07-28T20:44:34.453

1 inverted pendulum REINFORCE 2018-08-02T15:07:30.697

1 Deep RL: Visualizing/Analyzing the gradient 2018-10-10T20:40:34.933

1 Stability of value function approximation in policy gradients 2018-10-16T19:23:24.950

1 Does policy optimization learn policies to make better actions with higher probability? 2018-11-13T17:51:23.150

1 Problem when cherry picking actions - Proximal Policy Optimization 2019-02-21T12:26:55.800

1 multiplying negated gradients by actions for the loss in actor nn of DDPG 2019-03-24T14:42:41.007

1 REINFORCE algorithm with discounted rewards – where does gamma^t in the update come from? 2019-04-08T10:20:24.457

1 Why can't Policy Gradient Algorithm be seen as an Actor-Critic Method? 2019-07-22T09:54:45.860

1 Learning curve goes down after convergence? 2019-07-30T13:07:01.010

1 Policy gradient vs cost function 2019-09-16T14:16:42.833

1 Reducing the training time of an RL agent 2019-09-19T08:34:33.857

1 How do the policy gradient's cost function and gradients work? 2019-09-19T16:15:33.980

1 Policy Gradient custom loss function not working 2019-10-04T02:38:56.970

1 Agent always takes a same action in DQN - Reinforcement Learning 2019-10-04T15:02:21.897

1 How to improve tensorflow 2.0 code for policy gradient? 2019-12-23T21:45:41.700

1 Entropy applied to policy gradient prevent our agent from being stuck in the local minimum? 2020-04-23T15:44:57.397

1 Policy Gradient not "learning" 2020-05-12T20:24:52.507

1 How is this score function estimator derived? 2020-07-23T21:30:38.977

0 Policy Gradient Methods - ScoreFunction & Log(policy) 2018-09-05T23:43:39.033

0 Why is "next state" kept in RL experience replay? 2019-01-23T18:03:18.010

0 Using reinforce algorithm with per-action reward instead of per-trajectory reward 2019-07-13T05:19:34.593

0 Reward is converging but actions taken by trained agent are illogical in reinforcement learning 2019-10-03T11:47:18.593

0 Policy Gradient with continuous action space 2019-10-14T11:51:21.743

0 Policy Gradient with Baseline Reward Oscillation (MATLAB Reinforcement Learning Toolbox) 2020-03-18T17:15:04.993

0 Intuition behind policy gradient 2020-04-18T14:23:28.740

0 Pruning CNN filters & Reinforcement Learning 2020-04-25T21:09:44.077

0 How is a policy expressed? 2020-10-26T09:54:07.990

0 How to apply policy gradient to discrete combinatorial action space? 2020-12-18T15:17:57.263

0 RL PPO Algorithm: Understanding the Value Function Loss term in PPO by OpenAI 2021-01-19T07:02:42.817

0 Why is the GAE usually calculated looping backwards over the rewards? 2021-01-24T21:08:19.053