Tag: actor-critic

3 How to design two different neural nets for actor and critic RL? 2017-12-05T11:39:15.193

3 Time horizon T in policy gradients (actor-critic) 2018-08-28T14:56:34.993

3 Evaluating a trained Reinforcement Learning Agent? 2019-10-30T11:41:00.863

2 A3C - Turning action probabilities into intensities 2018-01-27T12:49:50.810

2 Actions taken by agent / agent performance not improving 2020-01-21T05:41:45.903

1 Stability of value function approximation in policy gradients 2018-10-16T19:23:24.950

1 Reinforcement learning - generating a matrix of continuous values with varying size for test data generation 2019-03-13T01:55:37.100

1 Multiplying negated gradients by actions for the loss in actor NN of DDPG 2019-03-24T14:42:41.007

1 A2C Continuous for Pendulum-v0 working implementation, negation for loss and entropy calculation 2019-06-23T01:25:13.443

1 Why can't Policy Gradient Algorithm be seen as an Actor-Critic Method? 2019-07-22T09:54:45.860

1 Agent always takes the same action in DQN - Reinforcement Learning 2019-10-04T15:02:21.897

1 Actor Critic Model implementation 2019-11-25T00:42:20.017

1 Formulation of a reward structure 2019-11-26T10:26:13.633

1 Having a reward structure which gives high positive rewards compared to the negative rewards 2019-11-27T04:39:14.807

1 Rewards are converged but with a lot of variations 2019-11-29T10:28:43.653

1 Action selection in actor-critic algorithm: 2020-03-30T12:57:39.447

1 PyTorch XLA to solve the spawn problems in a Colab Env 2020-05-22T12:53:12.133

1 A2C learning very slowly when I try to make it learn on batches as compared to making it learn on each step 2020-06-14T12:58:46.570

0 Can the proof that subtracting a baseline doesn't influence the gradient be used to show no gradient exists at all? 2019-02-05T01:21:37.723

0 Reward is converging but actions taken by trained agent are illogical in reinforcement learning 2019-10-03T11:47:18.593

0 Different results every time I train a reinforcement learning agent 2019-11-06T11:03:14.093

0 How to handle differences between training and deploying of an RL agent 2019-11-18T07:16:13.767

0 RL PPO Algorithm: Understanding the Value Function Loss term in PPO by OpenAI 2021-01-19T07:02:42.817