Tag: rewards

12 What would motivate a machine? 2017-08-27T15:23:54.393

6 Counterexamples to the reward hypothesis 2019-11-19T02:30:52.573

6 Why cannot an AI agent adjust the reward function directly? 2019-12-14T08:07:26.960

5 Suitable reward function for trading buy and sell orders 2019-01-20T00:44:44.533

5 Why is the reward in reinforcement learning always a scalar? 2020-08-06T22:06:40.557

4 Why does the "reward to go" trick in policy gradient methods work? 2018-12-20T01:00:04.310

4 What research has been done on learning non-Markovian reward functions? 2019-04-07T17:45:38.380

4 How to assign rewards in a non-Markovian environment? 2019-10-31T16:15:47.053

4 How do I convert an MDP with the reward function in the form $R(s,a,s')$ to an MDP with a reward function in the form $R(s,a)$? 2020-05-25T11:19:46.343

4 Is there any difference between reward and return in reinforcement learning? 2020-06-04T03:35:09.387

4 Upper limit to the maximum cumulative reward in a deep reinforcement learning problem 2020-07-18T13:27:17.247

4 How can we prevent AGI from doing drugs? 2020-08-10T05:26:52.783

3 What should I do when the potential value of a state is too high? 2018-05-08T22:23:53.637

3 Is my interpretation of the return correct? 2018-08-24T19:05:08.083

3 What is the main difference between additive rewards and discounted rewards? 2018-12-09T08:29:55.637

3 Given specific rewards, how can I calculate the returns for each time step? 2019-03-07T16:59:53.727

3 Reinforcement Learning with long term rewards and fixed states and actions 2019-03-20T21:53:38.110

3 How do I define a reward function for a humanoid agent whose goal is to stand up from the ground? 2019-05-17T17:05:22.153

3 Can I have different rewards for a single action based on which state it transitions to? 2019-08-30T12:48:56.257

3 Can someone please help me validate my MDP? 2019-09-05T15:08:52.330

3 What could be the cause of the drop in the reward in A3C? 2019-10-28T07:47:59.513

3 How does the optimization process in hindsight experience replay exactly work? 2020-03-12T10:19:55.543

3 Is this a good approach to solving Atari's "Montezuma's Revenge"? 2020-03-13T10:58:40.600

3 What are the guidelines for defining a reward function in reinforcement learning (bandit problem)? 2020-03-18T15:12:12.863

3 Do all expert trajectories have the same starting state in apprenticeship learning? 2020-03-27T09:08:16.220

3 How does normalization of the inputs work in the context of PPO? 2020-04-11T08:54:32.807

3 Formula for expected rewards for state–action–next-state triples as a three-argument function 2020-04-14T06:15:23.090

3 Appropriate algorithm for RL problem with sparse rewards, continuous actions and significant stochasticity 2020-04-23T09:39:41.123

3 Reward Function for Racing Game 2020-05-01T16:10:05.723

3 Shouldn't expected return be calculated for some faraway time in the future $t+n$ instead of current time $t$? 2020-05-03T06:11:06.840

3 What are some best practices when trying to design a reward function? 2020-08-03T16:30:18.907

3 How can I fix jerky movement in a continuous action space? 2020-08-29T14:09:13.053

2 How to define reward function in POMDPs? 2018-10-29T16:09:10.950

2 How do I keep an agent from tending to terminate in a negative state when time needs to be taken into account? 2019-02-21T17:00:37.000

2 Can Reinforcement Learning solve problems where certain elements in the environment are randomly located? 2019-03-21T21:38:12.243

2 Do we have to consider the feasibility of an action when defining the reward function of an MDP? 2019-03-22T20:42:21.047

2 How is GARB implemented in PGRD-DL to calculate gradients w.r.t. internal rewards? 2019-05-05T20:25:17.603

2 Using heuristic dense rewards in a sparse problem 2019-07-18T04:12:21.953

2 Will the RL agent implemented as a neural network fine-tune itself? 2019-07-19T17:59:27.777

2 Is it possible to use Reward Function of type R(s, a, s') if more than one action is applied? 2019-08-07T13:25:17.847

2 Should RL rewards diminish over time? 2019-08-11T07:35:33.300

2 Developmental systems that try to explain or understand the reward value in reinforcement learning? 2019-08-20T21:02:56.617

2 Doubt in Deep-Q learning with sparse rewards 2019-10-15T16:04:31.937

2 Simulating successful trajectories in Montezuma's Revenge turns out to be unsuccessful 2020-01-27T06:04:22.033

2 Reinforcement Learning Continuous Control (DDPG): How to avoid thrashing of issued actions? How to reward smooth output over flittering? 2020-01-31T10:45:09.033

2 Immediate reward received in Atari game using DQN 2020-02-11T16:20:54.547

2 Can recovering a reward function using IRL lead to better policies compared to reward shaping? 2020-02-13T06:04:06.380

2 Is there a good ratio between the positive and negative rewards in reinforcement learning? 2020-02-16T21:33:55.247

2 How should I define the reward function for the Connect Four game? 2020-04-03T20:52:18.113

2 How was the DQN trained to play many games? 2020-04-04T13:54:05.900

2 How should I design a reward function for an NLP problem where two models interoperate? 2020-04-16T12:29:23.950

2 Can the agent wait until the end of the episode to determine the reward in SARSA? 2020-06-01T18:49:52.903

2 Non-differentiable reward function to update a neural network 2020-06-09T19:42:50.047

2 Why does shifting all the rewards have a different impact on the performance of the agent? 2020-07-01T01:57:16.453

2 How is the reward in reinforcement learning different from the label in supervised learning problems? 2020-07-07T15:10:02.380

2 Why is the expected return in Reinforcement Learning (RL) computed as a sum of cumulative rewards? 2020-07-17T01:17:09.320

2 Is the reward in reinforcement learning set step by step, or for the whole sequence until failure? 2020-07-21T09:59:44.620

2 Should I use the discounted average reward as objective in a finite-horizon problem? 2020-08-10T06:06:56.967

2 What are the pros and cons of sparse and dense rewards in reinforcement learning? 2020-08-13T07:05:23.650

2 How do I design the rewards and penalties for an agent whose goal is to explore a map? 2020-08-28T17:00:40.643

1 Why is it ok to calculate the reward based on a hidden state? 2018-11-22T08:36:32.940

1 Deciding the rewards for different actions in Pong for a DQN agent 2019-04-13T16:57:34.497

1 What is the reward system of reinforcement learning? 2019-05-10T23:43:24.807

1 Reward problem in A2C with multiple simultaneous discrete actions 2019-07-21T07:14:35.417

1 Finding optimal Value function and Policy for an MDP 2019-09-04T10:52:49.967

1 Neural network for reinforcement learning 2019-10-31T09:08:33.547

1 Should an RL agent directly observe the reward? 2019-12-09T16:09:29.980

1 Does apprenticeship learning require prospective data? 2020-02-19T11:32:42.117

1 Which model should I choose to maximise reward of having chosen two numbers from a list? 2020-03-06T01:57:49.093

1 How can I find the appropriate reward value for my reinforcement learning problem? 2020-04-03T16:19:22.720

1 What is the relationship between the reward function and the value function? 2020-04-03T22:06:27.777

1 In RL, if I assign the rewards for better positional play, the algorithm is learning nothing? 2020-04-04T13:21:53.347

1 Can optimizing for immediate reward result in a policy maximizing the return? 2020-04-21T14:41:19.030

1 Which reward function works for recommendation systems using knowledge graphs? 2020-05-02T15:09:08.137

1 How do you know if an agent has learnt its environment in reinforcement learning? 2020-05-04T08:34:50.007

1 State-of-the-art algorithms not working on a custom RL environment 2020-05-05T20:12:39.927

1 When is discounted MAB useful? 2020-06-04T21:16:25.123

1 How do I calculate the return given the discount factor and a sequence of rewards? 2020-06-29T14:15:25.977

1 Can rewards be decomposed into components? 2020-07-09T16:49:17.693

0 For some reasons, a reward becomes a penalty if 2019-01-18T15:00:39.957

0 Encourage Deep Q to seek short-term reward 2019-04-21T18:02:13.173

0 Does curiosity-driven learning affect the optimal policy? 2020-02-12T05:34:41.743

0 How to incentivise snake to go straight to apple? 2020-03-09T10:44:02.503

0 What is the best measurement for how good an action of a reinforcement learning agent really is? 2020-04-15T16:20:36.693

0 How to combine two equally important signals that have different scales into the reward function? 2020-08-07T11:48:21.100