5 Why is the state-action value function used more than the state value function? 2020-02-14T07:16:15.987

4 An example of a unique value function which is associated with multiple optimal policies 2018-08-20T09:01:13.013

4 What is the target Q-value in DQNs? 2020-04-19T03:25:51.150

4 Why is $G_{t+1}$ replaced with $v_*(S_{t+1})$ in the Bellman optimality equation? 2020-06-04T19:27:43.360

4 Why are the value functions sometimes written with capital letters and other times with lower-case letters? 2020-06-10T02:46:22.760

3 What are the value functions used in reinforcement learning? 2019-02-14T17:03:57.140

3 How is the incremental update rule derived from the weighted importance sampling in off-policy Monte Carlo control? 2020-05-16T20:12:54.310

3 How do we express $q_\pi(s,a)$ as a function of $p(s',r|s,a)$ and $v_\pi(s)$? 2020-06-08T12:26:36.383

3 What is the value of a state when there is a certain probability that the agent will die after each step? 2020-06-13T16:10:33.907

2 Why is the state value function sufficient to determine the policy if a model is available? 2019-02-15T09:24:46.600

2 Is there any grid world dataset or generator for reinforcement learning? 2019-03-07T02:18:47.097

2 How to stop DQN Q function from increasing during learning? 2019-04-24T14:15:02.803

2 Q Learning for FrozenLake environment not converging to V* values from Value Iteration 2019-10-14T16:52:27.350

2 Why is there an expectation sign in the Bellman equation? 2020-04-03T18:43:07.627

2 Why do I need an initial arbitrary policy to implement the value iteration algorithm? 2020-04-16T12:11:58.753

2 Do policy independent state and action values exist in reinforcement learning? 2020-04-20T17:18:35.407

2 Are these two definitions of the state-action value function equivalent? 2020-05-07T09:58:45.690

2 Equation not satisfied in Policy Iteration Algorithm 2020-06-06T07:34:06.170

2 How to express $v_\pi(s)$ in terms of $q_\pi(s,a)$? 2020-06-17T07:11:19.363

2 Why isn't it wise for us to completely erase our old Q value and replace it with the calculated Q value? 2020-06-26T22:07:29.993

2 Why is the expected return in Reinforcement Learning (RL) computed as a sum of cumulative rewards? 2020-07-17T01:17:09.320

2 What's an example of a simple policy but a complex value function? 2020-08-27T10:16:55.680

1 Value-based methods for stochastic policies 2019-10-24T09:30:49.650

1 What is the relationship between the Q and V functions? 2020-03-19T09:17:33.637

1 What is the relationship between the reward function and the value function? 2020-04-03T22:06:27.777

1 Is the Q value the same as the state-action pair value? 2020-04-15T14:04:28.983

1 Why does the policy $\pi$ affect the Q value? 2020-04-16T00:52:59.400

1 Are there some fundamental learning theories for developing an AI that imitates human behavior? 2020-08-01T01:48:10.950

1 Value Iteration failing to converge to optimal value function in Sutton-Barto's Gambler problem 2020-09-05T07:56:55.750

0 What is the purpose of the arrow $\leftarrow$ in this formula? 2020-02-16T17:40:08.207

0 Understanding V- and Q-functions 2020-02-17T21:32:23.717

0 What is the difference between the state transition of an MDP and an action-value? 2020-04-28T14:43:15.900

0 How do I know that the DQN has learnt an appropriate Q function? 2020-06-14T12:09:13.400