## Tag: proofs

15 Why doesn't Q-learning converge when using function approximation? 2019-04-05T18:23:46.233

11 Proof that Artificial General Intelligence is possible 2017-07-03T16:55:46.917

11 Where can I find the proof of the universal approximation theorem? 2019-07-11T08:40:14.007

10 What are the implications of the "No Free Lunch" theorem for machine learning? 2019-09-27T13:52:15.007

9 Why is baseline conditional on state at some timestep unbiased? 2018-09-09T20:31:07.373

9 How do we prove the n-step return error reduction property? 2018-12-08T05:24:56.380

8 Why is A* optimal if the heuristic function is admissible? 2018-04-13T02:25:29.013

6 Can two admissible heuristics not dominate each other? 2019-10-03T00:55:04.350

5 Can deep learning be used to help mathematical research? 2020-05-21T15:40:18.533

4 How do I show that uniform-cost search is a special case of A*? 2018-11-26T14:59:20.600

4 How are the reward functions $R(s)$, $R(s, a)$ and $R(s, a, s')$ equivalent? 2019-02-07T15:38:58.217

4 What is the proof that the branch and bound algorithm always finds an optimal path in a graph? 2019-05-19T19:52:06.153

4 How is G(z) related to x in GAN proof? 2019-06-15T23:16:55.513

4 How to show temporal difference methods converge to MLE? 2019-08-14T16:15:30.013

4 Why does estimation error increase with $|H|$ and decrease with $m$ in PAC learning? 2019-09-16T10:51:21.577

4 If a heuristic is not admissible, can it be consistent? 2019-11-09T11:18:09.290

4 Why are the Bellman operators contractions? 2020-07-31T02:48:34.320

3 Why is the max a non-expansive operator? 2019-03-14T20:50:46.340

3 How do I find whether this heuristic is admissible and consistent or not? 2019-03-26T10:33:47.713

3 Is unsupervised disentanglement really impossible? 2019-08-12T00:38:56.740

3 Is the summation of consistent heuristic functions also consistent? 2020-02-28T15:25:48.877

3 What is the proof that policy evaluation converges to the optimal solution? 2020-04-16T06:44:00.997

3 Why does (not) the distribution of states depend on the policy parameters that induce it? 2020-08-27T10:36:32.770

2 Why is exact inference in a Bayesian network both NP-hard and P-hard? 2018-07-05T10:36:42.557

2 Understanding lemma 2 of the "Trust Region Policy Optimization" paper 2018-11-27T16:52:12.273

2 Why isn't Nilsson's Sequence Score an admissible heuristic? 2019-02-24T06:24:58.143

2 Understanding the proof that A* search is optimal 2019-06-27T07:16:51.563

2 How to show Monte Carlo methods converge to an estimate which minimizes mean squared error? 2019-08-14T16:53:35.113

2 What happens to the optimal value function if the reward is multiplied by a constant? 2019-09-15T20:58:34.960

2 Convert a PAC-learning algorithm into another one which requires no knowledge of the parameter 2020-01-16T04:57:48.913

2 A problem about the relation between 1-oracle and 2-oracle PAC model 2020-01-16T12:44:11.397

2 Why does KL divergence not satisfy the triangle inequality? 2020-02-14T08:23:15.227

2 Is there a mathematical theory behind why MLP can classify handwritten digits? 2020-02-14T19:47:27.067

2 Is an oracle that answers only with a "yes" or "no" dangerous? 2020-03-06T14:53:13.503

2 How to prove $\mathcal H$ with VC dimension $d$ shatters all subsets with size less than $d-1$? 2020-03-28T21:12:24.790

2 Does Rice's theorem prove safe AI is undecidable? 2020-04-02T20:39:10.230

2 Is the derivative of the loss wrt a single scalar parameter proportional to the loss? 2020-04-03T12:01:55.227

2 Proof of Maximization Bias in Q-learning? 2020-06-15T18:56:54.630

2 Do we assume the policy to be deterministic when proving the optimality? 2020-08-18T09:32:53.853

1 Why does Q-learning converge to the optimal policy even if I am acting suboptimally? 2018-11-17T22:06:42.443

1 Proof of Correctness of Monte Carlo Tree Search 2019-06-10T18:54:18.250

1 Is there a rigorous proof for finding Hopfield minima? 2019-07-22T17:05:02.367

1 Is there a simple proof of the convergence of TD(0)? 2020-02-22T22:59:51.977

1 Does TD(0) prediction require Robbins-Monro conditions to converge to the value function? 2020-02-24T18:00:33.417

1 What are the conditions for the convergence of SARSA to the optimal value function? 2020-02-27T12:53:48.450

1 How to prove that gradient descent doesn't necessarily find the global optimum? 2020-03-16T02:36:18.900

1 Why is it hard to prove the convergence of the deep Q-learning algorithm? 2020-05-10T16:01:31.993

1 How do you prove that minimax algorithm outputs a subgame-perfect Nash equilibrium? 2020-05-28T14:56:25.563

1 What is the proof that "reward-to-go" reduces variance of policy gradient? 2020-06-10T13:38:53.023

1 When to use AND and when to use Implies in first-order logic? 2020-06-18T18:39:43.483

0 Can a computer make a proof by induction? 2020-08-09T03:14:05.647

-3 Is there a proof that states that an AI can become smarter than its creator? 2016-12-29T23:51:07.190