Tag: proofs

15 Why doesn't Q-learning converge when using function approximation? 2019-04-05T18:23:46.233

11 Proof that Artificial General Intelligence is possible 2017-07-03T16:55:46.917

11 Where can I find the proof of the universal approximation theorem? 2019-07-11T08:40:14.007

10 What are the implications of the "No Free Lunch" theorem for machine learning? 2019-09-27T13:52:15.007

9 Why is baseline conditional on state at some timestep unbiased? 2018-09-09T20:31:07.373

9 How do we prove the n-step return error reduction property? 2018-12-08T05:24:56.380

8 Why is A* optimal if the heuristic function is admissible? 2018-04-13T02:25:29.013

7 How can a neural network approximate all functions when the weights are not allowed to grow exponentially? 2018-08-05T16:12:56.600

6 Can two admissible heuristics not dominate each other? 2019-10-03T00:55:04.350

5 Is there a limit of minimum error for a particular training dataset in an artificial neural network? 2018-10-28T09:37:14.010

5 Can deep learning be used to help mathematical research? 2020-05-21T15:40:18.533

5 Is the Bellman equation that uses sampling weighted by the Q values (instead of max) a contraction? 2020-07-23T17:32:14.873

4 How do I show that uniform-cost search is a special case of A*? 2018-11-26T14:59:20.600

4 How are the reward functions $R(s)$, $R(s, a)$ and $R(s, a, s')$ equivalent? 2019-02-07T15:38:58.217

4 What is the proof that the branch and bound algorithm always finds optimal path in a graph? 2019-05-19T19:52:06.153

4 How is G(z) related to x in GAN proof? 2019-06-15T23:16:55.513

4 How to show temporal difference methods converge to MLE? 2019-08-14T16:15:30.013

4 Why does estimation error increase with $|H|$ and decrease with $m$ in PAC learning? 2019-09-16T10:51:21.577

4 If a heuristic is not admissible, can it be consistent? 2019-11-09T11:18:09.290

4 Why are the Bellman operators contractions? 2020-07-31T02:48:34.320

3 Is there a mathematical proof that shows that certain parameters work "better" than others for a certain task? 2017-04-04T22:44:25.570

3 Why is the max a non-expansive operator? 2019-03-14T20:50:46.340

3 How do I find whether this heuristic is or not admissible and consistent? 2019-03-26T10:33:47.713

3 Is unsupervised disentanglement really impossible? 2019-08-12T00:38:56.740

3 Understanding proof of lemma 1 (policy improvement bound) of the "Trust Region Policy Optimization" paper 2019-11-21T22:38:18.797

3 Why is the stationary distribution independent of the initial state in the proof of the policy gradient theorem? 2019-12-03T10:50:41.380

3 Is the summation of consistent heuristic functions also consistent? 2020-02-28T15:25:48.877

3 What is the proof that policy evaluation converges to the optimal solution? 2020-04-16T06:44:00.997

3 Why does (or doesn't) the distribution of states depend on the policy parameters that induce it? 2020-08-27T10:36:32.770

2 Understanding why the expectation is over the new policy $\pi'$ in the proof of the Policy Improvement Theorem 2017-01-28T06:46:10.700

2 Why is exact inference in a Bayesian network both NP-hard and #P-hard? 2018-07-05T10:36:42.557

2 Understanding the proof of theorem 2.1 from the paper "Efficient reductions for imitation learning" 2018-11-05T15:39:08.287

2 Understanding lemma 2 of the "Trust Region Policy Optimization" paper 2018-11-27T16:52:12.273

2 Why isn't Nilsson's Sequence Score an admissible heuristic? 2019-02-24T06:24:58.143

2 Is the minimum and maximum of a set of admissible and consistent heuristics also consistent and admissible? 2019-04-10T13:53:28.567

2 Understanding the proof that A* search is optimal 2019-06-27T07:16:51.563

2 How to show Monte Carlo methods converge to an estimate which minimizes mean squared error? 2019-08-14T16:53:35.113

2 What happens to the optimal value function if the reward is multiplied by a constant? 2019-09-15T20:58:34.960

2 Convert a PAC-learning algorithm into another one which requires no knowledge of the parameter 2020-01-16T04:57:48.913

2 How can we prove this inequality, related to the generalization error, without using the Rademacher complexity? 2020-01-16T11:01:20.123

2 A problem about the relation between 1-oracle and 2-oracle PAC model 2020-01-16T12:44:11.397

2 How can I show that the VC dimension of the set of all closed balls in $\mathbb{R}^n$ is at most $n+3$? 2020-01-16T13:25:46.480

2 Why does KL divergence not satisfy the triangle inequality? 2020-02-14T08:23:15.227

2 Is there a mathematical theory behind why MLP can classify handwritten digits? 2020-02-14T19:47:27.067

2 Is an oracle that answers only with a "yes" or "no" dangerous? 2020-03-06T14:53:13.503

2 How to prove that $\mathcal H$ with VC dimension $d$ shatters all subsets with size less than $d-1$? 2020-03-28T21:12:24.790

2 Does Rice's theorem prove safe AI is undecidable? 2020-04-02T20:39:10.230

2 Is the derivative of the loss wrt a single scalar parameter proportional to the loss? 2020-04-03T12:01:55.227

2 Equivalence between expected parameter increments in "Off-Policy Temporal-Difference Learning with Function Approximation" 2020-04-07T10:36:06.187

2 Proof of Maximization Bias in Q-learning? 2020-06-15T18:56:54.630

2 Do we assume the policy to be deterministic when proving the optimality? 2020-08-18T09:32:53.853

1 Why does Q-learning converge to the optimal policy even if I am acting suboptimally? 2018-11-17T22:06:42.443

1 Proof of Correctness of Monte Carlo Tree Search 2019-06-10T18:54:18.250

1 Is there a rigorous proof for finding Hopfield minima? 2019-07-22T17:05:02.367

1 Is there a simple proof of the convergence of TD(0)? 2020-02-22T22:59:51.977

1 Does TD(0) prediction require Robbins-Monro conditions to converge to the value function? 2020-02-24T18:00:33.417

1 What are the conditions for the convergence of SARSA to the optimal value function? 2020-02-27T12:53:48.450

1 Does SARSA(0) converge to the optimal policy in expectation if the Robbins-Monro conditions are removed? 2020-02-27T15:23:50.410

1 How to prove that gradient descent doesn't necessarily find the global optimum? 2020-03-16T02:36:18.900

1 Monte Carlo epsilon-greedy Policy Iteration: monotonic improvement for all cases or for the expected value? 2020-04-25T20:06:16.880

1 Why is the probability that at least one hypothesis out of $k$ is consistent with $m$ training examples $k(1- \epsilon)^m$? 2020-04-29T03:06:50.460

1 Why is it hard to prove the convergence of the deep Q-learning algorithm? 2020-05-10T16:01:31.993

1 How do you prove that minimax algorithm outputs a subgame-perfect Nash equilibrium? 2020-05-28T14:56:25.563

1 What is the proof that "reward-to-go" reduces variance of policy gradient? 2020-06-10T13:38:53.023

1 When to use AND and when to use Implies in first-order logic? 2020-06-18T18:39:43.483

1 What is the proof that the variance of the gradient estimate in Actor-Critic is smaller than in REINFORCE? 2020-06-28T22:51:35.317

0 Can a computer make a proof by induction? 2020-08-09T03:14:05.647

0 Is there a difference in the convergence analysis/proof of the chaotic learning automaton compared to the standard LA? 2020-08-30T20:10:08.450

-3 Is there a proof that states that an AI can become smarter than its creator? 2016-12-29T23:51:07.190