Is there a simple proof of the convergence of TD(0)?


Does anybody know a simple proof of the convergence of the TD(0) value function prediction algorithm?
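For concreteness, the tabular TD(0) prediction update in question can be sketched as follows on a hypothetical deterministic two-state chain 0 → 1 → terminal, with reward 1 on the last transition (so, with γ = 1, the true values are V(0) = V(1) = 1); the chain and episode count are illustrative assumptions, not from any real task:

```python
# Tabular TD(0) prediction: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
gamma = 1.0
V = {0: 0.0, 1: 0.0, "T": 0.0}  # value table; terminal state "T" fixed at 0

for episode in range(500):
    alpha = 1.0 / (episode + 1)  # decaying step size (Robbins-Monro style)
    # one episode of experience: (state, reward, next state) transitions
    for s, r, s_next in [(0, 0.0, 1), (1, 1.0, "T")]:
        # move V(s) toward the bootstrapped target r + gamma * V(s')
        V[s] += alpha * (r + gamma * V[s_next] - V[s])

print(round(V[0], 2), round(V[1], 2))  # both estimates approach 1.0
```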


KaneM

Posted 2020-02-22T22:59:51.977

Reputation: 169

Answers


As far as I know, there is no very simple proof of the convergence of temporal-difference algorithms. The existing proofs are typically based on stochastic approximation theory (a TD update such as Q-learning's can be viewed as a stochastic process) and on the work of Robbins and Monro; in fact, the Robbins-Monro step-size conditions are usually among the assumptions of the theorems and proofs.
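For reference, the Robbins-Monro conditions mentioned above require the sequence of step sizes $\alpha_t$ used in the updates to satisfy

```latex
\sum_{t=0}^{\infty} \alpha_t = \infty, \qquad \sum_{t=0}^{\infty} \alpha_t^2 < \infty,
```

which holds, for example, for $\alpha_t = \frac{1}{t+1}$, but not for a constant step size.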

Convergence proofs for Q-learning (a TD(0) algorithm) and SARSA (another TD(0) algorithm), in the case where the value functions are represented in tabular form (as opposed to being approximated with e.g. a neural network), can be found in research papers.
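To make the distinction between the two algorithms concrete, here is a minimal sketch of their respective update targets on a single hypothetical transition (s, a, r, s') with two actions available in s'; all numeric values are illustrative assumptions:

```python
alpha, gamma = 0.5, 0.9
# illustrative action-value table for states "s" and "s2", actions 0 and 1
Q = {("s", 0): 0.0, ("s", 1): 0.0, ("s2", 0): 2.0, ("s2", 1): 4.0}
s, a, r, s2 = "s", 0, 1.0, "s2"

# Q-learning (off-policy): bootstrap with the greedy action in s'
q_target = r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
Q[(s, a)] += alpha * (q_target - Q[(s, a)])

# SARSA (on-policy): bootstrap with the action a' actually taken in s'
a_next = 0  # suppose the behaviour policy happens to pick action 0
sarsa_target = r + gamma * Q[(s2, a_next)]

print(Q[(s, a)], sarsa_target)
```

The only difference is the bootstrap term: Q-learning uses the maximum over next actions, while SARSA uses the action chosen by the behaviour policy, which is why their convergence proofs need slightly different assumptions.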

For example, the proof of convergence of tabular Q-learning can be found in the paper Convergence of Stochastic Iterative Dynamic Programming Algorithms (1994) by Tommi Jaakkola et al. The proof of convergence of tabular SARSA can be found in the paper Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms (2000) by Satinder Singh et al.

See also How to show temporal difference methods converge to MLE?.

nbro

Posted 2020-02-22T22:59:51.977

Reputation: 19 783