I am applying reinforcement learning (the dynamic programming method, performing value iteration) to a grid world. In each iteration, I go through every cell of the grid and update its value based on its present value and the present value of taking an action from that state. My questions:
- How long do I keep updating the value of each cell? Should I keep updating until the change between the previous and the present value function is minimal? I do not understand how to implement the stopping mechanism in the grid-world scenario (discount not considered).
- Is the value function the set of values of all the cells in the grid world?
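
To make the stopping question concrete, here is a minimal sketch of one common stopping rule: keep sweeping until the largest change in any cell's value during a sweep falls below a small threshold. Everything specific in it is an assumption for illustration and not part of the setup above: the 4x4 grid, the goal cell at `(0, 0)`, the reward of -1 per step, deterministic moves, and the names `value_iteration`, `theta` and `max_sweeps`. With no discount (`gamma=1.0`), this sketch relies on the terminal goal cell and the negative step reward to keep the values bounded.

```python
import numpy as np

def value_iteration(n_rows=4, n_cols=4, goal=(0, 0), step_reward=-1.0,
                    gamma=1.0, theta=1e-6, max_sweeps=10_000):
    """Value iteration on a small deterministic grid world (illustrative assumptions)."""
    # The value function is stored as one number per cell of the grid.
    V = np.zeros((n_rows, n_cols))
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    for sweep in range(1, max_sweeps + 1):
        V_new = V.copy()
        for r in range(n_rows):
            for c in range(n_cols):
                if (r, c) == goal:
                    continue  # terminal cell: its value stays at 0
                best = -np.inf
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    # Assumed rule: a move that leaves the grid keeps the agent in place.
                    if not (0 <= nr < n_rows and 0 <= nc < n_cols):
                        nr, nc = r, c
                    best = max(best, step_reward + gamma * V[nr, nc])
                V_new[r, c] = best

        # Stopping test: largest absolute change of any cell in this sweep.
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < theta:
            return V, sweep

    # Safety cap reached without meeting the threshold.
    return V, max_sweeps

if __name__ == "__main__":
    values, sweeps = value_iteration()
    print(values)
    print("sweeps:", sweeps)
```

In this sketch the array `V` as a whole plays the role of the value function, and `delta` is the quantity compared against the threshold. The `max_sweeps` cap is only a safeguard for cases where the values never settle.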