When to stop calculating the values of each cell of the grid in reinforcement learning (dynamic programming) applied to a grid world

Consider applying reinforcement learning (the dynamic programming method, performing value iteration) to a grid world. In each iteration, I go through every cell of the grid and update its value based on its present value and the present value of taking each action from that state. Now:

  1. How long do I keep updating the value of each cell? Should I keep updating until the change between the previous and the present value function is negligible? I do not understand how to implement the stopping mechanism in the grid-world scenario (no discount).

  2. Is the value function the set of values of all the cells in the grid world?
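For reference, the per-cell update described in the question is the standard value-iteration (Bellman optimality) backup. Written under the usual dynamic-programming assumption that the transition probabilities $P(s' \mid s, a)$ and rewards $R(s, a, s')$ are known, and with $\gamma = 1$ because no discount is used, it reads:

$$V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V_k(s')\bigr], \qquad \gamma = 1$$

In a deterministic grid world each action leads to exactly one next cell $s'$, so the sum collapses to $V_{k+1}(s) = \max_a \bigl[R(s, a, s') + V_k(s')\bigr]$. The value function is then the collection of $V(s)$ over all cells $s$.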

girl101

Posted 2015-08-05T10:27:51.370

Reputation: 1 093

Answers

1 - You should set a threshold (a hyperparameter) that lets you exit the loop.

Let $V$ be the values for all states $s$, and $V'$ the new values after one sweep of value iteration.

If $\sum_s |V(s) - V'(s)| \le \text{threshold}$, quit.
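A minimal Python sketch of this loop, under illustrative assumptions not taken from the original post: a 4x4 deterministic grid, terminal cells at the top-left and bottom-right corners, a reward of -1 per move, walls that leave the agent in place, and no discounting. The names `value_iteration`, `next_state`, `threshold` and `max_sweeps` are hypothetical. The stopping test is the sum-of-absolute-differences rule from this answer.

```python
# Value iteration with a sum-of-absolute-differences stopping rule.
# Assumptions (illustrative, not from the original post): 4x4 grid,
# terminal corners, reward -1 per move, no discounting (gamma = 1).
ROWS, COLS = 4, 4
TERMINALS = {(0, 0), (ROWS - 1, COLS - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
STEP_REWARD = -1.0


def next_state(state, action):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < ROWS and 0 <= c < COLS:
        return (r, c)
    return state


def value_iteration(threshold=1e-6, max_sweeps=10_000):
    """Return (V, sweeps); stop once sum_s |V(s) - V'(s)| <= threshold."""
    V = [[0.0] * COLS for _ in range(ROWS)]
    for sweep in range(1, max_sweeps + 1):
        V_new = [[0.0] * COLS for _ in range(ROWS)]
        for r in range(ROWS):
            for c in range(COLS):
                if (r, c) in TERMINALS:
                    continue  # terminal cells keep value 0
                # Bellman optimality backup with gamma = 1 (undiscounted)
                V_new[r][c] = max(
                    STEP_REWARD + V[nr][nc]
                    for nr, nc in (next_state((r, c), a) for a in ACTIONS)
                )
        delta = sum(abs(V[r][c] - V_new[r][c])
                    for r in range(ROWS) for c in range(COLS))
        V = V_new
        if delta <= threshold:
            return V, sweep
    return V, max_sweeps  # safety cap in case the values never settle
```

On this layout `value_iteration()` returns after 4 sweeps, with each cell holding minus its number of steps to the nearest terminal corner (first row `[0.0, -1.0, -2.0, -3.0]`). This sketch builds `V_new` from the old `V` (synchronous updates); updating `V` in place as you sweep also converges and often needs fewer sweeps. Without discounting, convergence relies on every cell being able to reach a terminal cell: a cell that cannot would lose 1 per sweep forever, which is why the `max_sweeps` cap is there.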

2 - Yes, $V$ is a function with a value for every cell in the grid, because you need to update every cell.

Hope it helps.

Dref360

Posted 2015-08-05T10:27:51.370

Reputation: 161

How do I set a threshold? What I am doing is updating the value of each cell with respect to the cells the agent can move to from the present cell. What do you mean by saying V is a function? – girl101 – 2015-08-06T04:00:21.690

$V(s)$ is a function that returns the utility of that state. In a computer program where you have enumerated the states, you may well end up modelling $V$ as a simple array and treating it as an array lookup. – Neil Slater – 2015-08-06T08:53:37.033
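To make this comment concrete, here is a minimal sketch assuming a hypothetical 3x4 grid whose cells are numbered row by row; `state_index` and `value` are illustrative helper names, not from the original thread.

```python
# Hypothetical 3x4 grid; cells are enumerated row by row as states 0..11.
ROWS, COLS = 3, 4


def state_index(row, col):
    """Map grid cell (row, col) to a single integer state id s."""
    return row * COLS + col


# V(s) for every state s, stored as a plain list and initialised to 0.
V = [0.0] * (ROWS * COLS)


def value(row, col):
    """Evaluating V(s) is just an array lookup once states are enumerated."""
    return V[state_index(row, col)]


V[state_index(1, 2)] = -3.0  # update the value of cell (1, 2)
print(value(1, 2))           # prints -3.0
print(len(V))                # prints 12: one value per cell
```

A 2D list indexed as `V[row][col]` works equally well; the flat array just makes the "one entry per state" view explicit.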

How do I set the threshold? – girl101 – 2015-08-10T05:08:08.020

Run some tests to see what works best for you. A threshold of 0 means you stop only at the optimal solution, i.e. when there is no better solution than the current one. Since it's a hyperparameter, you can learn it via a neural network. – Dref360 – 2015-08-10T20:55:57.640

@Dref360 I want to learn it via dynamic programming; I don't want to learn it via a neural network. – girl101 – 2015-08-11T03:53:12.397

@Dref360 What is a hyperparam? I googled it and got the term hyperparameter. Is hyperparam short for that? – girl101 – 2015-08-11T03:54:34.170

@Dref360 Can I stop learning when I notice that none of the states is updated any more? – girl101 – 2015-08-11T04:19:40.790

@Rishika HyperParam == HyperParameter, for example in a neural network: the number of layers or the number of hidden neurons. Yes, you can stop learning when no state is updated any more. That means there is no better solution. – Dref360 – 2015-08-11T21:19:23.810
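The "stop when no state is updated" rule from this exchange can be sketched as follows. It assumes a hypothetical 4x4 deterministic grid world with terminal corners, a reward of -1 per move, walls that leave the agent in place, and no discounting; `value_iteration_until_stable` and `neighbours` are illustrative names. Cells are updated in place, and the loop ends on the first sweep in which no value changes.

```python
# Stop value iteration on the first sweep that changes no state.
# Assumptions (illustrative): 4x4 deterministic grid, terminal corners,
# reward -1 per move, no discounting (gamma = 1).
ROWS, COLS = 4, 4
TERMINALS = {(0, 0), (ROWS - 1, COLS - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def neighbours(r, c):
    """Cells reachable in one move; walls keep the agent in place."""
    for dr, dc in ACTIONS:
        nr, nc = r + dr, c + dc
        yield (nr, nc) if 0 <= nr < ROWS and 0 <= nc < COLS else (r, c)


def value_iteration_until_stable(max_sweeps=10_000):
    """In-place sweeps; return (V, sweeps) once a sweep changes nothing."""
    V = [[0.0] * COLS for _ in range(ROWS)]
    for sweep in range(1, max_sweeps + 1):
        changed = False
        for r in range(ROWS):
            for c in range(COLS):
                if (r, c) in TERMINALS:
                    continue  # terminal cells keep value 0
                new_v = max(-1.0 + V[nr][nc] for nr, nc in neighbours(r, c))
                if new_v != V[r][c]:
                    V[r][c] = new_v  # in-place update
                    changed = True
        if not changed:
            return V, sweep
    return V, max_sweeps  # safety cap
```

This works here because the values are small integers and the sweeps reach the exact fixed point (the fourth sweep changes nothing on this layout). With discounting or stochastic transitions, values usually approach the fixed point only asymptotically, so a small positive threshold, as in the answer, is the safer stopping rule.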

okay, got it :) – girl101 – 2015-08-12T03:45:20.677