
I am applying reinforcement learning (specifically the dynamic programming method of value iteration) to a grid world. In each iteration, I go through every cell of the grid and update its value based on its current value and the current value of taking each action from that state. My questions are:

- How long do I keep updating the value of each cell? Should I keep updating until the change between the previous and the current value function is small enough? I don't understand how to implement the stopping mechanism in the grid-world scenario (no discounting).
- Is the value function the set of values of all the cells in the grid world?
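A common stopping rule can be sketched in Python (the thread has no code, so the language is a choice made here): sweep over every cell, record the largest change `delta` seen during the sweep, and stop once `delta` drops below a small threshold `theta`. The grid size, the two terminal corner cells, the reward of -1 per move, and the `theta` and `max_sweeps` values are all assumptions for illustration, not details from the question.

```python
# Minimal value-iteration sketch on a hypothetical 4x4 deterministic grid world.
# Assumptions (not from the question): terminal cells at two corners,
# reward -1 per move, no discounting (GAMMA = 1), and moves off the grid
# leave the agent where it is.

ROWS, COLS = 4, 4
TERMINALS = {(0, 0), (ROWS - 1, COLS - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
STEP_REWARD = -1.0
GAMMA = 1.0  # undiscounted, as in the question


def next_state(state, action):
    """Deterministic transition; bumping into a wall leaves the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < ROWS and 0 <= c < COLS:
        return (r, c)
    return state


def value_iteration(theta=1e-6, max_sweeps=10_000):
    # V is the value function: one number per cell, stored as a dict lookup.
    V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    for sweep in range(1, max_sweeps + 1):
        delta = 0.0  # largest change seen in this sweep
        for s in V:
            if s in TERMINALS:
                continue  # terminal values stay fixed at 0
            old = V[s]
            # Bellman optimality backup over the cells reachable in one move.
            V[s] = max(STEP_REWARD + GAMMA * V[next_state(s, a)] for a in ACTIONS)
            delta = max(delta, abs(V[s] - old))
        if delta < theta:  # stop: no cell changed by theta or more in this sweep
            return V, sweep
    raise RuntimeError("did not converge within max_sweeps")


V, sweeps = value_iteration()
for r in range(ROWS):
    print([V[(r, c)] for c in range(COLS)])
```

Values are updated in place, matching the cell-by-cell sweep described in the question. Under these assumptions each non-terminal cell ends up with minus the number of moves to the nearer terminal corner. Without discounting, convergence relies on every cell being able to reach a terminal state; if that fails the values keep decreasing, which is what the `max_sweeps` cap guards against.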

How do I set a threshold? What I am doing is updating the value of each cell with respect to the cells the agent can move to from the current cell. What do you mean by saying V is a function? – girl101 – 2015-08-06T04:00:21.690

$V(s)$ is a function that returns the utility of state $s$. In a computer program where you have enumerated the states, you may well end up modelling $V$ as a simple array and treating it as an array lookup – Neil Slater – 2015-08-06T08:53:37.033
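The array-lookup idea can be made concrete with a small Python sketch. The grid size, the row-major numbering, and the helper names `state_index` and `cell_of` are assumptions for illustration: each cell gets a state number, and the whole value function is a single list with one entry per cell.

```python
# Sketch: number the cells of a hypothetical ROWS x COLS grid as states
# 0..ROWS*COLS-1 and store V(s) in a plain list, so V(s) is just a lookup.
ROWS, COLS = 3, 4


def state_index(row, col):
    """Map a grid cell to its state number (row-major order)."""
    return row * COLS + col


def cell_of(s):
    """Inverse mapping: state number back to (row, col)."""
    return divmod(s, COLS)


V = [0.0] * (ROWS * COLS)    # the whole value function: one entry per cell
V[state_index(1, 2)] = -3.5  # update the value of cell (1, 2)
print(V[state_index(1, 2)])  # -3.5
print(cell_of(6))            # (1, 2)
```

In this tabular setting the value function is exactly the collection of all cell values, and "evaluating" $V(s)$ is an index into the list.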

How do I set the threshold? – girl101 – 2015-08-10T05:08:08.020

Run some tests to see what works best for you. Typically a threshold of 0 gives the optimal solution, meaning there is no better solution than the current one. Since it's a hyperparam, you can learn it via a neural network. – Dref360 – 2015-08-10T20:55:57.640

@Dref360 I want to learn it via dynamic programming; I don't want to learn it via a neural network. – girl101 – 2015-08-11T03:53:12.397

@Dref360 What is a hyperparam? I googled it and found the term hyperparameter. Is hyperparam short for that? – girl101 – 2015-08-11T03:54:34.170

@Dref360 Can I stop learning when I notice no new updates in any of the states? – girl101 – 2015-08-11T04:19:40.790

@Rishika HyperParam == HyperParameter; for example, in a neural network: the number of layers or the number of hidden neurons. Yes, you can stop learning when there is no update in any state. That means there is no better solution. – Dref360 – 2015-08-11T21:19:23.810
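Stopping "when there is no update in any state" amounts to a threshold of exactly 0. A sketch under assumed details (a 1-D corridor of `N` cells with the goal at the right end, reward -1 per move, no discounting, and synchronous updates written into a copy of `V`): the loop stops after the first sweep in which no value changes, and a sweep cap guards against runs that never settle. An exact-zero test works here because every value is a small integer; with stochastic transitions the values generally approach their limits only asymptotically, so a small positive threshold is the safer choice there.

```python
# Sketch: value iteration that stops only when a full sweep changes nothing
# (threshold 0), with a sweep cap as a safety net. Hypothetical setup: a 1-D
# corridor of N cells, goal at the right end, reward -1 per move, gamma = 1.
N = 6
GOAL = N - 1


def sweep(V):
    """One synchronous Bellman optimality backup over every non-goal cell."""
    new_V = V[:]  # synchronous: read old values, write into a copy
    for s in range(N):
        if s == GOAL:
            continue
        # Moving left from cell 0 (or right from the last cell) stays in place.
        left, right = max(s - 1, 0), min(s + 1, N - 1)
        new_V[s] = max(-1.0 + V[left], -1.0 + V[right])
    return new_V


def run_until_no_update(max_sweeps=1000):
    V = [0.0] * N
    for k in range(1, max_sweeps + 1):
        new_V = sweep(V)
        if new_V == V:  # no cell changed at all: stop
            return V, k
        V = new_V
    raise RuntimeError("values still changing after max_sweeps")


V, k = run_until_no_update()
print(V)  # [-5.0, -4.0, -3.0, -2.0, -1.0, 0.0]
```

Here the values settle after five sweeps and the sixth sweep confirms that nothing changed, so `k` is 6.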

okay, got it :) – girl101 – 2015-08-12T03:45:20.677