What is the relationship between Markov Decision Processes and Reinforcement Learning?

Could we say RL and DP are two types of MDP?

In Reinforcement Learning (RL), the problem to solve is described as a Markov Decision Process (MDP). Theoretical results in RL rely on the MDP description being a correct match to the problem. If your problem is well described as an MDP, then RL may be a good framework to use to find solutions. That does not mean you need to fully describe the MDP (all the transition probabilities), just that you expect an MDP model could be made or discovered.

Conversely, if you cannot map your problem onto an MDP, then the theory behind RL makes no guarantees of any useful result.

One key factor that affects how well RL will work is that the states should have the Markov property: knowing the current state is enough to fix the immediate transition probabilities and immediate rewards following an action choice. Again, you don't need to know in advance what those are, just that this relationship is expected to be reliable and stable. If it is not reliable, you may have a POMDP. If it is not stable, you may have a non-stationary problem. In either case, if the difference from a more strictly defined MDP is small enough, you may still get away with using RL techniques, or may need to adapt them slightly.
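As a concrete illustration, a finite MDP with the Markov property can be written down as a plain table of transition probabilities and rewards keyed only by the current state and action. This is a minimal sketch with invented state and action names, not something from the question:

```python
# A tiny hypothetical MDP. Transitions and rewards depend only on the
# current (state, action) pair: exactly the Markov property described above.
# Each entry maps (state, action) -> list of (probability, next_state, reward).
mdp = {
    ("s0", "left"):  [(1.0, "s0", 0.0)],
    ("s0", "right"): [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
    ("s1", "left"):  [(1.0, "s0", 0.0)],
    ("s1", "right"): [(1.0, "s1", 2.0)],
}

# Sanity check: outgoing probabilities for each (state, action) sum to 1.
for outcomes in mdp.values():
    assert abs(sum(p for p, _, _ in outcomes) - 1.0) < 1e-9
```

An RL agent would not be handed this table up front; the point is only that such a description should exist in principle for the theory to apply.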

> Could we say RL and DP are two types of MDP?

I'm assuming by "DP" you mean Dynamic Programming, with two variants seen in Reinforcement Learning: Policy Iteration and Value Iteration.

In which case, the answer to your question is "No". I would say the following relationships are correct:

- DP is one type of RL. More specifically, it is a *value-based*, *model-based*, *bootstrapping* and *off-policy* algorithm. All of those traits can vary.

- Probably the "opposite" of DP is REINFORCE, which is *policy-gradient*, *model-free*, does not bootstrap, and is *on-policy*. Both DP and REINFORCE methods are considered to be Reinforcement Learning methods.
DP requires that you fully describe the MDP, with known transition probabilities and reward distributions, which the DP algorithm uses. That's what makes it model-based.
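To make "model-based" concrete, here is a minimal value-iteration sketch (one of the two DP variants named above) for a toy, fully specified MDP; the states, actions and numbers are invented for illustration. Note that each Bellman backup reads the transition probabilities and rewards directly from the model, which is exactly what a model-free method cannot do:

```python
# Minimal value iteration over a fully specified toy MDP.
# (state, action) -> list of (probability, next_state, reward)
mdp = {
    ("s0", "left"):  [(1.0, "s0", 0.0)],
    ("s0", "right"): [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
    ("s1", "left"):  [(1.0, "s0", 0.0)],
    ("s1", "right"): [(1.0, "s1", 2.0)],
}
states = {"s0", "s1"}
actions = {"left", "right"}
gamma = 0.9  # discount factor

V = {s: 0.0 for s in states}
while True:
    delta = 0.0
    for s in states:
        # Bellman optimality backup: uses the known model directly.
        v_new = max(
            sum(p * (r + gamma * V[s2]) for p, s2, r in mdp[(s, a)])
            for a in actions
        )
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new
    if delta < 1e-8:
        break

# After convergence, V["s1"] is 2 / (1 - 0.9) = 20: always taking
# "right" from s1 earns reward 2 per step, discounted by gamma.
```

A model-free RL method such as Q-learning would instead estimate these values from sampled transitions, without ever looking inside `mdp`.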

The general relationship between RL and MDP is that RL is a framework for solving problems that can be expressed as MDPs.

Thanks for your brief explanation! By "All of those traits can vary", do you mean that in RL all of these properties of DP can be varied? Can I say RL = DP + PG? – user10296606 – 2018-09-27T20:47:54.270

@user10296606: I mean that you can build different kinds of RL algorithms where traits like "on-line" vs "off-line" is a choice. Each algorithm has a name, and RL is definitely more than DP+PG. For instance SARSA is *value-based*, *model-free*, *bootstrapping* and *on-policy*. There are more than those 4 dimensions possible too, and some dimensions are not just binary choice, but have middle ground. E.g. SARSA($\lambda$) can compromise between bootstrap and non-bootstrap methods by varying the parameter $\lambda$. In short, RL has a lot of variation built around the idea of solving MDPs – Neil Slater – 2018-09-27T20:58:07.153

Thanks, can we say that DP needs the whole model of the environment as an MDP? – user10296606 – 2018-09-28T05:46:25.803

Thanks, what can we say about being a planning method, for RL and DP? – user10296606 – 2018-09-28T06:08:25.097

@user10296606: I am not sure what you mean. Perhaps ask a separate question, with a bit more detail than "what can we say". – Neil Slater – 2018-09-28T08:27:16.820

What is "DP" in your question? Dynamic Programming? – Neil Slater – 2018-09-27T06:17:50.227

Yes, Dynamic Programming – user10296606 – 2018-09-27T14:30:32.043