MDP - RL, Multiple rewards for the same state possible?


This question is from An introduction to RL Pages 48 and 49. This question may also be related to below question, although I am not sure: Cannot see what the "notation abuse" is, mentioned by author of book

On page 48, it is mentioned that $p : \mathcal{S} \times \mathcal{R} \times \mathcal{S} \times \mathcal{A} \rightarrow [0,1]$ is a deterministic function:

The dynamics function $p : \mathcal{S} \times \mathcal{R} \times \mathcal{S} \times \mathcal{A} \rightarrow [0, 1]$ is an ordinary deterministic function of four arguments.

However, on page 49, in equation 3.4, there is summation over r:

$$\sum_{s' \in \mathcal{S}} \sum_{r \in \mathcal{R}} p(s', r \mid s, a) = 1, \quad \text{for all } s \in \mathcal{S},\ a \in \mathcal{A}(s)$$

My question is: does this mean that performing an action $a$ that takes us to state $s'$ could result in several different possible rewards?

Melanie A

Posted 2018-08-30T09:29:50.213

Reputation: 37

Answers


it is possible that performing an action $a$ that takes us to state $s′$, could result in multiple rewards?

Yes. In the general case, any $(s,a)$ pair can lead to a range of outcomes for both $s'$ and $r$. Moreover, $s'$ and $r$ can vary independently, provided each one's distribution depends only on $(s,a)$. In practice $r$ often depends strongly on one or more of $s$, $a$ or $s'$; if it depends on $s'$, then in absolute terms it still depends only on $(s,a)$, because $s'$ does. It simply means that the values of $s'$ and $r$ are allowed to correlate.

This does not contradict the statement that $p$ is deterministic. The function $p(s',r|s,a)$ returns a fixed, specific probability for each combination of its four arguments; it is the outcomes $s'$ and $r$ that are random. What makes the process a Markov Decision Process is that the probability of each specific $(s',r)$ pair depends only on $(s,a)$.
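To make this concrete, here is a minimal sketch of a toy MDP in which a single $(s,a)$ pair can produce the *same* next state with *different* rewards. All the state names, actions, and probability values below are invented for illustration; they are not from the book. Note that `p` itself is an ordinary deterministic function, even though the transition it describes is stochastic:

```python
# Toy dynamics function p(s', r | s, a) for a hypothetical MDP.
# From state "s0" under action "a0", the agent can reach "s1" with
# reward 0 OR with reward 5, or reach "s2" with reward 1 -- so the
# same (s, a, s') triple is compatible with multiple rewards.
def p(s_next, reward, s, a):
    table = {
        ("s1", 0.0, "s0", "a0"): 0.5,   # reach s1, reward 0
        ("s1", 5.0, "s0", "a0"): 0.25,  # reach the SAME s1, reward 5
        ("s2", 1.0, "s0", "a0"): 0.25,  # reach s2, reward 1
    }
    # p is deterministic: the same four arguments always give
    # the same probability (0.0 for any pair not in the table).
    return table.get((s_next, reward, s, a), 0.0)

states = ["s0", "s1", "s2"]
rewards = [0.0, 1.0, 5.0]

# Equation 3.4: summing p over all (s', r) pairs gives 1 for a fixed (s, a).
total = sum(p(s_n, r, "s0", "a0") for s_n in states for r in rewards)
print(total)  # -> 1.0
```

The probabilities are chosen as binary fractions (0.5, 0.25, 0.25) so the sum is exactly 1.0 in floating point; with arbitrary values you would compare against 1 up to a small tolerance.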

Neil Slater

Posted 2018-08-30T09:29:50.213

Reputation: 24 613