
From Sutton and Barto, *Reinforcement Learning: An Introduction* (second edition draft), equation 3.4 on page 38:

The probabilities given by the four-argument function p completely characterize the dynamics of a finite MDP. From it, one can compute anything else one might want to know about the environment, such as the state-transition probabilities (which we denote, with a slight abuse of notation, as a three-argument function

$p(s' \mid s, a) \doteq \Pr\{S_t = s' \mid S_{t-1} = s, A_{t-1} = a\} = \sum_{r \in \mathcal{R}} p(s', r \mid s, a)$
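To make the quoted formula concrete, here is a minimal sketch that derives the three-argument state-transition probability from the four-argument dynamics by summing over rewards. The toy MDP (states `s0`/`s1`, actions `go`/`stay`, rewards 0 and 1) and the names `DYNAMICS`, `p_four` and `p_three` are hypothetical and exist only for illustration. Note that the sketch has to give the two functions different names, because they take different arguments, whereas the quoted equation writes both as p.

```python
# Hypothetical toy finite MDP, used only to illustrate the formula above.
# The four-argument dynamics p(s', r | s, a) is a table keyed by (s', r, s, a);
# any tuple not listed has probability 0.
DYNAMICS = {
    ("s1", 1.0, "s0", "go"): 0.50,
    ("s1", 0.0, "s0", "go"): 0.25,
    ("s0", 0.0, "s0", "go"): 0.25,
    ("s0", 0.0, "s0", "stay"): 1.00,
    ("s1", 0.0, "s1", "go"): 1.00,
    ("s1", 0.0, "s1", "stay"): 1.00,
}

STATES = ["s0", "s1"]
ACTIONS = ["go", "stay"]
REWARDS = [0.0, 1.0]


def p_four(s_next, r, s, a):
    """Four-argument dynamics function p(s', r | s, a)."""
    return DYNAMICS.get((s_next, r, s, a), 0.0)


def p_three(s_next, s, a):
    """Three-argument state-transition probability p(s' | s, a),
    obtained by summing the four-argument function over all rewards r."""
    return sum(p_four(s_next, r, s, a) for r in REWARDS)


if __name__ == "__main__":
    # Marginalizing out the reward for (s'=s1, s=s0, a=go): 0.25 + 0.50
    print(p_three("s1", "s0", "go"))  # 0.75
    # For every (s, a), the derived probabilities still sum to 1 over s'.
    for s in STATES:
        for a in ACTIONS:
            total = sum(p_three(s_next, s, a) for s_next in STATES)
            print(s, a, total)
```

The probabilities are chosen as exact binary fractions (0.25, 0.5, 1.0) so that the sums print without floating-point rounding noise.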

The authors say this is done *with a slight abuse of notation*.
Where exactly is the abuse of notation? I don't see anything improper.

Thank you.

@Neil Slater, I saw the edit, thank you. By the way, do you have any thoughts on why the authors phrased it that way? – cinqS – 2017-12-04T10:11:48.640

Sorry, I don't understand formal notation well enough to spot anything odd. Originally I thought this was about the combined iterator in the sum $\sum_{r,s'}$, which is a little unusual, but that sum does not appear in the quoted section. – Neil Slater – 2017-12-04T10:34:06.647
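For reference, a combined-iterator sum such as $\sum_{r,s'}$ is shorthand for a nested sum over both indices. Applied to the four-argument dynamics function, it expresses the requirement that p is a probability distribution over $(s', r)$ for each state-action pair:

$$\sum_{s', r} p(s', r \mid s, a) \;=\; \sum_{s' \in \mathcal{S}} \sum_{r \in \mathcal{R}} p(s', r \mid s, a) \;=\; 1 \quad \text{for all } s \in \mathcal{S},\ a \in \mathcal{A}(s)$$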