## Cannot see what the "notation abuse" is, mentioned by author of book


This is from Sutton and Barto, Reinforcement Learning: An Introduction (second edition draft), equation 3.4 on page 38:

The probabilities given by the four-argument function p completely characterize the dynamics of a finite MDP. From it, one can compute anything else one might want to know about the environment, such as the state-transition probabilities (which we denote, with a slight abuse of notation, as a three-argument function

$p(s' \mid s, a) \doteq \Pr\{S_t = s' \mid S_{t-1} = s, A_{t-1} = a\} = \sum_{r \in R} p(s', r \mid s, a)$

The authors describe this as "a slight abuse of notation". Where exactly is the abuse? I don't see anything improper in it.

Thank you.

@Neil Slater, I saw the edit, thank you. By the way, do you have any thoughts on why the authors put it that way? – cinqS – 2017-12-04T10:11:48.640

Sorry I don't understand formal notation well enough to spot anything odd. Originally I thought this was about the combined iterator in the sum $\sum_{r,s'}$ which is a little unusual . . . but it's not that in the quoted section. – Neil Slater – 2017-12-04T10:34:06.647

The mathematical expression itself is completely legitimate. The abuse is that the function $p$ is first defined in equation 3.2 as follows:
The function $p: S \times R \times S \times A \rightarrow [0,1]$ is an ordinary deterministic function of four arguments...
and is then redefined slightly differently just two lines later (equation 3.4), as a three-argument function $p: S \times S \times A \rightarrow [0,1]$.
If they had used $p$ to denote the ordinary probability measure, there would be no abuse. In the authors' notation, however, $p$ is a deterministic function, while the probability measure is written $\Pr$. Reusing the same name for two slightly different functions is where the "innocent" abuse of notation comes from.
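
To make the distinction concrete, here is a minimal Python sketch. It is my own illustration, not something from the book: the toy states, rewards, and probabilities are invented, and the names `p4`, `p3`, and `state_transition` are hypothetical. It stores the four-argument function as a table and builds the three-argument one by summing over rewards, as in equation 3.4. In code you are forced to give the two functions different names, which is exactly the step the book skips by calling both of them $p$.

```python
from collections import defaultdict

# Hypothetical toy dynamics (not from the book).
# Four-argument function: p4[(s_next, r, s, a)] = p(s', r | s, a)
p4 = {
    ("s1", 0.0, "s0", "go"): 0.25,
    ("s1", 1.0, "s0", "go"): 0.25,
    ("s0", 0.0, "s0", "go"): 0.5,
}


def state_transition(p4):
    """Build the three-argument function p(s' | s, a) by summing
    p(s', r | s, a) over all rewards r."""
    p3 = defaultdict(float)
    for (s_next, r, s, a), prob in p4.items():
        p3[(s_next, s, a)] += prob  # the reward r is marginalized out
    return dict(p3)


# Three-argument function: p3[(s_next, s, a)] = p(s' | s, a)
p3 = state_transition(p4)
print(p3)
```

The two tables have different key shapes (four entries versus three), so they really are different functions, even though one is fully determined by the other.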