In general the different reward functions $R(s)$, $R(s, a)$ and $R(s, a, s')$ are *not* equivalent mathematically, so you will not find a formal proof of their equivalence.

The functions can resolve to the same value in a specific MDP. For instance, if you use $R(s, a, s')$ and the value returned depends only on $s$, then $R(s, a, s') = R(s)$. This is not true in general, but since the reward function is often under your control, it can be the case quite often.

For instance, in scenarios where the agent's goal is to reach some pre-defined state, as in the grid world example from the video, there is no difference between $R(s, a, s')$ and $R(s)$. Given that is the case, for those example problems you may as well use $R(s)$, as it simplifies the expressions that you need to calculate for algorithms like Q-learning.
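To make that concrete, here is a minimal sketch of such a case (a hypothetical 1-D grid world of my own invention, not the one from the video): because the reward depends only on the state reached, the full $R(s, a, s')$ signature carries no extra information.

```python
# Hypothetical 1-D grid world: states 0..4, the goal is state 4.
GOAL = 4

def reward_state_only(s):
    """R(s): +1 for the goal state, 0 otherwise."""
    return 1.0 if s == GOAL else 0.0

def reward_full(s, a, s_next):
    """R(s, a, s'): the same reward, just written with the full signature."""
    return reward_state_only(s_next)

# The two forms agree on every transition in this MDP.
for s in range(5):
    for a in (-1, +1):                       # move left / right
        s_next = min(max(s + a, 0), 4)
        assert reward_full(s, a, s_next) == reward_state_only(s_next)
```

Any algorithm that only ever evaluates the reward on observed transitions cannot tell these two definitions apart.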

I think the lecturer did not mean "equivalent" in the mathematical sense, but in the sense that future lectures will use one of the functions, and a lot of what you will learn is going to be much the same as if you had used a different reward function.

Finally, when should we use one representation over the other and why are there three representations?

Typically, I don't use any of those representations by default. I tend to use Sutton & Barto's $p(s', r|s, a)$ notation for combined state transitions and rewards. That expression gives the probability of transitioning to state $s'$ and receiving reward $r$, given that the agent starts in state $s$ and takes action $a$. For discrete actions, you can re-write the expectation of the different functions $R$ in terms of this function as follows:

$$\mathbb{E}[R(s)] = \sum_{a \in \mathcal{A}(s)}\pi(a|s)\sum_{s' \in \mathcal{S}}\sum_{r \in \mathcal{R}}r\,p(s', r|s, a)\qquad*$$

where $\pi(a|s)$ is the probability that the current policy takes action $a$ in state $s$ (without this weighting, the sum over actions would not be an expectation).

$$\mathbb{E}[R(s,a)] = \sum_{s' \in \mathcal{S}}\sum_{r \in \mathcal{R}}r\,p(s', r|s, a)$$

$$\mathbb{E}[R(s,a,s')] = \frac{\sum_{r \in \mathcal{R}}r\,p(s', r|s, a)}{p(s'|s, a)}$$

where $p(s'|s, a) = \sum_{r \in \mathcal{R}}p(s', r|s, a)$, since this expectation is conditioned on actually ending up in $s'$.
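These expectations can be sketched numerically under the same $p(s', r|s, a)$ notation. The dynamics table below is an assumed toy example (state and action names are my own); note that the $R(s, a, s')$ case divides by $p(s'|s, a)$ so that it is a conditional expectation given that the agent actually lands in $s'$.

```python
# Assumed toy dynamics: p[(s, a)] lists (s_next, r, prob) triples,
# i.e. the joint p(s', r | s, a) for each state-action pair.
p = {
    ("s0", "left"):  [("s0", 0.0, 0.8), ("s1", 1.0, 0.2)],
    ("s0", "right"): [("s1", 1.0, 1.0)],
}

def expected_reward_sa(s, a):
    """E[R(s, a)]: marginalise the joint dynamics over s' and r."""
    return sum(r * prob for (_s2, r, prob) in p[(s, a)])

def expected_reward_sas(s, a, s_target):
    """E[R(s, a, s')]: condition on actually arriving in s_target,
    i.e. divide by p(s' | s, a)."""
    mass = sum(prob for (s2, _r, prob) in p[(s, a)] if s2 == s_target)
    total = sum(r * prob for (s2, r, prob) in p[(s, a)] if s2 == s_target)
    return total / mass if mass > 0 else 0.0

print(expected_reward_sa("s0", "left"))          # 0.2
print(expected_reward_sas("s0", "left", "s1"))   # 1.0
```

The same table could also feed the $\mathbb{E}[R(s)]$ case by adding a policy weighting over actions.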

I think this is one way to see how the functions in the video are closely related.

Which one would you use? It depends what you are doing. If you want to simplify an equation or code, then use the simplest version of the reward function that fits the reward scheme you set up for the goals of the problem. For instance, if there is one goal state to exit a maze, and an episode ends as soon as this happens, then you don't care how you got to that state or what the previous state was, and can use $R(s)$.

In practice, if you use a different reward function, you need to pay attention to where it appears in things like the Bellman equation for theoretical treatments. When you get to implementing model-free methods like Q-learning, $R(s)$ and its variants don't really appear except in the theory.
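You can see this in a minimal tabular Q-learning sketch (a hypothetical 1-D corridor of my own, states 0..4 with goal state 4): the update rule only ever uses the *sampled* reward returned by the environment, and no explicit reward function $R(s)$ appears anywhere in the learner.

```python
import random

# States 0..4, goal state 4; actions move left (-1) or right (+1).
N, GOAL, ACTIONS = 5, 4, (-1, +1)
alpha, gamma = 0.5, 0.9
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    """Environment: returns (next state, sampled reward, done flag)."""
    s_next = min(max(s + a, 0), N - 1)
    return s_next, (1.0 if s_next == GOAL else 0.0), s_next == GOAL

random.seed(0)
for _ in range(200):                          # episodes
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS)            # random behaviour policy (off-policy)
        s_next, r, done = step(s, a)
        # The learner only sees the sampled reward r, never R(s) itself.
        target = r if done else r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s_next

# The greedy policy recovered from Q moves right in every non-goal state.
```

Whether the environment's reward was defined as $R(s)$, $R(s, a)$ or $R(s, a, s')$ makes no difference to this code; only the theoretical analysis of it changes.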

* This is not technically correct in all cases. The assumption I have made is that $R(s)$ is a reward granted at the point of *leaving* state $s$, and is independent of how the state is left and where the agent ends up next.

If this were instead a fixed reward for *entering* state $s$, regardless of how, then it could be written in terms of $R(s')$ as follows:

$$\mathbb{E}[R(s')] = \sum_{s \in \mathcal{S}}\sum_{a \in \mathcal{A}(s)}\sum_{r \in \mathcal{R}}r\,p(s', r|s, a)$$

i.e. by summing over all the transitions that end up in $s'$.
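A minimal sketch of that sum, using an assumed toy dynamics table of my own (a dictionary mapping each $(s, a)$ pair to its $(s', r, \text{prob})$ outcomes):

```python
# Assumed toy dynamics: p[(s, a)] lists (s_next, r, prob) triples,
# i.e. the joint p(s', r | s, a) for each state-action pair.
p = {
    ("s0", "left"):  [("s0", 0.0, 0.8), ("s1", 1.0, 0.2)],
    ("s0", "right"): [("s1", 1.0, 1.0)],
}

def entering_reward(s_target):
    """Sum r * p(s', r | s, a) over every (s, a) whose transition
    enters s_target."""
    return sum(r * prob
               for outcomes in p.values()
               for (s2, r, prob) in outcomes
               if s2 == s_target)

print(entering_reward("s1"))   # 0.2 + 1.0 = 1.2
```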

I would like to note that $R(s)$, $R(s, a)$ and $R(s, a, s')$ can be intuitively interpreted differently. We can think of $R(s)$ as the reward you obtain for entering, exiting, or staying in state $s$ (no matter what action we take or which next state we may end up in). $R(s, a)$, intuitively, can be thought of as the reward you obtain for taking action $a$ in state $s$. Finally, $R(s, a, s')$ can be thought of as the reward you obtain for taking action $a$ in state $s$ and ending up in state $s'$. – nbro – 2019-02-09T14:29:24.320

Recall that the environment may be stochastic, so, in general, if we take action $a$ in state $s$, we may not always end up in the same next state; that is, if we take action $a$ in state $s$ twice, the resulting next state may be different. – nbro – 2019-02-09T14:30:38.630