$\mathbb E$ is the symbol for the expectation (or expected value).

To fully understand the concept of *expected value*, you first need to understand the concept of a *random variable*. An example should help illustrate the idea.

Suppose you toss a coin. The outcome of this (random) experiment can either be *heads* or *tails*. Formally, the *sample space*, $\Omega = \{\text{heads}, \text{tails}\}$, is the set that contains the possible outcomes of the random experiment. The outcome (e.g. *heads*) is the result of a random process. A random variable is a function defined on the sample space that lets us describe the random process more formally. In this case, we can associate a random variable, $T$, with the random process of tossing a coin.

$$
T(\omega) =
\begin{cases}
1, & \text{if } \omega = \text{heads}, \\[6pt]
0, & \text{if } \omega = \text{tails},
\end{cases}
$$

where $\omega \in \Omega$.

In other words, if the outcome of the random process is *heads*, then the output of the associated random variable $T$ is $1$, else it is $0$.
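As an illustrative sketch (in Python; the names `sample_space` and `T` are my own), the random variable $T$ is literally just a function from outcomes to numbers:

```python
# A random variable is a function from outcomes (elements of the
# sample space) to real numbers. Hypothetical sketch.
sample_space = {"heads", "tails"}

def T(omega):
    """The random variable T: maps an outcome omega to a number."""
    assert omega in sample_space
    return 1 if omega == "heads" else 0

print(T("heads"))  # 1
print(T("tails"))  # 0
```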

We can also associate with each random process (and thus with the corresponding random variable) a probability distribution, which, intuitively, describes the probability of occurrence of each possible outcome of the random process. In the case of the coin-tossing random variable (or process), assuming that the coin is "fair", the following function describes the probability of each outcome

$$
f_T(t) =
\begin{cases}
\tfrac 12,& \text{if }t=1,\\[6pt]
\tfrac 12,& \text{if }t=0,
\end{cases}
$$

In other words, there is $\tfrac 12$ probability that the outcome of the random process is $1$ (heads) and $\tfrac 12$ probability that it is $0$ (tails).

If you toss a coin $n$ times, how many times will it land heads and how many times tails? Of course, it will depend on the experiment. In the first experiment, you might get $\frac{3n}{4}$ heads and $\frac{n}{4}$ tails. In the second experiment, you might get $\frac{n}{2}$ heads and $\frac{n}{2}$ tails, and so on. If you could repeat this experiment an infinite number of times (of course, we can't, but imagine we could), how many times would you **expect** (on average) to get heads and tails? The expected value is the answer to this question.
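We cannot repeat the experiment infinitely, but a quick simulation (a sketch, assuming a fair coin; the function name is my own) shows the empirical fraction of heads approaching the expected value as $n$ grows:

```python
import random

random.seed(0)  # for reproducibility

def empirical_mean(n):
    """Toss a fair coin n times and return the fraction of heads."""
    heads = sum(1 if random.random() < 0.5 else 0 for _ in range(n))
    return heads / n

# The empirical average fluctuates for small n and settles near 0.5.
for n in (10, 1000, 100_000):
    print(n, empirical_mean(n))
```

This is the law of large numbers at work: the sample average of $T$ converges to $\mathbb E[T]$.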

In the case of the coin-tossing experiment, the outcomes are discrete (heads or tails), consequently, $T$ is a *discrete random variable*. In the case of a discrete random variable, the expected value **is defined as** follows

$$\mathbb E[T] = \sum_{t} p(t)\, t$$

where the sum ranges over the values $t$ that the random variable $T$ can take and $p(t)$ is the probability of such a value. In other words, the expected value of a random variable $T$ is defined as a weighted sum of the values it can take, where the weights are the corresponding probabilities of occurrence. So, in the case of the coin-tossing experiment, the expected value is

\begin{align}
\mathbb E[T]
&= \sum_{t} p(t)\, t\\
&= \frac{1}{2} \cdot 1 + \frac{1}{2} \cdot 0\\
&= \frac{1}{2}
\end{align}

What does $\mathbb E[T] = \frac{1}{2}$ mean? Intuitively, it means that half of the time the random process produces heads and half of the time it produces tails, assuming it is governed by the probability distribution $f_T(t)$.
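The weighted sum above can be computed directly. Here is a minimal sketch (the function name is my own), representing the distribution as a dict from values to probabilities:

```python
def expected_value(distribution):
    """Expected value of a discrete random variable, given a
    dict mapping each value t to its probability p(t)."""
    return sum(p * t for t, p in distribution.items())

f_T = {1: 0.5, 0: 0.5}  # the fair-coin distribution from above
print(expected_value(f_T))  # 0.5
```

Changing the probabilities changes the result, e.g. a biased coin `{1: 0.9, 0: 0.1}` gives `0.9`, which is the point made in the next paragraph.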

Note that, if the probability distribution $f_T(t)$ had been defined differently, then the expected value would also have been different, given that the expected value is defined as a function of the probability of occurrence of each outcome of the random process.

In your specific examples, $\mathbb E$ is still the symbol for the expected value. For example, in the case of $Q^\pi \left(s,a \right) = \mathbb E \left[R_t|s_t = s, a_t = a, \pi \right]$, $Q^\pi \left(s,a \right)$ is thus defined as the expected value of the random variable $R_t$, given that $s_t = s$, $a_t = a$ and the policy is $\pi$ (so this is actually a conditional expectation). In this specific case, $R_t$ represents the *return* at time step $t$, which, in reinforcement learning, is defined as

$$
R_t = \sum_{k=0}^\infty \gamma^k r_{t+k+1}
$$

where $r_{t+k+1} \in \mathbb{R}$ is the reward at time step $t+k+1$. $R_t$ is a random variable because it is assumed that the underlying environment is a random process.
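In practice, the infinite sum is truncated when an episode ends. A sketch of the computation for a finite sequence of rewards (function and parameter names are my own):

```python
def discounted_return(rewards, gamma):
    """Compute R_t = sum_k gamma^k * r_{t+k+1} for a finite list
    of rewards [r_{t+1}, r_{t+2}, ...] and discount factor gamma."""
    return sum(gamma**k * r for k, r in enumerate(rewards))

# Three rewards of 1 with gamma = 0.9: 1 + 0.9 + 0.81 ≈ 2.71
print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))
```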

It is not always easy to intuitively understand the expected value of a random variable. For example, in the case of a coin-flipping random process, the expected value $\frac{1}{2}$ should be intuitive (given that it is the average of $1$ and $0$), but, in the case of $Q^\pi \left(s,a \right)$, at first glance, it is not clear what the expected value should be (hence the need for algorithms such as Q-learning), given that it depends on the rewards, which depend on the dynamics of the environment. However, the intuition behind the concept of the expected value and the calculation (provided the associated random variable is discrete) does not change.

When more than one random variable is involved in the calculation of the expected value, we also need to specify the random variable with respect to which the expectation is taken, hence the subscripts on the expectation in your examples. See, for example, Subscript notation in expectations for more info.


It's the expected value symbol

– Brale – 2019-08-27T15:02:18.443