What does the notation sup dist mean in distributional RL?


I'm trying to understand distributional RL, based on this article. In one of the equations, there is a symbol $\operatorname{sup dist}$.

\begin{align} \operatorname{sup dist}_{s, a} (R(s, a) + \gamma Z(s', a^*), Z(s, a)) \\ s' \sim p(\cdot \mid s, a) \end{align}

What does $\operatorname{sup dist}$ mean?


Posted 2020-01-06T18:56:44.160

Reputation: 43



It doesn't seem to be a standard symbol.

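I guess that $\sup$ simply refers to the supremum. As the subscript indicates, it is taken over all state-action pairs $(s, a)$, so the expression is the largest value, over all such pairs, of the quantity to the right of $\sup$ (it is a worst-case distance, not a rule for selecting actions). $\text{dist}$ is simply a placeholder for any measure of discrepancy between two distributions. For example, you can replace $\text{dist}$ with the Kullback-Leibler divergence (which is not a true metric) or with the Wasserstein distance.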
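To make that reading concrete, here is a minimal sketch, assuming a tabular setting where every $Z(s, a)$ is a categorical distribution over the same sorted atoms. The names `wasserstein1` and `sup_dist`, the atoms, and the two tables are all made up for illustration; `target` stands in for the distribution of $R(s, a) + \gamma Z(s', a^*)$, and the 1-Wasserstein distance is just one possible choice of $\text{dist}$.

```python
from itertools import accumulate


def wasserstein1(p, q, atoms):
    """1-Wasserstein distance between two categorical distributions
    that share the same sorted list of atoms (illustrative choice of dist)."""
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    # Integrate |F_p - F_q| over the gaps between consecutive atoms.
    return sum(
        abs(fp - fq) * (atoms[i + 1] - atoms[i])
        for i, (fp, fq) in enumerate(zip(cdf_p[:-1], cdf_q[:-1]))
    )


def sup_dist(target, current, atoms, dist=wasserstein1):
    """Largest dist(target[s, a], current[s, a]) over all (s, a) pairs.
    With finitely many pairs, the supremum is just a maximum."""
    return max(dist(target[sa], current[sa], atoms) for sa in current)


# Hypothetical data: one state, two actions, three return atoms.
atoms = [0.0, 1.0, 2.0]
current = {("s0", "a0"): [1.0, 0.0, 0.0], ("s0", "a1"): [0.5, 0.5, 0.0]}
target = {("s0", "a0"): [0.0, 0.0, 1.0], ("s0", "a1"): [0.5, 0.5, 0.0]}

worst = sup_dist(target, current, atoms)
print(worst)  # 2.0, reached at ("s0", "a0"); the other pair has distance 0
```

The point of taking the supremum is that the result is small only if the two distributions are close for every state-action pair, not just on average.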

Diego Gomez

Posted 2020-01-06T18:56:44.160

Reputation: 368