For this, we will need game theory.

In game theory, an optimal strategy is one that cannot be exploited by the opponent *even if they know your strategy*.

Let's say you want a strategy whose move selection does not depend on what happened before (so you are not trying to model your opponent, or trick them into believing you will always play scissors and then throw them off, or anything like that).
A strategy will look like $(P, S, R)$, where $P, S, R \in [0, 1], P+S+R = 1$. You select paper with probability $P$, scissors with probability $S$, rock with probability $R$.
Now, if your probabilities are a bit uneven (for example $(0.5, 0.2, 0.3)$), an opponent can exploit your strategy. If your opponent plays with probabilities $(p, s, r)$, their expected reward (counting +1 for a win, -1 for a loss, 0 for a draw) would be $0.5(s - r) + 0.2(r - p) + 0.3(p - s) = 0.1p + 0.2s - 0.3r$. To maximize their winnings, they would play scissors all the time against you (setting $s = 1$), for an expected gain of $0.2$ points per round: a distinct advantage over you.
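To make this concrete, here is a small sketch (names are illustrative, not from any library) that computes the opponent's expected reward for each pure counter-move against the uneven strategy $(0.5, 0.2, 0.3)$ and picks the best one:

```python
# Your (uneven) probabilities for paper, scissors, rock.
P, S, R = 0.5, 0.2, 0.3

# Opponent's expected reward (+1 win, -1 loss, 0 draw) for each pure move:
# paper beats rock and loses to scissors; scissors beats paper and loses
# to rock; rock beats scissors and loses to paper.
reward = {
    "paper":    R - S,   # wins vs your rock, loses vs your scissors
    "scissors": P - R,   # wins vs your paper, loses vs your rock
    "rock":     S - P,   # wins vs your scissors, loses vs your paper
}

best = max(reward, key=reward.get)
print(best)  # scissors
print({move: round(v, 10) for move, v in reward.items()})
```

Playing scissors every round yields the $0.2$ expected gain derived above, matching the coefficient of $s$ in the formula.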

In general, for a strategy $(P, S, R)$ for you and $(p, s, r)$ for your opponent, your opponent's expected winnings would be $P(s - r) + S(r - p) + R(p - s) = p(R-S) + s(P-R) + r(S - P)$. If all the partial derivatives of this expression with respect to $p$, $s$ and $r$ are 0, then the expression is zero no matter what the opponent plays: they have no incentive to choose any particular move over any other.
This occurs when $P = S = R = \frac13$.
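A quick numerical sanity check (a sketch, not a proof): with the uniform strategy $(\frac13, \frac13, \frac13)$, the opponent's expected reward $p(R-S) + s(P-R) + r(S-P)$ is zero for every mixed strategy $(p, s, r)$ they might pick.

```python
import random

# Your uniform strategy: each move with probability 1/3.
P = S = R = 1 / 3

def opponent_expected_reward(p, s, r):
    # Opponent's expected winnings from the formula above.
    return p * (R - S) + s * (P - R) + r * (S - P)

random.seed(0)
for _ in range(5):
    # Draw a random opponent mixed strategy (p, s, r) summing to 1.
    a, b = sorted((random.random(), random.random()))
    p, s, r = a, b - a, 1 - b
    assert abs(opponent_expected_reward(p, s, r)) < 1e-12

print("uniform strategy is unexploitable in expectation")
```

Every coefficient $(R-S)$, $(P-R)$, $(S-P)$ vanishes when $P = S = R$, so the assertions hold regardless of which random strategies are drawn.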

That's basically how to approach game theory: find a strategy so your opponent has no incentive to choose one action over another. The approach seems a bit counter-intuitive at first (you're trying to find the optimal strategy for your opponent instead of for yourself) but it works for many similar problems.

A deterministic policy can be the optimal policy. It depends entirely on whether your environment is actively learning to adapt or not. I believe the formal argument requires finding a Nash equilibrium for each player, but constructing such an equilibrium requires some prior assumptions (if I recall correctly) about the information each player possesses about the others, so I won't write a full answer. But this is the general direction; you can read more about it. – DuttaA – 2020-08-27T11:12:31.103

Interesting - if it's two RL agents playing against each other, the deterministic policy is definitely not optimal, right? RL agents can adapt actively! – cogito_ai – 2020-08-27T13:48:11.700

True... but we don't know for sure. An RL policy can be deterministic if you want; there's no steadfast rule. – DuttaA – 2020-08-27T13:56:29.673