
I've been looking online for a while for a source that explains these computations, but I can't find anywhere what $|\mathcal{A}(s)|$ means. I guess $\mathcal{A}$ is the action set, but I'm not sure about that notation:

$$\frac{\varepsilon}{|\mathcal{A}(s)|} \sum_{a} Q^{\pi}(s, a)+(1-\varepsilon) \max _{a} Q^{\pi}(s, a)$$

Here is the source of the formula.

I also want to clarify that I understand the idea behind the $\varepsilon$-greedy approach and the motivation behind on-policy methods. I just had a problem understanding this notation (and some other minor things). The author omitted some steps, so there seemed to be a gap in continuity, which is why I didn't get the notation. I'd be more than glad to be pointed towards a better source where this is detailed.

Where did you take that formula from? And what is it supposed to represent? The action selection mechanism? Normally, epsilon-greedy simply means that, with probability epsilon, you choose a random action instead of the greedily selected (i.e. best possible) action. – Daniel B. – 2020-07-14T20:19:52.480
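To make the selection rule in the comment above concrete, here is a minimal sketch. It assumes the action values for a single state are stored in a plain dict mapping each action to its estimate $Q(s, a)$; the function name `epsilon_greedy_action` and that data layout are illustrative choices, not something taken from the linked source.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick an action for one state.

    q_values: dict mapping action -> Q(s, a) (hypothetical layout).
    epsilon:  probability of taking a uniformly random action.
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        # Explore: every action in A(s) is equally likely.
        return rng.choice(actions)
    # Exploit: take the action with the highest estimated value.
    return max(actions, key=q_values.get)


q = {"left": 0.1, "right": 0.7, "stay": 0.3}
print(epsilon_greedy_action(q, epsilon=0.1))
```

Note that the random branch can also return the greedy action, so the greedy action ends up with total probability $1 - \varepsilon + \varepsilon / |\mathcal{A}(s)|$.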

Sorry for that. Here's the source of the formula: http://www.incompleteideas.net/book/first/ebook/node54.html.

It also shows up in the practical implementation of the epsilon-greedy algorithm (at the bottom of that page).

From the pseudocode, it is pretty clear that $\mathcal{A}(s)$ refers to the set of all possible actions in state $s$, since in step (c) the algorithm iterates through all actions $a$ taken from that set. That it is about the actions becomes apparent from the use of $a$. – Daniel B. – 2020-07-14T20:34:53.303

Yes, I realized that. I was talking more about the notation $|\mathcal{A}(s)|$, but I get it now. Thanks. – Metrician – 2020-07-14T20:44:04.983
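For anyone else stuck on the same notation: the vertical bars denote the size of a set, so $|\mathcal{A}(s)|$ is simply the number of actions available in state $s$. The following small sketch evaluates the expression from the question for one state. The dict layout and the name `epsilon_greedy_value` are assumptions made for illustration only.

```python
def epsilon_greedy_value(q_values, epsilon):
    """Evaluate eps/|A(s)| * sum_a Q(s,a) + (1 - eps) * max_a Q(s,a).

    q_values: dict mapping action -> Q(s, a) for one state (hypothetical layout).
    """
    n_actions = len(q_values)  # this is |A(s)|
    exploratory = (epsilon / n_actions) * sum(q_values.values())
    greedy = (1 - epsilon) * max(q_values.values())
    return exploratory + greedy


q = {"left": 1.0, "right": 4.0, "stay": 1.0}
print(epsilon_greedy_value(q, epsilon=0.3))  # approximately 3.4
```

With three actions and $\varepsilon = 0.3$, the first term is $(0.3/3)\cdot 6 = 0.6$ and the second is $0.7 \cdot 4 = 2.8$, which gives about $3.4$ up to floating-point rounding.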