When comparing Reinforcement Learning algorithms against each other, you typically compare the rewards they accumulate and how quickly and how closely they approach the optimal policy. When comparing an algorithm against humans, however, you would rather compare the results of all the games played.

# Reward Comparison

Reinforcement Learning algorithms are often compared using the rewards they obtain (either the direct reward, the maximum reward, or the average reward per time step/iteration). For example, this page about RL shows a comparison of two algorithms:

Or, when the optimal actions are known, you can plot the number of plays/iterations against the percentage of optimal actions taken. See for example this RL comparison on the 10-armed testbed problem:
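As a minimal sketch of how such a curve can be produced (assuming an epsilon-greedy agent on a 10-armed testbed with Gaussian arm values and rewards; the function name and all parameter values are illustrative, not from any particular library), you can track the fraction of runs that picked the optimal action at each step:

```python
import random

def run_bandit(n_arms=10, steps=1000, eps=0.1, runs=200, seed=0):
    """Average the %-optimal-action curve of epsilon-greedy over many runs."""
    rng = random.Random(seed)
    optimal_counts = [0] * steps
    for _ in range(runs):
        # True arm values drawn from N(0, 1), one fixed bandit per run.
        q_true = [rng.gauss(0, 1) for _ in range(n_arms)]
        best = q_true.index(max(q_true))
        q_est = [0.0] * n_arms   # action-value estimates
        n_pulls = [0] * n_arms
        for t in range(steps):
            if rng.random() < eps:
                a = rng.randrange(n_arms)        # explore
            else:
                a = q_est.index(max(q_est))      # exploit (greedy)
            if a == best:
                optimal_counts[t] += 1
            reward = rng.gauss(q_true[a], 1)     # noisy reward
            n_pulls[a] += 1
            q_est[a] += (reward - q_est[a]) / n_pulls[a]  # incremental mean
    return [c / runs for c in optimal_counts]    # fraction optimal per step

curve = run_bandit()
```

Plotting `curve` against the step index gives the kind of %-optimal-action learning curve shown for the 10-armed testbed; running it once per algorithm puts them on the same axes for comparison.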

Henderson et al. (2017) devote a whole section to evaluation metrics for Reinforcement Learning algorithms. They comment on plotting the average or maximum cumulative reward, and they also describe the *sample bootstrap* method for creating confidence intervals, which allows a better comparison. Lastly, they note that the *significance* of an algorithm's improvement should be established with a statistical test, such as the two-sample t-test. Note that you should take the distributions of the data into account when choosing the statistical test. An interesting related article is *A Hitchhiker's Guide to Statistical Comparison of Reinforcement Learning Algorithms* by Colas et al.
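The bootstrap idea can be sketched in a few lines. This is a hypothetical illustration (the per-run returns below are made up, and `bootstrap_ci` is not code from Henderson et al.): a percentile-bootstrap confidence interval for the mean return of each algorithm, using only the standard library:

```python
import random
import statistics

def bootstrap_ci(scores, n_boot=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean score."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        # Resample the runs with replacement and record the resampled mean.
        sample = [rng.choice(scores) for _ in scores]
        means.append(statistics.fmean(sample))
    means.sort()
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical final returns of two algorithms over 10 training runs each.
algo_a = [210, 250, 195, 230, 260, 205, 240, 225, 215, 255]
algo_b = [180, 220, 170, 205, 215, 190, 200, 185, 175, 210]
ci_a = bootstrap_ci(algo_a)
ci_b = bootstrap_ci(algo_b)
```

If the two intervals barely overlap, that is a hint (not a proof) that the difference is real; a proper statistical test, chosen to match the data's distribution, should still be run.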

# Comparison of plays against humans

To find out how well different algorithms play against humans, you should play a large number of games and compare whatever parameters you consider important, for example: whether the algorithm won, how long it took to win, the number of points gained, etc. These values can then be compared statistically. Note that you have to think carefully about the experimental setup: only the algorithm should change, and all other parameters should stay equal. You should therefore preferably use a large number of subjects (of different ages, sexes, etc., to cover many types of people) and try to prevent any bias; think about the order of the games, how many games are played, the location, the time, etc.
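As a sketch of such a statistical comparison (the outcome data below are entirely hypothetical), a distribution-free permutation test can compare the win rates of two algorithms without assuming normality, which is convenient for binary win/loss outcomes:

```python
import random

def win_rate(outcomes):
    """Fraction of games the algorithm won (outcomes are 1 = win, 0 = loss)."""
    return sum(outcomes) / len(outcomes)

def permutation_pvalue(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in win rate.

    Repeatedly shuffles the pooled outcomes and counts how often a random
    split produces a difference at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(win_rate(a) - win_rate(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(win_rate(pooled[:len(a)]) - win_rate(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical results of 20 games per algorithm against human opponents.
algo_a_wins = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1]
algo_b_wins = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
p = permutation_pvalue(algo_a_wins, algo_b_wins)
```

A small p-value suggests the difference in win rate is unlikely to be due to chance alone; with few games or few subjects, even large observed differences may not reach significance.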