## Evaluation metrics for multiple values per session

I have an application that executes my foo() function several times during each user session. There are two alternative algorithms that I can implement as foo(), and my goal is to evaluate them based on execution delay.

The number of times foo() is called per user session varies but will not exceed 10000. Say the delay values (one inner list per session) are:

Algo1: [ [12, 30, 20, 40, 24, 280] , [13, 14, 15, 100], [20, 40] ]
Algo2: [ [1, 10, 5, 4, 150, 20] , [14, 10, 20], [21, 33, 41, 79] ]


My question is: what is the best metric for picking the winner?

Possible options:

1. Take the average of each session, then compare the CDFs of those averages
2. Take the median of each session, then compare the CDFs of those medians
3. Anything else?
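Options 1 and 2 can be sketched in a few lines of Python. This is only an illustration using the sample delays from the question; the helper names `per_session` and `ecdf` are made up for this example, and "evaluate cdf" is assumed to mean the empirical CDF of the per-session summaries.

```python
from statistics import mean, median

# Delay values from the question: one inner list per user session.
algo1 = [[12, 30, 20, 40, 24, 280], [13, 14, 15, 100], [20, 40]]
algo2 = [[1, 10, 5, 4, 150, 20], [14, 10, 20], [21, 33, 41, 79]]

def per_session(sessions, summary):
    """Collapse each session to one number using `summary` (mean or median)."""
    return [summary(s) for s in sessions]

def ecdf(values):
    """Return sorted (value, cumulative fraction) pairs of the empirical CDF."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]

means1, means2 = per_session(algo1, mean), per_session(algo2, mean)
medians1, medians2 = per_session(algo1, median), per_session(algo2, median)

print("Algo1 session medians:", medians1)  # [27.0, 14.5, 30.0]
print("Algo2 session medians:", medians2)  # [7.5, 14, 37.0]
print("Algo1 median CDF:", ecdf(medians1))
print("Algo2 median CDF:", ecdf(medians2))
```

Note how the single 280 in the first Algo1 session pulls that session's mean to about 67.7 while its median stays at 27, which is the main practical difference between options 1 and 2.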

## Answers

0

Here is a suggestion:

Standardise everything (if you omit this, a single large value such as 9999 can dominate the result), then take the average value per user session. Then, optionally, multiply this number by x/10, for example, where x is the number of samples in the user session (think of it as evidence, where more samples add more confidence), and finally average over the number of sessions for the algorithm.
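A rough sketch of this procedure, under some assumptions the answer leaves open: "standardise" is taken to mean a z-score computed over all delays of both algorithms pooled together, the weight is x/10 as suggested, and a lower score means less delay. The function names `standardise` and `score` are made up for the example.

```python
from statistics import mean, pstdev

# Delay values from the question: one inner list per user session.
algo1 = [[12, 30, 20, 40, 24, 280], [13, 14, 15, 100], [20, 40]]
algo2 = [[1, 10, 5, 4, 150, 20], [14, 10, 20], [21, 33, 41, 79]]

def standardise(groups):
    """Z-score every delay using the mean and std of all values pooled together (assumption)."""
    pooled = [d for sessions in groups for s in sessions for d in s]
    mu, sigma = mean(pooled), pstdev(pooled)
    return [[[(d - mu) / sigma for d in s] for s in sessions] for sessions in groups]

def score(sessions, scale=10):
    """Average over sessions of (session mean * session size / scale)."""
    return mean(mean(s) * len(s) / scale for s in sessions)

z1, z2 = standardise([algo1, algo2])
print("Algo1 score:", score(z1))
print("Algo2 score:", score(z2))
```

With the sample data, Algo2 gets the lower score. Keep in mind that the x/10 weight also scales negative (better than average) session means, so it is worth checking that the weighting behaves as intended on real data.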

0

It is common to look at the 90th or 99th percentile latency in computer systems.

A user won't notice a difference of a couple of milliseconds of lag, but if a function occasionally takes several seconds, that is very noticeable.
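As a minimal sketch, tail percentiles for the sample data in the question could be computed like this. It assumes the delays of all sessions are pooled per algorithm and uses the nearest-rank definition of a percentile; other definitions interpolate and give slightly different numbers. The `percentile` helper is written for this example.

```python
import math

# Delay values from the question: one inner list per user session.
algo1 = [[12, 30, 20, 40, 24, 280], [13, 14, 15, 100], [20, 40]]
algo2 = [[1, 10, 5, 4, 150, 20], [14, 10, 20], [21, 33, 41, 79]]

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Pool every call's delay across sessions (assumption: sessions are not weighted).
pooled1 = [d for session in algo1 for d in session]
pooled2 = [d for session in algo2 for d in session]

for p in (90, 99):
    print(f"p{p}: Algo1={percentile(pooled1, p)}, Algo2={percentile(pooled2, p)}")
# p90: Algo1=100, Algo2=79
# p99: Algo1=280, Algo2=150
```

With only a dozen samples per algorithm, the 99th percentile is simply the maximum, so these numbers become meaningful only with far more calls per algorithm.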