
I have 5 different classes into which I want to classify some data points. I'm using an RNN with echo state networks (reservoir computing).

Normally, a straightforward method consists of computing the outputs y and deciding the class using argmax[k]**(y)**, where k is the output dimension corresponding to the correct class. Now I have found another method that consists of computing argmax[k]**(sum(y))**, and that is the part I don't understand.
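To make the two decision rules concrete, here is a minimal plain-Python sketch of how they could be read. It assumes the readout outputs of one sequence are stored as a list of time steps, each holding one score per class, and that the sum in the second rule runs over those time steps. The function names and the numbers in `y` are made up for illustration and do not come from any particular library or from the attached image.

```python
def argmax(values):
    """Return the index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=lambda k: values[k])


def classify_per_step(outputs):
    """Rule 1: argmax over k of y_k(t), giving one label per time step."""
    return [argmax(y_t) for y_t in outputs]


def classify_sequence(outputs):
    """Rule 2: argmax over k of sum_t y_k(t), giving one label per sequence."""
    n_classes = len(outputs[0])
    totals = [sum(y_t[k] for y_t in outputs) for k in range(n_classes)]
    return argmax(totals)


# Hypothetical readout outputs for ONE sequence: 4 time steps, 5 classes.
y = [
    [0.10, 0.60, 0.10, 0.10, 0.10],
    [0.20, 0.50, 0.10, 0.10, 0.10],
    [0.50, 0.30, 0.10, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.05, 0.05],
]

per_step_labels = classify_per_step(y)  # one label for each time step
sequence_label = classify_sequence(y)   # a single label for the whole sequence
print(per_step_labels, sequence_label)
```

Under these assumptions the first rule returns one label per time step, while the second returns a single label for the whole sequence it was summed over.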

I've attached an image of the mathematical approach. I can follow the formula itself, but it doesn't seem logical to me: summing y over the time scale of the data points gives the same result for each time sequence, meaning that all data points will be classified into the same class (the one corresponding to the max of sum(y)).

Can someone explain what I'm missing?