Problem with calculating error rate for KNN

I am trying to validate the accuracy of my KNN algorithm for movie rating prediction.

I have $2$ vectors: $Y$ with the real ratings and $Y'$ with the predicted ones.

When I calculate the Standard Error of the Estimate (is it the one I need to calculate?) using the following formula:

$$\sigma_{est} = \sqrt{\frac{\sum (Y-Y')^2}{N}}$$

I get a result of $\sim 1.03$, but I thought that it can't be $> 1$. If it can, then what does this number tell me?

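% Column 1 holds the real ratings, column 2 the predicted ones
results = load('first_try.mat');
Y = results(:,1);
Y_predicted = results(:,2);

% Square root of the mean squared difference; rows(Y) is the sample count N
o = sqrt(sum((Y-Y_predicted).^2)/rows(Y))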

Dmitrij Kultasev

Posted 2018-10-02T06:53:33.097

Reputation: 145

Why did you think it can't be bigger than 1? – user2974951 – 2018-10-02T06:54:24.873

I thought that 1 means a 100% match – Dmitrij Kultasev – 2018-10-02T06:54:48.940

That's not what a standard error (SE) is. The SE is the standard deviation (SD) of your estimate, which ranges from 0 to infinity. – user2974951 – 2018-10-02T06:55:50.730

Then the question is: is this the right way to measure my algorithm's accuracy? – Dmitrij Kultasev – 2018-10-02T06:57:14.313

Depends on what you mean by accuracy. The mean is usually used as your accuracy measure, and the SE is used to show how variable this measure is (e.g. for building a 95% CI). Whether 1.03 is big in your case depends entirely on your data. – user2974951 – 2018-10-02T07:10:55.760

"I thought that 1 means that 100% match": if we had a perfect match (i.e. $Y=Y'$), then wouldn't the numerator of your expression equal $0$ and hence give you $\sigma_{est}=0$? – ignoring_gravity – 2018-10-02T08:30:58.513

Answers

K-NN is a distance-based method, and the quantity in your equation (the root-mean-square error) is expressed in the same units as your ratings, so its value depends on the scale of your data. If the ratings are on a scale from 0 to 100 and you always predict very poorly, you will evidently get values much larger than 1.

For example, for a very bad predictor:

import numpy as np

Y = [100, 90, 100, 90]
Y_p = [10, 10, 10, 10]

np.sqrt(np.sum(np.subtract(Y, Y_p)**2)/len(Y))

85.14693182963201
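
To make the scale dependence concrete, here is a minimal sketch. The `rmse` helper and the rating values are made up for illustration and are not taken from the question's data. It applies the same formula to a perfect prediction, to ratings on a 1 to 5 scale, and to the same ratings multiplied by 20 to mimic a 0 to 100 scale.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Square root of the mean squared difference (the formula from the question)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical ratings on a 1-5 scale; every prediction is off by exactly 1
y = [5, 4, 5, 4]
y_p = [4, 5, 4, 3]

perfect = rmse(y, y)      # identical vectors, so the error is 0
err_5 = rmse(y, y_p)      # every squared difference is 1, so the error is 1

# Same ratings stretched by a factor of 20: the error stretches by 20 as well
err_100 = rmse(np.multiply(y, 20), np.multiply(y_p, 20))

print(perfect, err_5, err_100)
```

So a perfect match gives 0 rather than 1, and the size of a nonzero value only makes sense relative to the range of the ratings: being off by about 1 on a 1 to 5 scale is a much bigger miss than being off by about 1 on a 0 to 100 scale.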

JahKnows

Posted 2018-10-02T06:53:33.097

Reputation: 7 863