KNN scoring low compared to Logistic regression in MNIST challenge

0

KNN gives me a score of 0.76100, even though it shows 94% accuracy on my training data (split with test_size=0.3) in my Jupyter notebook, whereas logistic regression gives me a score of 0.91485 with an accuracy of 92%. I do not understand the reason. Can anybody help me or suggest anything? Here is my model:

    from sklearn.neighbors import KNeighborsClassifier

    # 1-nearest-neighbour classifier
    knn = KNeighborsClassifier(n_neighbors=1)
    knn.fit(X_train, y_train)
    # Predict on the 30% hold-out split
    pred = knn.predict(X_test)

Below is the classification report:

                    precision    recall  f1-score   support

               0       0.94      0.99      0.96      1213
               1       0.95      0.99      0.97      1422
               2       0.95      0.92      0.93      1258
               3       0.92      0.94      0.93      1284
               4       0.93      0.94      0.94      1209
               5       0.93      0.91      0.92      1121
               6       0.96      0.97      0.97      1242
               7       0.93      0.93      0.93      1315
               8       0.97      0.89      0.93      1227
               9       0.91      0.91      0.91      1309

        accuracy                           0.94     12600
       macro avg       0.94      0.94      0.94     12600
    weighted avg       0.94      0.94      0.94     12600

Confusion matrix:

[[1190    2    2    4    1    2   10    0    0    2]
 [   2 1404    7    3    2    1    2    0    1    0]
 [  14   11 1157   25    8    0   10   15   11    7]
 [   0    1   19 1191    1   34    1   18   12    7]
 [   2    8    6    0 1125    2    4   11    3   48]
 [   2    2    3   36    5 1030   16    2   15   10]
 [  18    4    4    1    4    9 1200    0    2    0]
 [   1    8    7    6   10    1    0 1224    3   55]
 [   8   14    8   30   13   31    9    6 1093   15]
 [   4    2    4    6   38    6    0   51    8 1190]]

Thanks in advance

Isha

Posted 2019-10-13T11:18:37.387

Reputation: 1

Post your model. – Peter – 2019-10-13T11:27:56.413

@Peter I have posted the model. Please let me know if you need any other info. – Isha – 2019-10-13T14:42:44.047

Answers

1

If by "score" you mean test accuracy, your KNN is overfitting heavily because it uses only 1 nearest neighbour. Try higher values of k, and use cross-validation to pick the best model.
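As a rough sketch of that suggestion (not the asker's actual setup), choosing k by cross-validation could look something like the following. It assumes scikit-learn's small built-in digits dataset as a stand-in for the MNIST data, and the candidate k values and `random_state` are illustrative choices only.

```python
# Sketch: choose n_neighbors by cross-validation instead of fixing k=1.
# Assumption: sklearn's built-in 8x8 digits set stands in for the MNIST data.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Candidate k values are illustrative, not tuned for MNIST.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}

# 5-fold cross-validation on the training split only; the hold-out split
# is not touched until the final check below.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["n_neighbors"]
cv_accuracy = search.best_score_              # mean accuracy across the folds
test_accuracy = search.score(X_test, y_test)  # refit best model on hold-out data

print(f"best k: {best_k}")
print(f"mean CV accuracy: {cv_accuracy:.3f}")
print(f"hold-out accuracy: {test_accuracy:.3f}")
```

If the cross-validated accuracy and the hold-out accuracy agree but the submitted score is still much lower, the gap probably lies somewhere other than the choice of k, for example in how the submission data is prepared.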

Uday

Posted 2019-10-13T11:18:37.387

Reputation: 411