sklearn.accuracy_score(y_test, y_predict) vs np.mean(y_predict == y_test)


What is the difference between these two methods for finding model accuracy?

I have used both methods in Python 3 and I normally get identical results. However, in a few cases I get completely different results, so I am trying to figure out the possible reason for this.


These two methods can behave differently, and with plain Python lists they usually do. With lists you will only get the same result in a few cases, for example if you are testing only one row at a time. With NumPy arrays the comparison is element-wise, so the two agree.

  • np.mean(y_test == y_pred), when both are plain Python lists, first evaluates y_test == y_pred as a single list comparison, which is True only if every value in y_test equals the corresponding value in y_pred. It then takes the mean of that single boolean, which is still 0 or 1.

  • accuracy_score(y_test, y_pred) counts the positions where an element of y_test equals the corresponding element of y_pred and then divides that count by the total number of elements.

For example-

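  import numpy as np
  from sklearn.metrics import accuracy_score

  y_test = [2, 2, 3]
  y_pred = [2, 2, 1]
  print(accuracy_score(y_test, y_pred))  # fraction of matching positions
  print(np.mean(y_test == y_pred))       # list == list gives a single boolean

This code returns -

  0.6666666666666666
  0.0
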
You will get the same result from both methods if you have only one sample/element to test. You can find more details here on accuracy_score and np.mean.
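A minimal sketch of why the container type matters, assuming the same toy labels as in the example above. The variable names list_mean, array_mean and score are only illustrative. Converting the lists with np.asarray makes == compare element by element, so np.mean then computes the same fraction as accuracy_score.

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_test = [2, 2, 3]
y_pred = [2, 2, 1]

# Plain lists: == compares the two lists as a whole and yields one boolean.
list_mean = np.mean(y_test == y_pred)

# NumPy arrays: == compares element by element, giving [True, True, False].
array_mean = np.mean(np.asarray(y_test) == np.asarray(y_pred))

# accuracy_score accepts lists directly and compares position by position.
score = accuracy_score(y_test, y_pred)

print(list_mean)
print(array_mean)
print(score)
```

Here list_mean is 0.0, while array_mean and score are both 2/3. If the two methods disagree on your data, checking type(y_test) and type(y_pred) is a reasonable first step.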

Also, accuracy_score is only for classification data, as mentioned in the first line here.

Keshav Garg

So am I correct in assuming that to gauge the accuracy of my RandomForest Classifier I should use accuracy_score() rather than np.mean? – codiearcher – 2019-09-10T12:02:21.207

@codiearcher Yes, but I would suggest you also look at confusion_matrix to evaluate the overall results of your model rather than just accuracy.

– Keshav Garg – 2019-09-10T12:38:43.503

Thank you. Yes I am already using a confusion matrix, as well as checking prec, recall and F1. – codiearcher – 2019-09-10T12:42:35.880

You are assuming that y_test and y_pred are lists. If they are dataframe columns, then the two methods are identical. – Brady Gilg – 2019-09-15T18:11:36.313