## How does Keras calculate accuracy?


How does Keras calculate accuracy from the classwise probabilities? Say, for example, we have 100 samples in the test set which can belong to one of two classes. We also have a list of the classwise probabilities. What threshold does Keras use to assign a sample to either of the two classes?

are you using model.evaluate in keras? – Hima Varsha – 2016-10-07T08:15:22.017

Yes, I am using model.evaluate. More specifically, model.evaluate_generator. – Raghuram – 2016-10-07T10:10:26.947


Possibly related @SO: How does Keras evaluate the accuracy? – desertnaut – 2018-07-03T14:52:37.020


For binary classification, the code for accuracy metric is:

K.mean(K.equal(y_true, K.round(y_pred)))


which suggests that 0.5 is the threshold to distinguish between classes. y_true should of course be 1-hots in this case.
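The same computation can be sketched in plain NumPy (an illustrative re-implementation, not the Keras source itself), which makes the implicit 0.5 threshold visible:

```python
import numpy as np

def binary_accuracy(y_true, y_pred):
    # np.round applies the implicit 0.5 threshold before comparing
    return np.mean(y_true == np.round(y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.4, 0.3, 0.8])  # predicted positive-class probabilities
print(binary_accuracy(y_true, y_pred))   # 0.75
```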

It's a bit different for categorical classification:

K.mean(K.equal(K.argmax(y_true, axis=-1), K.argmax(y_pred, axis=-1)))


which means: how often the prediction's maximum is in the same spot as the true value's maximum.

There is also an option for top-k categorical accuracy, which is similar to the one above, but counts a prediction as correct whenever the target class is within the top-k predictions.
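Both metrics can be sketched in NumPy as follows (illustrative versions of the expressions above, not the Keras source):

```python
import numpy as np

def categorical_accuracy(y_true, y_pred):
    # correct when the predicted max is in the same spot as the true max
    return np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))

def top_k_accuracy(y_true, y_pred, k=2):
    # correct when the true class is among the k highest-scoring predictions
    top_k = np.argsort(y_pred, axis=-1)[:, -k:]
    true_cls = np.argmax(y_true, axis=-1)
    return np.mean([t in row for t, row in zip(true_cls, top_k)])

y_true = np.array([[0, 0, 1], [1, 0, 0]])
y_pred = np.array([[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]])
print(categorical_accuracy(y_true, y_pred))  # 0.5 (first sample misses)
print(top_k_accuracy(y_true, y_pred, k=2))   # 1.0 (both true classes in top 2)
```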

Thank you for the answer. Does that mean even for binary classification, the labels need to be one hot encoded? – Raghuram – 2017-03-20T05:02:52.093

@Raghuram No, for binary classification you just need 0 or 1 as the class; there is no need to one-hot encode them. K.mean(K.equal(y_true, K.round(y_pred))) compares two float values per sample, so each label has to be 0 or 1, not [0,1] / [1,0]. – Divyanshu Kalra – 2017-07-04T20:13:04.307

For categorical accuracy, use categorical_accuracy. – Shital Shah – 2017-12-23T11:12:06.093

For a multi-class problem (with more than two classes), is there a difference between using "accuracy" vs "categorical_accuracy"? – Quetzalcoatl – 2018-11-06T20:03:32.903


And just in case, if the classes are mutually exclusive then use sparse_categorical_accuracy instead of categorical_accuracy; this usually improves the outputs. The difference is discussed here.

– Noir – 2019-12-10T19:51:19.093

@mikhail - in my case my GT labels are [1 0 0 0 0 1] and predicted values are generally [0.23 0.34 0.45 0.22 0.10 0.9]. Basically only the last one matches, yet the rest are counted as matches because of the threshold, artificially inflating results. Any suggestions on what other metric can be used here? – Vikram Murthy – 2020-04-16T07:04:22.470
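The inflation described in this comment is easy to reproduce: with a multi-label target, element-wise binary accuracy rewards the many easy negatives, so a prediction that finds only one of two positive labels still scores highly (a NumPy sketch of the same K.mean/K.round expression):

```python
import numpy as np

# Multi-label ground truth with two positives, prediction finds only one
y_true = np.array([1, 0, 0, 0, 0, 1])
y_pred = np.array([0.23, 0.34, 0.45, 0.22, 0.10, 0.9])

# Element-wise binary accuracy: 5 of 6 positions "match" after rounding,
# even though only one of the two positive labels was actually predicted
print(np.mean(y_true == np.round(y_pred)))  # 0.8333...
```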

What is K? Because if it's supposed to be keras, I get module 'tensorflow.keras' has no attribute 'round' – Jack M – 2020-12-05T19:29:09.820