How to maximize recall?



I'm a little bit new to machine learning.

I am using a neural network to classify images. There are two possible classes. I am using a sigmoid activation at the last layer, so the scores of images are between 0 and 1.

I expected the scores to be close to 0.5 when the neural net is not sure about the class of an image, but all scores are either 1.0000000e+00 (due to rounding, I guess) or very close to zero (for example 2.68440009e-15). In general, is that a good or a bad thing? I have the feeling it's not. If it is not, why, and how can it be avoided?

In my use case I wanted to optimize for recall by manually setting the score required to classify an image as belonging to class 1 to be greater than 0.6 or 0.7 instead of 0.5, but this has no impact because of what I described above.

More generally, how can I minimize the number of false negatives when, during training, the neural net only cares about a loss function that is not tailored to this goal? I am OK with decreasing accuracy a little bit to increase recall.
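The thresholding described above can be sketched as follows (the scores and the 0.7 cutoff are illustrative, not real model outputs):

```python
import numpy as np

# Hypothetical sigmoid scores for five images (made up for illustration).
scores = np.array([0.03, 0.48, 0.61, 0.97, 0.55])

# Default decision rule: class 1 if the score exceeds 0.5.
default_preds = (scores > 0.5).astype(int)

# Raising the required score to 0.7 makes class-1 predictions stricter.
strict_preds = (scores > 0.7).astype(int)

print(default_preds)  # [0 0 1 1 1]
print(strict_preds)   # [0 0 0 1 0]
```

With saturated scores (all ~0 or ~1), both rules give identical predictions, which is exactly the problem described above.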


Posted 2018-03-09T15:36:05.657

Reputation: 304



Train to avoid false negatives

What your network learns depends on the loss function you pass it. By choosing this function you can emphasize various things: overall accuracy, avoiding false negatives, avoiding false positives, etc.

In your case you probably use a cross-entropy loss in combination with a softmax classifier. While softmax squashes the prediction values so that they sum to 1 across all classes, the cross-entropy loss penalises the distance between the ground truth and the prediction. In this calculation it does not take into account the values of the "false negative" predictions. In other words: the loss function only cares about the correct class and its related prediction, not about the values of all other classes.

Since you want to avoid false negatives, this behaviour is probably the exact thing you need. But if you also care about the distance between the actual class and the false predictions, another loss function that takes those false values into account might serve you better. Given your high accuracy, this poses the risk that your overall performance will drop.
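One common way to bias training against false negatives is to weight the positive class more heavily in the binary cross-entropy loss. A minimal NumPy sketch (the function name, labels and the weight of 3.0 are illustrative assumptions, not part of the answer above):

```python
import numpy as np

def weighted_bce(y_true, y_pred, pos_weight=1.0, eps=1e-7):
    """Binary cross entropy where errors on the positive class
    (potential false negatives) are penalised pos_weight times more."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(pos_weight * y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 1.0, 0.0, 0.0])
y_pred = np.array([0.3, 0.9, 0.2, 0.1])

# Missing a positive (0.3 for a true 1) costs much more once weighted.
print(weighted_bce(y_true, y_pred, pos_weight=1.0))
print(weighted_bce(y_true, y_pred, pos_weight=3.0))
```

Most frameworks offer this directly (e.g. a positive-class weight or per-class sample weights), so in practice you would pass the weight to your framework's loss rather than hand-roll it.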

What to do then?

Making the wrong prediction and being very sure about it is not uncommon. There are millions of things you could look at, so your best bet is probably to investigate the errors. E.g. you could use a confusion matrix to recognize which classes are confused with which. If there is structure, you might need more samples of a certain class, or there may be labelling errors in your training data.
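A confusion matrix for a two-class problem is easy to build by hand; here is a small sketch with made-up labels:

```python
import numpy as np

# Made-up true labels and predictions for illustration.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 0])

# Rows = true class, columns = predicted class.
cm = np.zeros((2, 2), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1

print(cm)        # off-diagonal cells are the mix-ups
print(cm[1, 0])  # false negatives for class 1
```

`sklearn.metrics.confusion_matrix` computes the same thing with the same row/column convention.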

Another way to go ahead would be to manually look at all (or some) of the misclassified examples. Something as basic as listing the errors in a table and trying to find specific characteristics can guide you towards what you need to do. E.g. it would be understandable if your network usually gets the "difficult" examples wrong. But maybe there is some other clear systematic pattern your network has not picked up yet due to lack of data?



Reputation: 448

I am just using sigmoid activation, but I am indeed using binary cross entropy. I used a confusion matrix, and with only two classes it's easy to know which classes are mixed up ;) Would you recommend changing my loss so that the distance to the right answer is taken into account, and I can manually set a threshold to improve either precision or recall? – Louis – 2018-03-09T18:35:37.253

I missed the fact that you only have 2 classes. In that case the information of how "wrong" the prediction is (weight for the wrong class) is mirrored by how "right" it is (weight for the correct class). Binary cross-entropy loss comes down to -log(p), with p = predicted value for the correct class. The smaller p, the larger the loss, so the distance to the right class is already considered. Both classes are treated equally here, but it sounds like for you they are not. You would prefer having more predictions for one class to avoid false negatives, even if it hurts accuracy, right? – Gegenwind – 2018-03-10T07:28:47.890

Thank you very much for your answers :) Yes, exactly, you summed it up perfectly – Louis – 2018-03-10T10:36:34.567


To answer the last question: suppose you have a binary classification problem. It is customary to label an example as positive if the output of the sigmoid is more than 0.5 and negative if it's less than 0.5. To increase recall, you can lower this threshold to a value less than 0.5, e.g. 0.2. For tasks where you want better precision, you can increase the threshold to a value larger than 0.5.
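The effect of moving the threshold on recall can be sketched like this (scores and labels are made up for illustration):

```python
import numpy as np

# Illustrative sigmoid outputs and ground-truth labels.
scores = np.array([0.10, 0.30, 0.45, 0.55, 0.80, 0.25])
labels = np.array([0,    1,    1,    1,    1,    0])

def recall_at(threshold):
    """Recall of class 1 when 'positive' means score > threshold."""
    preds = (scores > threshold).astype(int)
    tp = np.sum((preds == 1) & (labels == 1))
    fn = np.sum((preds == 0) & (labels == 1))
    return tp / (tp + fn)

print(recall_at(0.5))  # default rule -> 0.5
print(recall_at(0.2))  # lower threshold -> 1.0
```

This only works when the scores are spread out; with saturated outputs (all ~0 or ~1), every threshold between them gives the same predictions.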

About the first part of your question, it highly depends on your data and its feature space. There are problems where the data is linearly separable in higher dimensions, which means you can classify it with just a single neuron, i.e. a single hyperplane. If it happens that you have such good accuracy, you cannot say anything until you find the cross-validation error. By interpreting the difference between the error on the training data and on the cross-validation (or test) data, you can figure out whether your classifier performs well or not.



Reputation: 12 077

Thank you very much for your answer. I'm sorry I was not clear: I tried exactly what you advise in the first part of your answer, but it is not applicable to my issue, because every output of the sigmoid on the test set is either very, very close to 1 or to 0, meaning that if I tweak the threshold to 0.2, 0.7 or even 0.999, it will have no effect on the classification – Louis – 2018-03-09T17:00:51.200

@Louis What is the training and testing error? – Media – 2018-03-09T17:04:24.583

Training error is 0.09 and testing error is 0.11. So even when the network is wrong during testing, it is very sure of its answer :( – Louis – 2018-03-09T17:06:46.780

I guess you should first find a network that has the capability to learn your training data. Try to change the hyper-parameters. What is your network? – Media – 2018-03-09T17:09:36.043

Accuracy is 0.88 on the testing set, which is alright I think. It's a convolutional neural network (I fine-tuned VGG16) – Louis – 2018-03-09T17:11:11.053

88 percent may be good in some contexts, but if the difference between train and test accuracy is more than 5 points, you may have overfitted. – Media – 2018-03-09T17:15:14.073

I'm happy with the accuracy (0.88 for test and 0.91 for training). I just wanted to know if there is a way to prevent my model from being very wrong when it is wrong, so that I have the ability to tweak the threshold we were talking about. To be more precise, it's OK that the model is wrong sometimes, but if it says 0.99999999 when the class is 0, it's not great – Louis – 2018-03-09T17:20:50.290

@Louis there are different solutions for that. One of them is as follows: try to find out which images are classified wrongly, then try to find out why it has happened. A probable cause is having too few training examples similar to the ones that were misclassified, either because of unbalanced data across labels or because of a low number of epochs. There are other approaches too. – Media – 2018-03-09T17:27:23.203