This is an issue I have come across over and over again: loss (cross-entropy in this case) and accuracy plots that do not make sense. Here is an example: I'm training a ResNet-18 on CIFAR-10. The optimizer is SGD with a 0.1 learning rate, 0.9 Nesterov momentum, and 1e-4 weight decay. The learning rate is decreased to ⅕ of its value at epochs 60, 120, and 160.
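For reference, this is roughly how the setup looks in PyTorch (a sketch, not my exact training script; the `Linear` module stands in for the ResNet-18 and the training loop is omitted):

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(10, 10)  # stand-in for the ResNet-18
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9,
                nesterov=True, weight_decay=1e-4)
# Multiply the LR by 0.2 (i.e. drop it to 1/5) at epochs 60, 120, 160.
scheduler = MultiStepLR(optimizer, milestones=[60, 120, 160], gamma=0.2)

for epoch in range(200):
    # train_one_epoch(model, optimizer)  # training loop omitted
    scheduler.step()
```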
- Initially the curves are all nice and dandy: training and validation loss are decreasing and accuracies are increasing.
- Around epoch 65–70 you see signs of overfitting: val. loss starts increasing and val. accuracy starts decreasing (the red box). There is still nothing strange here.
Now there are two things that don’t make sense to me:
1. After epoch 120 (where the LR is decreased), val. loss and accuracy start improving for a couple of epochs (the green box). Why would decreasing the learning rate suddenly improve the validation performance of a model that was already overfitting?! I would have expected the drop in LR to accelerate overfitting instead.
2. After epoch ~125 (the blue box), loss starts going up but accuracy keeps improving. I understand how loss could go up while accuracy stays constant (the model getting more confident in its wrong predictions, or less confident in its correct ones). But I don't get how accuracy can improve while loss also goes up.
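The part I do understand is easy to write down as a toy example (made-up probabilities, binary case for simplicity, nothing from my actual run): accuracy only depends on which side of 0.5 the predicted probability of the true class falls, while cross-entropy depends on the probability itself, so a shift in confidence can move the loss without moving the accuracy.

```python
import math

def cross_entropy(p_true):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p) for p in p_true) / len(p_true)

def accuracy(p_true):
    """Fraction of samples where the true class gets p > 0.5 (binary case)."""
    return sum(p > 0.5 for p in p_true) / len(p_true)

# Hypothetical probabilities assigned to the TRUE class for 4 samples.
before = [0.9, 0.8, 0.6, 0.4]    # 3/4 correct
after  = [0.7, 0.6, 0.55, 0.1]   # still 3/4 correct, but the one wrong
                                 # prediction is now made very confidently

print(accuracy(before), accuracy(after))             # unchanged: 0.75 0.75
print(cross_entropy(before), cross_entropy(after))   # loss goes up
```

The single confidently wrong prediction dominates the loss (cross-entropy penalizes it as `-log(0.1) ≈ 2.3`) even though the predicted labels barely change. What I can't reconstruct with an example like this is the regime in my plots, where accuracy actually *improves* while the loss rises.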