Bias-variance tradeoff in practice (CNN)


I first trained a CNN on my dataset and got a loss plot that looks somewhat like this: [plot: low bias]

Orange is training loss, blue is dev loss. As you can see, the training loss is lower than the dev loss, so I figured: I have (reasonably) low bias and high variance, which means I'm overfitting, so I should add some regularization: dropout, L2 regularization and data augmentation. After that, I get a plot like this: [plot: low variance]

Now we see that the variance has decreased and the bias has increased. The model is overfitting less; is this correct? However, I would actually select the first model because it has a lower validation loss.

My question is: most of the literature on the bias-variance tradeoff shows the validation loss going back up, but in my experiments this is not the case. So are these models actually overfitting? Are you overfitting as soon as the training loss goes below the validation loss, or only if the validation loss goes back up? And is it okay to choose a model with high variance if its validation loss is lower?

I found this answer on a similar question, but what if your problem is so complex that you can't find an architecture that can overfit and then properly regularize it? I can find an architecture that gets a training loss close(r) to zero, but then I can't really add enough dropout to make sure the variance is low. Also, if I add augmentation, my validation loss goes up as well. Finally, the answer confuses me: the answerer talks about variance on the training set. But isn't bias always related to training loss and variance to dev loss?

Or am I just misinterpreting the information, and should I plot the loss as a function of dataset size rather than number of epochs to find out whether I am overfitting?

Deer Jona

Posted 2019-01-17T09:40:28.840

Reputation: 41

These are fictional plots by the way, not actual loss plots, but made to illustrate the question. However I did base them on my actual model behavior. – Deer Jona – 2019-01-17T10:18:39.327

It should be noted that, depending on which framework you are using and how you have set up your model, your L2 penalty loss may or may not be included in the graph you are showing. I recommend plotting the regularization loss and the classification (or other task) loss separately. – Gouda – 2019-02-18T23:39:11.703
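To illustrate the comment above, here is a minimal NumPy sketch of keeping the two terms apart so each can be logged and plotted on its own. The function name `split_loss`, the example weights, and the penalty convention (coefficient times the sum of squared weights, with no factor of 0.5) are assumptions for illustration; frameworks differ on both the convention and on whether the penalty is reported at all.

```python
import numpy as np

def split_loss(task_loss, weights, l2_coeff):
    """Return (task_loss, l2_penalty, total) so each term can be plotted separately.

    Assumes the penalty is l2_coeff * sum(w**2) over all weight arrays;
    some frameworks use 0.5 * l2_coeff instead.
    """
    l2_penalty = l2_coeff * sum(float(np.sum(w ** 2)) for w in weights)
    return task_loss, l2_penalty, task_loss + l2_penalty

# Hypothetical weights standing in for the layers of a model.
weights = [np.ones((2, 2)), np.full((3,), 2.0)]
task, penalty, total = split_loss(task_loss=0.5, weights=weights, l2_coeff=0.01)
print(f"task={task:.3f} l2={penalty:.3f} total={total:.3f}")
```

If the training curve includes the penalty and the dev curve does not, comparing the two directly can be misleading, which is why plotting `task` and `penalty` as separate lines may help.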



Normally, the training loss is lower than the validation loss. This by itself does not indicate overfitting. Indeed, it is even suspicious when your training loss is higher than the validation loss. On the other hand, validation accuracy getting worse while the training metrics keep improving definitely tells you that you are overfitting.
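As a rough sketch of that criterion applied to recorded loss curves (using loss rather than accuracy), the helper below reports the epoch with the best validation loss, but only if the validation loss has risen since then while the training loss kept falling. The name `overfitting_onset`, the `min_delta` tolerance, and the example numbers are hypothetical, and comparing only the final epoch against the best one is a simplification that real, noisy curves may need smoothing for.

```python
def overfitting_onset(train_loss, val_loss, min_delta=0.0):
    """Return the epoch of lowest validation loss if overfitting followed it, else None.

    Overfitting is flagged only when, by the last epoch, validation loss has
    risen by more than min_delta above its minimum AND training loss is lower
    than it was at that minimum.
    """
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    val_rose = val_loss[-1] - val_loss[best_epoch] > min_delta
    train_fell = train_loss[-1] < train_loss[best_epoch]
    return best_epoch if (val_rose and train_fell) else None

# Made-up curves: validation loss turns upward after epoch 2.
train = [1.0, 0.7, 0.5, 0.3, 0.2]
val_up = [1.1, 0.9, 0.8, 0.85, 0.95]
# Made-up curves: a gap exists, but validation loss never goes back up.
val_flat = [1.1, 0.9, 0.8, 0.75, 0.74]

onset_up = overfitting_onset(train, val_up)
onset_flat = overfitting_onset(train, val_flat)
print(onset_up, onset_flat)
```

In the second case the training loss is below the validation loss throughout, yet nothing is flagged, which matches the point that the gap alone is not evidence of overfitting.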

Generally speaking, overfitting means poor generalization: the model memorizes the training set rather than learning the general concepts behind the data. Besides monitoring metrics during training, you can detect it by testing your model on external datasets from a similar, but not identical, domain or distribution. Very poor accuracy there indicates overfitting that may have been hidden by a validation set very similar to the training set.

Dmytro Prylipko

Posted 2019-01-17T09:40:28.840

Reputation: 676

Thank you for your answer. Does that mean that I could actually try to increase my model capacity even further rather than adding regularization? – Deer Jona – 2019-01-17T10:47:36.100

You can do both. Normally, there is a set of regularization techniques (data augmentation, dropout, L2 regularization) that are always used by default. With those in place, you increase your model's size until you stop getting improvement. – Dmytro Prylipko – 2019-01-17T10:58:31.090
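One way to read the procedure in the last comment is as a simple loop over model sizes that stops once the validation loss no longer improves by a meaningful amount. The sketch below assumes a hypothetical callable `val_loss_for(size)` that trains a regularized model of the given size and returns its validation loss; here it is replaced by a lookup table of made-up numbers, and the `min_gain` threshold is an arbitrary choice.

```python
def grow_until_no_gain(candidate_sizes, val_loss_for, min_gain=1e-3):
    """Try sizes in increasing order; stop when validation loss improves by < min_gain.

    Returns (best_size, best_loss) for the last size that still gave a gain.
    """
    best_size, best_loss = None, float("inf")
    for size in candidate_sizes:
        loss = val_loss_for(size)
        if best_loss - loss < min_gain:
            break  # no meaningful improvement from the extra capacity
        best_size, best_loss = size, loss
    return best_size, best_loss

# Made-up validation losses standing in for real training runs.
fake_results = {16: 0.9, 32: 0.7, 64: 0.65, 128: 0.6499, 256: 0.5}
best_size, best_loss = grow_until_no_gain(sorted(fake_results), fake_results.__getitem__)
print(best_size, best_loss)
```

Note that this greedy stop never evaluates the largest size in the example, so it can miss a later improvement; allowing a few non-improving steps before stopping is a possible refinement.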