Relationship between train and test error



I have some specific questions that I could not find answers to in books, so I am asking here. I would be very grateful for an intuitive explanation, if possible.

In general, neural networks face a bias/variance tradeoff, and thus we need a regularizer. High bias leads to underfitting; high variance leads to overfitting.

To reduce overfitting, we use regularization to constrain the weights. The regularization strength is a hyperparameter and, as I understand it, should be tuned using cross-validation. Thus, the dataset is split into training, validation, and test sets. The test set is independent and unseen by the model during learning, but its labels are available. We usually report statistics such as false positives, the confusion matrix, and the misclassification rate on this test set.
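As a minimal sketch of the workflow described above, the following uses a toy one-dimensional ridge regression (an assumption chosen only to keep the example dependency-free; it is not the MATLAB CNN setup). The helper names `split`, `fit_ridge_1d`, and `mse`, the 60/20/20 split, and the candidate regularization strengths are all illustrative. The regularization strength is chosen on the validation set, and the test set is touched only once at the end.

```python
import random

def split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle and split into train / validation / test lists."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

def fit_ridge_1d(pairs, lam):
    """Closed-form 1-D ridge regression without intercept:
    w = sum(x*y) / (sum(x*x) + lam)."""
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    return sxy / (sxx + lam)

def mse(w, pairs):
    """Mean squared error of the model y = w * x on the given pairs."""
    return sum((w * x - y) ** 2 for x, y in pairs) / len(pairs)

# Synthetic data (assumption): y = 2x + Gaussian noise.
rng = random.Random(42)
data = []
for _ in range(100):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))

train, val, test = split(data)

# Tune the hyperparameter on the validation set only.
candidates = [0.0, 0.01, 0.1, 1.0, 10.0]
best_lam = min(candidates, key=lambda lam: mse(fit_ridge_1d(train, lam), val))

# Evaluate once on the untouched test set and report that number.
w = fit_ridge_1d(train, best_lam)
test_error = mse(w, test)
print(f"lambda={best_lam}, train MSE={mse(w, train):.3f}, "
      f"val MSE={mse(w, val):.3f}, test MSE={test_error:.3f}")
```

The point of the design is that the test set plays no role in choosing `best_lam`, so the reported test error remains an estimate of performance on unseen data.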

Q1) Is this bias/variance problem encountered in other algorithms such as SVM, LSTM etc as well?

In the convolutional neural network implementation (MATLAB toolbox), I have not seen any option for specifying the regularization constant. So, does this mean that CNNs don't need a regularizer?

Q2) What is the condition if training error and test error are both zero? Is this the ideal best situation?

Q3) What is the condition if training error > test error?

Q4) What is the condition if training error > validation error?

Please correct me where wrong. Thank you very much.

Ria George

Posted 2018-10-04T00:48:03.010

Reputation: 115



First of all, be very clear about the roles of the training, validation, and test sets. These play a crucial part in tuning your DL model. Usually, the validation set is used to keep a check on the model during training. Some intuitive observations can be drawn from the training and validation accuracy while fitting:

  1. If the model has low training accuracy (with validation accuracy similarly low, or even slightly higher), it is an underfit model.

  2. If the model has high training accuracy but low validation accuracy, it is overfit.
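These two rules of thumb can be sketched as a small helper. The function name `diagnose` and both thresholds (`target_acc`, `gap_tol`) are hypothetical values chosen for illustration; sensible values depend on the task and on how noisy the data is.

```python
def diagnose(train_acc, val_acc, target_acc=0.9, gap_tol=0.05):
    """Rough fit diagnosis from training and validation accuracy.

    target_acc: training accuracy below this is treated as underfitting (assumption).
    gap_tol: a train-minus-validation gap above this is treated as overfitting (assumption).
    """
    if train_acc < target_acc:
        # The model cannot even fit the data it was trained on.
        return "underfit"
    if train_acc - val_acc > gap_tol:
        # Fits the training data well but does not generalize.
        return "overfit"
    return "good fit"

print(diagnose(0.60, 0.62))
print(diagnose(0.99, 0.80))
print(diagnose(0.95, 0.93))
```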

The bias-variance trade-off is a central problem in supervised machine learning, not only in neural networks.

The bias–variance decomposition is a way of analyzing a learning algorithm's expected generalization error on a particular problem as a sum of three terms: the (squared) bias, the variance, and a quantity called the irreducible error, which results from noise in the problem itself.
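For squared-error loss, the decomposition can be written out explicitly. The notation here is an assumption introduced for illustration: the data follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ is the model fitted on a randomly drawn training set, with expectations taken over training sets and noise.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
$$

A model that is too rigid has a large first term (underfitting); a model that is too flexible has a large second term (overfitting); the third term cannot be reduced by any choice of model.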



Posted 2018-10-04T00:48:03.010

Reputation: 1 913

Thank you for your update. You described the accuracies and the bias/variance tradeoff with respect to the validation set. Can we report underfitting and overfitting with respect to the test set, or is it generally reported with respect to the validation set? In my implementation, I have training and validation sets and a separate, labelled, unseen test set. I checked both the validation set and the test set. My test error and training error are almost equal. Is it wrong to report underfitting and overfitting with respect to the test set? – Ria George – 2018-10-04T18:53:10.597

Yes, the same holds true for the test set too. The only difference is that you will not be able to assess your model until its training cycle is completed. – thanatoz – 2018-10-04T19:17:34.220

So, is this the reason why we use the validation set to detect overfitting and underfitting: because it is evaluated during training, and so we have the option of pausing the training? – Ria George – 2018-10-04T19:28:12.393

Yes, the validation set plays a crucial role in tuning our models. Refer to this post. It will provide better clarity. – thanatoz – 2018-10-04T19:41:52.713
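The idea of watching the validation set during training and pausing when it stops improving is commonly called early stopping. A minimal sketch follows, assuming the per-epoch validation losses are already available as a list; the function name `early_stopping_epoch` and the `patience` default are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch whose weights would be kept.

    Training is considered stopped once the validation loss has not
    improved for `patience` consecutive epochs.
    """
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember this epoch.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training here.
            break
    return best_epoch

# Validation loss falls, then rises as the model starts to overfit.
losses = [0.9, 0.7, 0.6, 0.65, 0.7, 0.8, 0.5]
print(early_stopping_epoch(losses, patience=3))
```

With a small `patience` the loop stops before it ever sees the late improvement at the last epoch, which is the usual trade-off between training cost and the risk of stopping too soon.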

Thank you for your comments. I will go through the references and links. What happens if the training error < validation error, but the test error = 0? This is the last doubt that I have. Can you please help? – Ria George – 2018-10-04T19:50:24.977

If your test error comes out to zero while the training error is below the validation error, it is likely that your test set overlaps with, or is a direct subset of, the training set. – thanatoz – 2018-10-05T06:10:22.477
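One quick way to check this suspicion is to count how many test rows also appear verbatim in the training set. This is only a sketch under the assumption that each example can be represented as a sequence of hashable values; `overlap_fraction` is a hypothetical helper, and near-duplicates (for example, augmented copies of the same image) would not be caught by an exact match.

```python
def overlap_fraction(train_rows, test_rows):
    """Fraction of test rows that also appear verbatim in the training set."""
    if not test_rows:
        return 0.0
    # Tuples are hashable, so rows can be looked up in a set.
    train_set = {tuple(row) for row in train_rows}
    hits = sum(1 for row in test_rows if tuple(row) in train_set)
    return hits / len(test_rows)

train_rows = [[1, 2], [3, 4], [5, 6]]
test_rows = [[3, 4], [7, 8]]
print(overlap_fraction(train_rows, test_rows))  # 0.5: half of the test rows leak from training
```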


1) In the link you provide, it says:

"You can also try increasing the L2 regularization using the 'L2Regularization' name-value pair argument, using batch normalization layers after convolutional layers, and adding dropout layers."

So it looks like you can apply regularization.

2) In that case, you have a perfect model and your data is virtually noiseless.

3) Then you have 'underfitted'. However, you're only supposed to use the test set once, so if you now go back and change your model, you're defeating the purpose of having a test set.

4) Then you have 'underfitted', and by chance you obtained a better score on unseen data. This would usually call for increasing the flexibility of the model.


Posted 2018-10-04T00:48:03.010

Reputation: 733

Thank you for your answer and the supporting justification. However, I need one clarification: are the situations in points 3) and 4) both underfitting? So whenever the training error is greater than the test and validation errors, we have an underfit situation. On the other hand, if the training error < validation and test errors, we have an overfit situation. Is my understanding correct? – Ria George – 2018-10-04T18:19:30.680

In the other answer, these overfit and underfit situations are reported with respect to the validation set. If we have separate unseen data as the test set, do we always use the validation set to determine overfit and underfit conditions? I am a bit confused, since you answer with respect to the test set and the other answer is with respect to the validation set. – Ria George – 2018-10-04T19:10:51.303