RNN: Different test results on balanced and unbalanced data


I trained a recurrent neural network (in case it matters: it contains three CuDNNLSTM layers and three Dense layers, with Dropout = 0.2). The result of data preparation is one array of ~330,000 sequences. Each sequence contains 256 time steps with 24 features per time step. This array is normalized, shuffled and balanced. It is then split into two arrays: the training array contains 90% of the data (~297k sequences) and the validation array contains the remaining ~10%.
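A minimal sketch of the balancing, shuffling and 90/10 split described above. The post does not say how the data was balanced, so undersampling to the smallest class is an assumption here, as are integer class labels; the function names `balance_by_undersampling` and `split_train_val` are hypothetical and the labels are made up.

```python
import random
from collections import defaultdict


def balance_by_undersampling(labels, seed=0):
    """Return shuffled sample indices with the same count for every class.

    Assumption: balancing is done by undersampling to the smallest class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n_per_class = min(len(idxs) for idxs in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(rng.sample(idxs, n_per_class))
    rng.shuffle(balanced)
    return balanced


def split_train_val(indices, val_fraction=0.1):
    """Split an already shuffled index list into train and validation parts."""
    n_val = int(round(len(indices) * val_fraction))
    return indices[n_val:], indices[:n_val]


# Toy example with made-up labels: 800 samples of class 0, 200 of class 1.
labels = [0] * 800 + [1] * 200
balanced = balance_by_undersampling(labels)
train_idx, val_idx = split_train_val(balanced)
print(len(train_idx), len(val_idx))
```

Working with indices rather than the sequences themselves keeps the sketch independent of how the (samples, 256, 24) array is stored.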

During training (Adam optimizer, batch size 128 or 256), the maximum accuracy on the validation data set is 90%. Epoch accuracy is 94%.

Then I ran my additional validation test script on a more realistic, unbalanced data set. The accuracy of this test was only ~50%. To check that this second test is correct and that there are no errors in the code, I also ran it on data that was included in the training set; the accuracy was 87%, so the script itself looks fine.
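One way to compare a balanced and an unbalanced test is to look at per-class recall in addition to overall accuracy, since overall accuracy is the class-frequency-weighted average of the per-class recalls. The sketch below illustrates this with made-up numbers; the post does not state the number of classes, so the binary labels, the helper names and all counts are assumptions for illustration only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each class, the fraction of its samples that were predicted correctly."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / total[c] for c in total}


def toy_set(n0, correct0, n1, correct1):
    """Build made-up binary labels and predictions with the given correct counts."""
    y_true = [0] * n0 + [1] * n1
    y_pred = ([0] * correct0 + [1] * (n0 - correct0)
              + [1] * correct1 + [0] * (n1 - correct1))
    return y_true, y_pred


# Same hypothetical classifier behaviour (95% recall on class 0, 60% on class 1),
# evaluated on a balanced set and on an unbalanced set.
bal_true, bal_pred = toy_set(100, 95, 100, 60)
unb_true, unb_pred = toy_set(20, 19, 180, 108)

print("balanced:  ", accuracy(bal_true, bal_pred), per_class_recall(bal_true, bal_pred))
print("unbalanced:", accuracy(unb_true, unb_pred), per_class_recall(unb_true, unb_pred))
```

If the per-class recalls are similar on both data sets, the accuracy gap comes from the class mix; if the recalls themselves drop on the realistic data, the data sets differ in more than class balance.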

What is going on? I suspect that the wrong architecture is being used.

Here is a graph of epoch accuracy during training: [figure: epoch accuracy]

Here is a graph of validation accuracy during training: [figure: validation accuracy]

Thank you for your support.


Today I trained the network one more time. I used the unbalanced data from the second test as the validation data set in the training process. You can see the results below. I stopped training after 29 epochs.

Blue: balanced validation data set

Orange: unbalanced validation data set

[Figure: epoch accuracy]

[Figure: validation accuracy]

[Figure: validation loss]

It doesn't look good at all.


Posted 2019-04-06T18:44:13.170


Looks like a common generalization problem: the network overfit on the first dataset. Try to train on a realistic dataset, or augment the first one with random transformations if you can't get more training data. – mirror2image – 2019-04-07T09:36:48.187

Thank you for your answer. Today I am running the training one more time with the same training data set but with the validation data from the second test. In a few hours we can verify your remark. – ketzul – 2019-04-07T09:44:38.117

No answers