I trained a recurrent neural network (in case it matters: it contains three CuDNNLSTM layers and three Dense layers, with Dropout = 0.2). Data preparation produces one array of ~330,000 sequences, each with 256 time steps and 24 features per time step. This array is normalized, shuffled and balanced. It is then split into two arrays: the training array contains 90% of the data (~297k sequences) and the validation array contains the remaining ~10%.
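For reference, the shuffle-then-split step described above can be sketched roughly as follows. This is a minimal illustration in pure Python, not the actual preparation code: the helper name `shuffle_and_split`, the `seed` parameter and the toy data are all assumptions, and plain lists stand in for the real array of sequences.

```python
import random


def shuffle_and_split(sequences, labels, val_fraction=0.1, seed=0):
    """Shuffle sequences and labels together, then split off a validation part.

    Hypothetical helper for illustration only.
    """
    if len(sequences) != len(labels):
        raise ValueError("sequences and labels must have the same length")

    # Shuffle one list of indices so each sequence stays paired with its label.
    indices = list(range(len(sequences)))
    random.Random(seed).shuffle(indices)

    n_val = int(round(len(indices) * val_fraction))
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    train = ([sequences[i] for i in train_idx], [labels[i] for i in train_idx])
    val = ([sequences[i] for i in val_idx], [labels[i] for i in val_idx])
    return train, val


# Toy stand-in data: 20 short "sequences" with alternating labels 0/1.
sequences = [[float(i)] * 3 for i in range(20)]
labels = [i % 2 for i in range(20)]

(train_x, train_y), (val_x, val_y) = shuffle_and_split(sequences, labels)
print(len(train_x), len(val_x))  # 18 2
```

Note that a random split like this only gives an independent validation set if the individual sequences are themselves independent of each other.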
During training (Adam optimizer, batch size 128 or 256), the maximum accuracy on the validation data set is 90%. The training (epoch) accuracy is 94%.
Then I ran my additional validation test script on a more realistic, unbalanced data set. The test accuracy is only ~50%. I checked that this second test is correct and that there are no errors in its code: when I ran it on data that was included in the training set, the accuracy was 87%, so the script itself looks fine.
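One way to see where a low accuracy on an unbalanced set comes from is to break it down per class instead of looking at a single number. The sketch below is a minimal pure-Python illustration; `per_class_report` is a hypothetical helper, and the labels are made-up toy values chosen only to show the effect, not results from the network above.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return overall accuracy, balanced accuracy and per-class recall.

    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    total = Counter(y_true)  # number of samples per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    # Recall per class: fraction of that class's samples predicted correctly.
    recall = {c: correct[c] / total[c] for c in total}
    accuracy = sum(correct.values()) / len(y_true)
    # Balanced accuracy: unweighted mean of the per-class recalls.
    balanced_accuracy = sum(recall.values()) / len(recall)
    return accuracy, balanced_accuracy, recall


# Toy unbalanced set: 90 samples of class 0, 10 samples of class 1.
y_true = [0] * 90 + [1] * 10
# Made-up predictions: only 40 of the 90 class-0 samples are right,
# while all 10 class-1 samples are right.
y_pred = [0] * 40 + [1] * 50 + [1] * 10

accuracy, balanced_accuracy, recall = per_class_report(y_true, y_pred)
print(accuracy)  # 0.5
print(recall)
```

In this toy case the overall accuracy is dominated by the majority class, so a weak recall on that class pulls the total down even though the other class is classified perfectly. Running the same breakdown on the balanced and the unbalanced sets would show whether the gap is concentrated in one class.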
What is going on? I suspect that the wrong architecture is being used.
Thank you for your support.
Today I trained the network once more, this time using the unbalanced data from the second test as the validation data set during training. You can see the results below. I stopped training after 29 epochs.
Blue: balanced validation data set
Orange: unbalanced validation data set
It doesn't look good at all.