I have a single, very long time series. I want to train an LSTM to classify every timestep as one of two behaviours, A or B (sequence-to-sequence labelling).
Because the series is so long, I plan to extract shorter, partially overlapping subsequences and use each one as a separate input sample for the LSTM.
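Here is a minimal sketch of the windowing step. It assumes the series and its per-timestep labels are plain Python lists of equal length. `make_windows`, `window` and `stride` are hypothetical names. Consecutive windows overlap by `window - stride` timesteps, and each window keeps its start index so it can be located in time when splitting.

```python
def make_windows(series, labels, window, stride):
    """Cut a long series into partially overlapping (start, x, y) windows.

    x and y each cover timesteps [start, start + window). Consecutive
    windows overlap by window - stride timesteps when stride < window.
    """
    if len(series) != len(labels):
        raise ValueError("series and labels must have the same length")
    if not 0 < stride <= window:
        raise ValueError("expected 0 < stride <= window")
    windows = []
    # Only keep windows that fit entirely inside the series.
    for start in range(0, len(series) - window + 1, stride):
        windows.append((start, series[start:start + window], labels[start:start + window]))
    return windows


# Toy example: 10 timesteps, windows of length 4 advancing by 2.
series = list(range(10))
labels = ["A"] * 5 + ["B"] * 5
for start, x, y in make_windows(series, labels, window=4, stride=2):
    print(start, x, y)
```

The per-timestep labels are sliced together with the inputs, so each window is a complete sequence-to-sequence training pair.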
For the train/validation/test split, do I have to use the older subsequences for training and the newer ones for validation and testing? Or can I treat them as independent samples and simply shuffle them randomly, given that the LSTM starts each subsequence with an empty memory (reset state) anyway?
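The two options can be sketched on window start indices. This assumes hypothetical sizes (10,000 timesteps, windows of 100 advancing by 50) and an illustrative 70/15/15 split. The helper `shared_timesteps` counts timesteps that appear in both a training window and a test window. Even with the LSTM state reset per window, overlapping windows placed in different sets still contain some of the same raw data. The function names are illustrative only.

```python
import random


def _split(ordered, train_frac, val_frac):
    # Cut an ordered list of window start indices into three consecutive parts.
    n_train = int(len(ordered) * train_frac)
    n_val = int(len(ordered) * val_frac)
    return ordered[:n_train], ordered[n_train:n_train + n_val], ordered[n_train + n_val:]


def chronological_split(starts, train_frac=0.7, val_frac=0.15):
    """Older windows go to training, newer ones to validation and test."""
    return _split(sorted(starts), train_frac, val_frac)


def shuffled_split(starts, train_frac=0.7, val_frac=0.15, seed=0):
    """Treat windows as independent samples and assign them at random."""
    shuffled = list(starts)
    random.Random(seed).shuffle(shuffled)
    return _split(shuffled, train_frac, val_frac)


def shared_timesteps(starts_a, starts_b, window):
    """Number of timesteps covered by at least one window in both sets."""
    covered_a = {t for s in starts_a for t in range(s, s + window)}
    covered_b = {t for s in starts_b for t in range(s, s + window)}
    return len(covered_a & covered_b)


# Hypothetical sizes: 10,000 timesteps, windows of 100 advancing by 50.
window, stride, length = 100, 50, 10_000
starts = list(range(0, length - window + 1, stride))

for name, split in [("chronological", chronological_split), ("shuffled", shuffled_split)]:
    train, val, test = split(starts)
    print(name, "train/test shared timesteps:", shared_timesteps(train, test, window))
```

In this configuration the chronological split reports zero shared train/test timesteps, because the validation block sits between them. The shuffled split places neighbouring windows, which share half their timesteps, in both training and test.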
I ask because, owing to how the data was collected, the first half of the series contains mostly behaviour A and the second half mostly behaviour B. A chronological split would therefore train mostly on A and test mostly on B. That does not reflect production, where the system will encounter both periods of predominantly A and periods of predominantly B.
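One way to quantify this skew before committing to a split is to measure the label mix in each chronological segment. The sketch below uses synthetic labels as a stand-in for the real data: 90% A in the first half, 90% B in the second. It also assumes a hypothetical 70/15/15 split of the timeline. `label_share_by_segment` is an illustrative helper, not a library function.

```python
def label_share_by_segment(labels, cut_points, target="B"):
    """Fraction of `target` labels in each contiguous segment between cut points."""
    bounds = [0, *cut_points, len(labels)]
    shares = []
    for lo, hi in zip(bounds, bounds[1:]):
        segment = labels[lo:hi]
        shares.append(sum(y == target for y in segment) / len(segment))
    return shares


# Synthetic labels for illustration only: 90% A in the first half, 90% B in the second.
n = 10_000
labels = (["B" if t % 10 == 0 else "A" for t in range(n // 2)]
          + ["A" if t % 10 == 0 else "B" for t in range(n // 2, n)])

# Chronological 70/15/15 split of the timeline: cut at timesteps 7000 and 8500.
train_share, val_share, test_share = label_share_by_segment(labels, [7000, 8500])
print(f"share of B  train={train_share:.2f}  val={val_share:.2f}  test={test_share:.2f}")
# prints: share of B  train=0.33  val=0.90  test=0.90
```

With these synthetic labels the training segment is about one-third B, while validation and test are 90% B. This is the mismatch between the chronological split and the mixed conditions expected in production.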