Let's say I have a set of n time-series with sequence length 8
And let's define the input that an LSTM expects as a tensor of shape (samples, sequence_length, features).
I want to predict the last value of each one of them.
I can either feed the network the sequence of the first 7 values, with the 8th as the target:
[a,b,c,d,e,f,g] -> [h]
In this case the input tensor would be (samples, 7, 1).
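For concreteness, a minimal sketch of this whole-sequence framing (assuming NumPy and a toy dataset of n = 100 random series; the names are illustrative, not from any particular library):

```python
import numpy as np

# Hypothetical data: n = 100 series, each of length 8.
n = 100
series = np.random.randn(n, 8)

# Whole-sequence framing: the first 7 steps are the input,
# the 8th value is the target.
X = series[:, :7].reshape(n, 7, 1)  # (samples, sequence_length, features)
y = series[:, 7]                    # (samples,)

print(X.shape, y.shape)  # (100, 7, 1) (100,)
```

Each series contributes exactly one training sample in this setup.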
Or I can do the classic rolling window, with a window size of, say, 2:
[a,b], [b,c], [c,d], [d,e], [e,f], [f,g] -> [h]
In effect this shortens the sequence length (while multiplying the number of samples), and the input tensor would be (samples, 2, 1).
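One common reading of the rolling-window setup is that each window predicts the value immediately following it, so a single length-8 series yields several (input, target) pairs. A sketch under that assumption (NumPy, same toy data as before):

```python
import numpy as np

# Hypothetical data: n = 100 series, each of length 8.
n = 100
series = np.random.randn(n, 8)
w = 2  # window size

# Each window of w consecutive values predicts the value that follows it,
# so one length-8 series yields 8 - w = 6 (input, target) pairs.
windows = np.stack([series[:, i:i + w] for i in range(8 - w)], axis=1)  # (n, 6, w)
X = windows.reshape(n * (8 - w), w, 1)   # (samples, 2, 1)
y = series[:, w:].reshape(n * (8 - w))   # the next value after each window

print(X.shape, y.shape)  # (600, 2, 1) (600,)
```

Note that samples drawn from the same series are now highly overlapping, which matters if you shuffle them across a train/validation split.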
What are the trade-offs between using rolling windows and feeding the "raw" time series to the LSTM?