I'm working on using an LSTM to predict the direction of the market for the next day.

My question concerns the input for the LSTM. My data is a financial time series $x_1, \ldots, x_t$ where each $x_i$ represents a vector of features for day $i$, i.e. $x_i \in \mathbb{R}^D$. The target variable $y_i$ is the direction of the market for the next day, $$ y_i = I_{P_{i+2} - P_{i+1} > 0}, $$ where $P_i$ is the opening price of the asset for day $i$.
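To make the target definition concrete, here is a minimal NumPy sketch of the indicator above. The function name `next_day_direction` and the sample prices are made up for illustration, and the arrays are 0-indexed, so `labels[i]` corresponds to $I_{P_{i+2} - P_{i+1} > 0}$ with $i$ counted from 0.

```python
import numpy as np

def next_day_direction(open_prices):
    """Return y_i = 1 if P_{i+2} - P_{i+1} > 0 else 0 (0-indexed i).

    A series of N opening prices yields N - 2 labels, because the
    last two days have no "next-day" price difference available.
    """
    p = np.asarray(open_prices, dtype=float)
    # p[2:] holds P_{i+2}, p[1:-1] holds P_{i+1} for every valid i.
    return (p[2:] - p[1:-1] > 0).astype(int)

# Illustrative (made-up) opening prices.
prices = [100.0, 101.0, 100.5, 102.0, 102.0]
labels = next_day_direction(prices)
```

Note that an unchanged price counts as direction 0 under this definition, since the inequality is strict.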

I am now wondering how to configure the input to fit into an LSTM framework. The LSTM requires a sequence of length $T$ and uses it together with the target $y_T$. One approach is a rolling window: take as input the sequences $I_1 = (x_1, \ldots, x_T)$, $I_2 = (x_2, \ldots, x_{T+1})$, $\ldots$ and use $y_T, y_{T+1}, \ldots$ as the target variables. The problem with this is that $I_1$ and $I_2$ are very similar: they share $T-1$ of their $T$ points and differ only in that $I_1$ contains $x_1$ while $I_2$ contains $x_{T+1}$. Yet the target variable may be $1$ for the first sequence and $0$ for the other, which in my opinion makes it impossible for the LSTM to learn.
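The rolling-window construction described above can be sketched as follows. This is only an illustration under assumptions: the helper name `make_windows` and the toy data are hypothetical, indices are 0-based, and the output shape `(num_windows, T, D)` is the batch-first layout that many LSTM implementations accept.

```python
import numpy as np

def make_windows(x, y, T):
    """Build overlapping windows I_k = (x_k, ..., x_{k+T-1}) and pair
    each one with the target aligned to its last time step.

    x: array of shape (N, D), one feature vector per day
    y: array of shape (M,), one label per day (M may be smaller than N)
    Returns windows of shape (n, T, D) and targets of shape (n,).
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n = min(len(x), len(y)) - T + 1
    if n <= 0:
        raise ValueError("series is shorter than the window length T")
    windows = np.stack([x[k:k + T] for k in range(n)])
    targets = y[T - 1:T - 1 + n]  # label of the last day in each window
    return windows, targets

# Toy data: 6 days, D = 2 features per day, made-up labels.
x = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 1, 0])
windows, targets = make_windows(x, y, T=3)
```

Consecutive windows here overlap in $T-1$ rows, which is exactly the similarity between $I_1$ and $I_2$ that the question is concerned about.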

I am wondering if anybody has an idea of how to approach this problem. The rolling-window approach above treats the inputs $I_k$ as independent samples, much as if each $I_k$ were a sentence to be classified as French or English. I want the LSTM to take into account that the sequences it is fed are all parts of the same long sequence, if that makes sense.

A lot of papers use recurrent neural networks for this problem, but they never really specify how they structured the input.

How about building an LSTM auto-encoder on your target variable, and then using the latent variable as a target for your current LSTM model? The latent variable might incorporate some kind of continuity at the breakpoints that you find problematic – shadi – 2019-02-05T06:16:55.390

I am also interested in the last part of your question, how to structure the input. Increasing the rolling window size changes the speed and the performance, but what size makes sense? What other input structures would be useful? – Donald S – 2020-06-14T05:00:43.563