## Input for LSTM for financial time series directional prediction


I'm working on using an LSTM to predict the direction of the market for the next day.

My question concerns the input for the LSTM. My data is a financial time series $x_1 \ldots x_t$ where each $x_i$ represents a vector of features for day $i$, i.e. $x_i \in \mathbb{R}^D$. The target variable $y_i$ is the direction of the market for the next day, $$y_i = I_{P_{i+2} - P_{i+1} > 0}$$ where $P_i$ is the opening price of the asset on day $i$.
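To make the label definition concrete, here is a minimal numpy sketch (the prices are made up, and the code uses 0-based indexing, so `y[i]` compares `P[i+2]` and `P[i+1]`):

```python
import numpy as np

# Hypothetical opening prices P_0 ... P_5 (values are purely illustrative)
P = np.array([100.0, 101.5, 101.0, 102.2, 103.0, 102.5])

# y[i] = 1 if P[i+2] - P[i+1] > 0, else 0
y = (P[2:] - P[1:-1] > 0).astype(int)
print(y)  # [0 1 1 0]
```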

I am wondering how to configure the input to fit the LSTM framework. The LSTM requires a sequence of length $T$ and uses it together with the target $y_T$. One approach is a rolling window: take as input the sequences $I_1 = x_1, \ldots, x_T$, $I_2 = x_2, \ldots, x_{T+1}, \ldots$ and use $y_T, y_{T+1}, \ldots$ as the target variables. The problem with this is that $I_1$ and $I_2$ are nearly identical (they differ in only two points), yet the target variable may be $1$ for the first sequence and $0$ for the other, which in my opinion makes it very hard for the LSTM to learn.
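For reference, the rolling-window construction itself is straightforward; a sketch with a toy feature matrix (the sizes `t`, `D`, `T` are arbitrary illustrations):

```python
import numpy as np

t, D, T = 8, 3, 5  # days, features per day, window length (all illustrative)
x = np.arange(t * D, dtype=float).reshape(t, D)

# Rolling windows I_k = (x_k, ..., x_{k+T-1}); result has shape
# (number of windows, T, D), the usual (batch, timesteps, features) layout
windows = np.stack([x[k:k + T] for k in range(t - T + 1)])
print(windows.shape)  # (4, 5, 3)
```

Note how consecutive windows share $T-1$ of their $T$ rows, which is exactly the overlap the question is concerned about.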

I am wondering if anybody has an idea of how to approach this problem. The rolling-window approach above treats the inputs $I_k$ as independent, as if each $I_k$ were a sentence to be classified as French or English. I want the LSTM to take into consideration that the sequences it is fed are all parts of the same long sequence, if that makes sense.

A lot of papers use recurrent neural networks for this problem, but they rarely specify how they structured the input.

How about building an LSTM auto-encoder on your target variable, and then using the latent variable as a target for your current LSTM model? The latent variable might incorporate some kind of continuity at the breakpoints that you find problematic – shadi – 2019-02-05T06:16:55.390

I am also interested in the last part of your question, how to structure the input. Increasing the rolling window size changes the speed and the performance, but what size makes sense? What other input structures would be useful? – Donald S – 2020-06-14T05:00:43.563


I think there's a partial answer to your question, about the part: "the sequences it's fed are all parts of the same long sequence"

You can train your LSTM model in stateful mode. There's a clear explanation here: http://philipperemy.github.io/keras-stateful-lstm/

In short, in stateful mode the model remembers what it was previously fed and uses that information during training. That way, it can account for the current input being part of a longer sequence.
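Conceptually, "stateful" just means the hidden state at the end of one window initializes the next window, instead of being reset to zeros. A minimal numpy sketch with a toy (untrained, randomly weighted) RNN cell illustrates the mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 3, 4                          # feature and hidden sizes (illustrative)
Wx = rng.normal(size=(D, H)) * 0.1   # toy RNN weights, not a trained model
Wh = rng.normal(size=(H, H)) * 0.1

def run_window(window, h):
    """Process one window, starting from hidden state h; return final state."""
    for x_t in window:
        h = np.tanh(x_t @ Wx + h @ Wh)
    return h

series = rng.normal(size=(10, D))
windows = [series[0:5], series[5:10]]

# Stateful processing: the final state of window k seeds window k + 1.
h = np.zeros(H)
for w in windows:
    h = run_window(w, h)
```

Running consecutive non-overlapping windows with the state carried over is equivalent to running the whole series in one pass, which is exactly the "one long sequence" behavior the question asks for. In Keras this is what `stateful=True` on the LSTM layer does (with the caveat, explained in the linked post, that you must use a fixed batch size and reset states between epochs).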


For a second, let us take the ML aspect out of the solution. Your argument would still hold: since $I_1, I_2, \ldots, I_N$ is so similar to $I_2, I_3, \ldots, I_{N+1}$, especially for larger $N$, every new day's stock price should ideally be very close to the previous day's; after all, hardly anything has changed in the inputs. Logically, then, the market should not move at all. Nonetheless it DOES, and hence we need to see how we can improve the feature set to better capture all the relevant data.

This is a valid argument whether or not we use ML. Hence models using LSTMs (or any model, for that matter) face this issue during prediction: all the network has to do is produce a value similar to the last input price. Some of the models I have seen on the web do not feed the prediction back in as the input for the next prediction; they use the actual value instead. The ideal model should take its own prediction and predict from it. Then we could truly say we have discovered a model that can predict stock prices. Of course, that is quite difficult to do.

Workarounds could include strengthening the features themselves instead of focusing on the model, where the ROI could be lower. For example:

- Gather positive and negative sentiment around the stock as a feature; the previous day's sentiment would then play a much larger role in prediction.
- Process the output data better, e.g. take labels for the next 10 steps instead of a single time step, or use some sort of moving average.
- Increase the step size to more than 1 in the sliding-window approach.
- Predict changes in value (a first or second derivative) instead of the value itself, which is relatively simpler.
- Combine RNN outputs with other approaches, including CNNs, as a few papers have done.

I am sure quants have many more tricks up their sleeve, and quite a few ideas may be unpublished considering the financial ramifications.
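Two of these suggestions, targeting changes rather than levels and producing multi-step labels, can be sketched together in numpy (prices and the horizon `k` are illustrative):

```python
import numpy as np

P = np.array([100.0, 101.0, 100.5, 102.0, 103.0, 102.0, 104.0])

# Target the change (first difference) rather than the price level itself
returns = np.diff(P)  # r[i] = P[i+1] - P[i]

# Multi-step labels: direction over each of the next k steps (k = 3 here)
k = 3
labels = np.stack([(returns[i:i + k] > 0).astype(int)
                   for i in range(len(returns) - k + 1)])
print(labels.shape)  # (4, 3)
```

Each row of `labels` gives the model a richer target than a single next-day bit, which can make the learning signal less sensitive to the one-day noise discussed above.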