Time Series prediction using LSTMs: Importance of making time series stationary



In this link on Stationarity and differencing, it is mentioned that models like ARIMA require a stationarized time series for forecasting, since its statistical properties (mean, variance, autocorrelation, etc.) are constant over time. Since RNNs have a better capacity to learn non-linear relationships (as described here: The Promise of Recurrent Neural Networks for Time Series Forecasting) and perform better than traditional time series models when the data is large, it is essential to understand how stationarized data would affect their results. The questions I need answered are as follows:

  1. In the case of traditional time series forecasting models, why and how does stationarity in the data make prediction easier?

  2. When building a time series prediction model using LSTMs, is it important to make the time series data stationary? If so, why?

Abhijay Ghildyal

Posted 2017-11-16T07:57:54.843

Reputation: 635



In general, time series are not really different from other machine learning problems - you want your test set to 'look like' your training set, because you want the model you learned on your training set to still be appropriate for your test set. That's the important underlying concept regarding stationarity. Time series have the additional complexity that there may be long-term structure in your data that your model may not be sophisticated enough to learn. For example, when using an autoregressive lag of N, we can't learn dependencies over intervals longer than N. Hence, when using simple models like ARIMA, we also want the data to be locally stationary.

  1. As you said, stationary just means the series' statistical properties don't change over time ('locally' stationary). ARIMA models are essentially regression models where you use the past N values as input to a linear regression to predict the (N+1)st value. (At least, that's what the AR part does.) When you learn the model, you're learning the regression coefficients. If you have a time series where you learn the relationship between the past N points and the next point, and then you apply that to a different set of N points to predict the next value, you are implicitly assuming that the same relationship holds between the N predictor points and the following (N+1)st point you're trying to predict. That's stationarity. If you separated your training set into two intervals, trained on them separately, and got two very different models - what would you conclude from that? Do you think you would feel confident applying those models to predict new data? Which one would you use? These issues arise if the data is 'non-stationary'.

  2. My take on RNNs is this - you are still learning a pattern from one segment of a time series, and you still want to apply it to another part of the time series to get predictions. The model learns a simplified representation of the time series - and if that representation applies on the training set but not on the test set, it won't perform well. However, unlike ARIMA, RNNs are capable of learning nonlinearities, and specialized nodes like LSTM nodes are even better at this. In particular, LSTMs and GRUs are very good at learning long-term dependencies. See for example this blog post. Effectively this means that the stationarity requirement is less brittle with RNNs, so it's somewhat less of a concern. To be able to learn long-term dependencies, however, you need LOTS of data to train on.
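That said, in practice people often still difference a series before feeding it to an LSTM, and invert the transform on the predictions. Here's a minimal sketch of that preprocessing step (function names are mine, the series is a toy trending example):

```python
# Sketch: first-differencing a trending series before modelling, then
# inverting the transform on predictions. Function names are illustrative.
import numpy as np

def difference(series):
    """Remove trend: d[t] = y[t] - y[t-1]."""
    return np.diff(series)

def invert_difference(last_observed, diffs):
    """Rebuild levels from differences, given the last observed value."""
    return last_observed + np.cumsum(diffs)

# Toy series with a linear trend: non-stationary in the mean
y = np.arange(100, dtype=float) * 0.5 + np.sin(np.arange(100))

d = difference(y)                      # differenced series is ~stationary
restored = invert_difference(y[0], d)  # round-trips back to y[1:]
assert np.allclose(restored, y[1:])
```

Whether this helps a given LSTM depends on the data and the amount of it - the point is only that the transform is cheap and exactly invertible, so it costs little to try.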

Ultimately the proof is in the pudding. That is, do model validation like you would with any other machine learning project. If your model predicts well for hold-out data, you can feel somewhat confident in using it. But like any other ML project - if your test data is ever significantly different from your training data, your model will not perform well.
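For time series, the one twist on the usual validation recipe is that the hold-out set must come from the *end* of the series - never shuffle. A tiny sketch, using a naive persistence forecast as a stand-in for the model (names are mine):

```python
# Sketch: time-ordered hold-out validation for a one-step forecaster.
# The 'model' here is a naive persistence baseline, purely illustrative.
def walk_forward_validate(series, n_test):
    """Score a one-step persistence forecast on the last n_test points."""
    errors = []
    for t in range(len(series) - n_test, len(series)):
        forecast = series[t - 1]          # predict 'same as last step'
        errors.append(abs(series[t] - forecast))
    return sum(errors) / n_test           # mean absolute error

series = [float(x) for x in range(20)]    # toy series increasing by 1
mae = walk_forward_validate(series, n_test=5)
print(mae)  # 1.0 - persistence is off by exactly one step each time
```

Any forecaster (ARIMA, LSTM, or otherwise) can be swapped in for the persistence line; the point is that every forecast only uses data from before the point being predicted.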


Posted 2017-11-16T07:57:54.843

Reputation: 1,938

This answer is excellent. Well-considered and thorough. – StatsSorceress – 2018-03-02T14:18:23.457

It's been a while. Has anyone tested this assumption? – compguy24 – 2019-03-19T20:40:43.300