Is time series multi-step ahead forecasting a sequence to sequence problem?


I'm using the keras package to train an LSTM on a univariate time series of numeric (float) values. Performing a 1-step-ahead forecast is trivial, but I'm not sure how to perform, let's say, a 10-step-ahead forecast. Two questions:

1) I read about sequence to sequence NNs, but can barely find anything about them in the context of time series forecasting. Am I right in assuming that forecasting more than 1 time step in advance is a seq2seq problem? That makes sense to me because each forecast depends on its predecessor.

2) An intuitive solution without seq2seq would be: Perform 1-step ahead forecast, then append this forecast to the series and use it to obtain the next forecast, and so on. How would this differ from a seq2seq approach?
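The recursive strategy described in 2) can be sketched in a few lines. This is only an illustration: `recursive_forecast`, `naive_drift_model`, and the `window` parameter are hypothetical names, and the drift rule merely stands in for a trained one-step LSTM.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    history: Sequence[float],
    one_step_model: Callable[[Sequence[float]], float],
    horizon: int,
    window: int,
) -> List[float]:
    """Forecast `horizon` steps by feeding each prediction back as input."""
    buffer = list(history)
    forecasts = []
    for _ in range(horizon):
        # Predict one step from the most recent `window` values,
        # which may already include earlier predictions.
        next_value = one_step_model(buffer[-window:])
        forecasts.append(next_value)
        buffer.append(next_value)
    return forecasts


def naive_drift_model(window_values: Sequence[float]) -> float:
    # Hypothetical stand-in for a trained one-step model:
    # last value plus the last observed difference.
    return window_values[-1] + (window_values[-1] - window_values[-2])


series = [1.0, 2.0, 3.0, 4.0]
predictions = recursive_forecast(series, naive_drift_model, horizon=10, window=2)
print(predictions)
```

Note that from the second step onward the model only ever sees its own outputs, so any error it makes is fed back into later inputs.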

sevelf

Posted 2016-12-05T12:56:09.450

Reputation: 81

  • 1. It can be tackled with a seq2seq model, since you have a sequence prediction problem. 2. It would suffer from an accumulation of prediction error (noise).

    – Emre – 2017-08-09T15:33:11.417

    I'm still studying seq2seq, so I can't comment on the two points above, but I would recommend the tutorial below from Dr Jason Brownlee, which may be what you are looking for: http://machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/

    – Nitin Mahajan – 2017-08-09T12:30:13.803

    Answers


    A Seq2Seq architecture can definitely be used for time series problems. The only twist is that you will need a linear layer on top of your decoder to project its outputs to the required size (for example, 1 for a univariate series).
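The projection step can be illustrated without any deep learning library. The sketch below assumes the decoder has already produced one hidden vector per forecast step; `project_decoder_outputs`, the random hidden states, and the sizes are all hypothetical. In Keras this role would typically be played by a `Dense(1)` layer applied at every time step (for instance via `TimeDistributed`), but that wiring is not shown here.

```python
import random
from typing import List, Sequence


def project_decoder_outputs(
    decoder_states: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> List[float]:
    """Apply the same linear layer to every decoder time step.

    Each hidden vector of size `hidden_size` is mapped to one scalar,
    which is the forecast for that step of a univariate series.
    """
    return [
        sum(h * w for h, w in zip(state, weights)) + bias
        for state in decoder_states
    ]


random.seed(0)
horizon, hidden_size = 10, 4

# Placeholder decoder outputs: one hidden vector per forecast step.
decoder_states = [
    [random.uniform(-1.0, 1.0) for _ in range(hidden_size)]
    for _ in range(horizon)
]
weights = [random.uniform(-1.0, 1.0) for _ in range(hidden_size)]
bias = 0.0

forecast = project_decoder_outputs(decoder_states, weights, bias)
print(len(forecast))  # one scalar per forecast step
```

The weights are shared across time steps, so the layer adds only `hidden_size + 1` parameters regardless of the forecast horizon.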

    The stepwise forecast approach can be used for short horizons, but because any bias in the one-step model is compounded at every step, it is not well suited to longer horizons.

    For example, suppose you have a sequence whose value is constant at each time step, $x_{i+1} = x_i$ with $x_0 = 1$, but your model learned $x_{i+1} = 1.01x_i$ instead (which is quite likely with a gradient descent algorithm). For $t=50$, the target value will be $1^{50} = 1$, whereas your model will predict $1.01^{50} \approx 1.64$.

    Thus a 1% single step error results in a 64% difference in 50 steps.
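The arithmetic above can be checked with a short loop. This is just a numerical illustration of the example; the `compound` helper is a hypothetical name, and the starting value of 1 is the one implied by the target $1^{50} = 1$.

```python
def compound(start: float, factor: float, steps: int) -> float:
    """Apply the same multiplicative one-step rule `steps` times."""
    value = start
    for _ in range(steps):
        value *= factor
    return value


steps = 50
target = compound(1.0, 1.00, steps)     # true process: x_{i+1} = x_i
predicted = compound(1.0, 1.01, steps)  # biased model: x_{i+1} = 1.01 * x_i
relative_error = (predicted - target) / target

print(f"target={target:.2f} predicted={predicted:.2f} "
      f"relative error={relative_error:.0%}")
```

Running the loop for fewer steps (say 5) gives an error of about 5%, which is why the stepwise approach is tolerable for short horizons.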

    Louis T

    Posted 2016-12-05T12:56:09.450

    Reputation: 1 048