Does this encoder-decoder LSTM make sense for time series sequence to sequence?




given $\vec x = [x_{t=-3}, x_{t=-2}, x_{t=-1}, x_{t=0}]$

predict $\vec y = [x_{t=1}, x_{t=2}]$

With an LSTM encoder-decoder (seq2seq)
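To make the setup concrete, here is a minimal sketch of how a series could be cut into (input, target) pairs of 4 past values and 2 future values. The helper name `make_windows` and the toy series are made up for illustration.

```python
def make_windows(series, n_in=4, n_out=2):
    """Slide over the series and collect (n_in past values, n_out future values) pairs."""
    X, Y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])                   # x_{t=-3} .. x_{t=0}
        Y.append(series[start + n_in:start + n_in + n_out])    # x_{t=1}, x_{t=2}
    return X, Y

# toy data, purely illustrative
series = [10, 11, 12, 13, 14, 15, 16]
X, Y = make_windows(series)
```

Each row of `X` would be reshaped to (4, 1) before being fed to an LSTM, since Keras recurrent layers expect (time steps, variables) per sample.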

MODEL: [image: diagram of the proposed encoder-decoder architecture]

NOTE: the ? symbol in the tensor shapes stands for batch_size, following TensorFlow notation.


Is it worth trying this architecture? (I think it took me more time to draw the picture than coding it...)

The difference from a typical seq2seq model is that, in the decoder, the input for the second time step is not the output of the previous step. The input for both decoder time steps is the same: an "encoded" version of all the hidden states of the encoder.
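A rough Keras sketch of the architecture described above might look like the following. The picture is not reproduced here, so the way the hidden states get "encoded" (flatten plus a dense layer) and the value of `hidden_size` are assumptions, not necessarily what the drawing shows.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import LSTM, Flatten, Dense, RepeatVector, TimeDistributed

n_in, n_out, n_vars = 4, 2, 1   # 4 past steps in, 2 future steps out, one variable
hidden_size = 16                # made-up value

inputs = Input(shape=(n_in, n_vars))
# Encoder: keep the hidden state of every time step, shape (?, 4, hidden_size)
enc_states = LSTM(hidden_size, return_sequences=True)(inputs)
# "Encode" all hidden states into one vector (assumption: flatten + dense)
context = Dense(hidden_size, activation="tanh")(Flatten()(enc_states))
# The decoder receives this same vector at both output time steps
dec_in = RepeatVector(n_out)(context)
dec_states = LSTM(hidden_size, return_sequences=True)(dec_in)
# One scalar prediction per output time step, shape (?, 2, 1)
outputs = TimeDistributed(Dense(1))(dec_states)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```

The `RepeatVector` layer is what makes the decoder input identical at every step, instead of feeding back the previous prediction.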


Posted 2018-12-12T13:07:32.217

Reputation: 1 478



Yes, it makes sense. Within the RNN family, seq2seq models are the best suited to multistep predictions. More classical RNNs, on the other hand, are not that good at predicting long sequences.

If you need to implement a seq2seq model in TensorFlow 2.0 / Keras, the model follows this structure:

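from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, Dense, TimeDistributed
from tensorflow.keras.activations import elu

# these are just made up hyperparameters, change them as you wish
hidden_size = 50
# sequence sizes below follow the question's setup (4 steps in, 2 steps out)
input_sequence_length = 4   # number of past time steps fed to the encoder
prediction_length = 2       # number of future time steps to predict
no_vars = 1                 # number of variables per time step

seq2seq = Sequential([
    # encoder: compresses the input sequence into a single vector
    LSTM(hidden_size, input_shape = (input_sequence_length, no_vars)),
    # repeat the encoded vector once per output time step
    RepeatVector(prediction_length),
    # decoder: returns one hidden state per output time step
    LSTM(hidden_size, return_sequences = True),
    Dense(hidden_size, activation = elu),
    TimeDistributed(Dense(1, activation = elu))
])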


and then train it as usual with seq2seq.compile() and seq2seq.fit().

If you want to stack more LSTM() layers on top of each other, simply add them to the model shown above (every LSTM() layer that feeds another LSTM() layer needs return_sequences = True). Please keep in mind that LSTM models are very computationally expensive; without a good GPU, even this "basic" model could be painfully slow to train.

One modification I'd suggest, looking at your image, is to make the LSTM-encoder and -decoder parts of equal size and depth.
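As a sketch of what stacking could look like while keeping the encoder and decoder the same size and depth, here is a two-layer version trained for one epoch on random data. All sizes and the random data are placeholders, not recommendations.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, Dense, TimeDistributed

n_in, n_out, n_vars, hidden_size = 4, 2, 1, 16   # made-up sizes

stacked = Sequential([
    Input(shape=(n_in, n_vars)),
    # encoder, depth 2: the first layer must pass the full sequence on
    LSTM(hidden_size, return_sequences=True),
    LSTM(hidden_size),                   # last encoder layer returns one vector
    RepeatVector(n_out),                 # one copy per output time step
    # decoder, depth 2, same width as the encoder
    LSTM(hidden_size, return_sequences=True),
    LSTM(hidden_size, return_sequences=True),
    TimeDistributed(Dense(1)),
])
stacked.compile(optimizer="adam", loss="mse")

# random placeholder data, only to show the expected tensor shapes
rng = np.random.default_rng(0)
X = rng.normal(size=(32, n_in, n_vars)).astype("float32")
Y = rng.normal(size=(32, n_out, 1)).astype("float32")
stacked.fit(X, Y, epochs=1, batch_size=8, verbose=0)
pred = stacked.predict(X[:3], verbose=0)   # shape (3, 2, 1)
```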

Alternatively, you can implement a more classical "Autoencoder-like" architecture, with LSTM() layers for encoding and decoding, and Dense() layers in the middle. However, seq2seq models are the most powerful at the moment.

To my knowledge, the only models more state-of-the-art than this are attention models. The problem is that they are so new that TensorFlow/Keras doesn't have built-in layers for them, and you'd have to write your own custom layers (it's a pain). The only extensive implementation of attention models I found is from this blog post, but things get very complicated there. I haven't tried to implement one yet.
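For intuition only, the arithmetic behind one common form of attention (dot-product attention) can be sketched in plain Python. This is not a Keras layer and not the blog post's implementation; the function name and the toy vectors are made up.

```python
import math

def dot_product_attention(query, encoder_states):
    """Return (context, weights) for one decoder query over the encoder hidden states."""
    # similarity between the query and every encoder hidden state
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    # softmax over the scores (shifted by the max for numerical stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # context vector: weighted average of the encoder hidden states
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights

# toy hidden states for 4 encoder time steps, purely illustrative
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]
context, weights = dot_product_attention(query, encoder_states)
```

The decoder would compute a fresh context vector like this at every output step, instead of receiving one fixed encoding of the input.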


Posted 2018-12-12T13:07:32.217

Reputation: 4 928


Encoder-decoder architectures are used extensively for time series prediction, pal; see them here and here. The second one is a paper.


Posted 2018-12-12T13:07:32.217

Reputation: 11