My understanding is that for some types of seq2seq models, you train an encoder and a decoder, and then you set aside the encoder and use only the decoder for the prediction step. For example, this seq2seq time-series prediction model from Uber:

Now I am trying to implement a toy version of this in Keras.

This is the Keras code for a vanilla LSTM:

```
# imports needed to run this snippet
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM, Dense

# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# train model
model.fit(X, y, epochs=200, verbose=0)
# predict the next value after the window [70, 80, 90]
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
```
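For completeness, here is a minimal sketch of how `X`, `y`, `n_steps` and `n_features` could be defined for the blocks in this question (the helper name `split_sequence` and the series values are my own assumptions, not part of the original code):

```python
from numpy import array

def split_sequence(sequence, n_steps):
    """Slide a window of length n_steps over the series;
    each window is paired with the value that follows it."""
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return array(X), array(y)

raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n_steps, n_features = 3, 1
X, y = split_sequence(raw_seq, n_steps)
# LSTM layers expect 3-D input: (samples, timesteps, features)
X = X.reshape((X.shape[0], n_steps, n_features))
# X.shape == (6, 3, 1); y == [40, 50, 60, 70, 80, 90]
```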

This is the Keras code for a stacked LSTM model:

```
# imports needed to run this snippet
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM, Dense

# define model
model = Sequential()
model.add(LSTM(50, activation='relu', return_sequences=True, input_shape=(n_steps, n_features)))
model.add(LSTM(50, activation='relu'))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
# train model
model.fit(X, y, epochs=200, verbose=0)
# predict the next value after the window [70, 80, 90]
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps, n_features))
yhat = model.predict(x_input, verbose=0)
print(yhat)
```

And this is the Keras code for an encoder-decoder model:

```
# imports needed to run this snippet
from numpy import array
from keras.models import Sequential
from keras.layers import LSTM, Dense, RepeatVector, TimeDistributed

# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_steps_in, n_features)))
model.add(RepeatVector(n_steps_out))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
# train model
model.fit(X, y, epochs=100, verbose=0)
# predict the next n_steps_out values after the window [70, 80, 90]
x_input = array([70, 80, 90])
x_input = x_input.reshape((1, n_steps_in, n_features))
yhat = model.predict(x_input, verbose=0)
```
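One detail worth spelling out: because of `TimeDistributed(Dense(1))`, this model's targets must be 3-D, `(samples, n_steps_out, 1)`, one value per output step. A minimal sketch of the data layout I am assuming (the helper name `split_sequence_multi` and the series values are illustrative, not from the original):

```python
from numpy import array

def split_sequence_multi(sequence, n_steps_in, n_steps_out):
    """Pair each input window of n_steps_in values with the
    n_steps_out values that follow it."""
    X, y = [], []
    for i in range(len(sequence) - n_steps_in - n_steps_out + 1):
        X.append(sequence[i:i + n_steps_in])
        y.append(sequence[i + n_steps_in:i + n_steps_in + n_steps_out])
    return array(X), array(y)

raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n_steps_in, n_steps_out, n_features = 3, 2, 1
X, y = split_sequence_multi(raw_seq, n_steps_in, n_steps_out)
X = X.reshape((X.shape[0], n_steps_in, n_features))
y = y.reshape((y.shape[0], n_steps_out, n_features))
# X.shape == (5, 3, 1), y.shape == (5, 2, 1)
```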

The problem is, I don't see much difference between the encoder-decoder code and the vanilla and stacked LSTMs. In particular, I don't see how the predict step uses only the decoder, or which variable or method in Keras corresponds to the embedding that would serve as the input for predicting a new time series.
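One thing I have tried: with the `Sequential` API the embedding is never exposed as a variable, but the same architecture can be written with the functional API so that the encoder's output is a named tensor. A minimal sketch of what I mean (the names `latent_dim`, `encoder`, `embedding` are my own; this may not be what the Uber paper actually does):

```python
from numpy import array
from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from tensorflow.keras.models import Model

n_steps_in, n_steps_out, n_features = 3, 2, 1
latent_dim = 100  # size of the learned embedding (assumed)

# encoder: compresses the input window into a fixed-length vector
enc_in = Input(shape=(n_steps_in, n_features))
embedding = LSTM(latent_dim, activation='relu')(enc_in)

# decoder: expands the embedding back into an output sequence
dec = RepeatVector(n_steps_out)(embedding)
dec = LSTM(latent_dim, activation='relu', return_sequences=True)(dec)
dec_out = TimeDistributed(Dense(1))(dec)

# the full model is trained end-to-end; the encoder model
# shares the same layers (and therefore the same weights)
model = Model(enc_in, dec_out)
model.compile(optimizer='adam', loss='mse')
encoder = Model(enc_in, embedding)

# after model.fit(...), the embedding for a new window would be:
x_input = array([70, 80, 90]).reshape((1, n_steps_in, n_features))
z = encoder.predict(x_input, verbose=0)  # shape (1, latent_dim)
```

Is this the right way to get at the embedding, or is there a more idiomatic Keras mechanism?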

How can I implement the code for a model similar to the one in the illustration using Keras?