This suggests that all the training examples have a fixed sequence length, namely `timesteps`.

That is not quite correct, since that dimension can be `None`, i.e. variable length. Within a single *batch*, all examples must have the same number of timesteps (this is typically where you see 0-padding and masking), but between batches there is no such restriction. During inference, you can use any length.
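To illustrate the per-batch requirement, here is a minimal numpy-only sketch (not Keras code) of what 0-padding and a mask look like; the variable names and the toy sequences are made up for the example:

```
import numpy as np

# Hypothetical ragged batch: three sequences of different lengths, 2 features each.
sequences = [np.ones((4, 2)), np.ones((7, 2)), np.ones((5, 2))]

max_len = max(len(s) for s in sequences)           # pad to the longest sequence
batch = np.zeros((len(sequences), max_len, 2))     # 0.0 is the pad value
mask = np.zeros((len(sequences), max_len), dtype=bool)
for i, s in enumerate(sequences):
    batch[i, :len(s)] = s
    mask[i, :len(s)] = True

print(batch.shape)        # (3, 7, 2) -- one fixed length within the batch
print(mask.sum(axis=1))   # [4 7 5]  -- true lengths recoverable from the mask
```

In Keras this is what `pad_sequences` plus a `Masking` layer do for you; the point is only that the fixed length is a property of the batch, not of the dataset.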

Here is example code that creates batches of training data with random sequence lengths:

```
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed
from keras.utils import to_categorical
import numpy as np

model = Sequential()

model.add(LSTM(32, return_sequences=True, input_shape=(None, 5)))
model.add(LSTM(8, return_sequences=True))
model.add(TimeDistributed(Dense(2, activation='sigmoid')))

model.summary(90)

model.compile(loss='categorical_crossentropy',
              optimizer='adam')

def train_generator():
    while True:
        sequence_length = np.random.randint(10, 100)
        x_train = np.random.random((1000, sequence_length, 5))
        # y_train will depend on the past 5 timesteps of x.
        # Copy, because x_train[:, :, 0] is a view and the in-place +=
        # below would otherwise write back into x_train.
        y_train = x_train[:, :, 0].copy()
        for i in range(1, 5):
            y_train[:, i:] += x_train[:, :-i, i]
        y_train = to_categorical(y_train > 2.5)
        yield x_train, y_train

model.fit_generator(train_generator(), steps_per_epoch=30, epochs=10, verbose=1)
```

And this is what it prints. Note the output shapes are `(None, None, x)`, indicating variable batch size and variable timestep count.

```
__________________________________________________________________________________________
Layer (type)                              Output Shape                          Param #
==========================================================================================
lstm_1 (LSTM)                             (None, None, 32)                      4864
__________________________________________________________________________________________
lstm_2 (LSTM)                             (None, None, 8)                       1312
__________________________________________________________________________________________
time_distributed_1 (TimeDistributed)      (None, None, 2)                       18
==========================================================================================
Total params: 6,194
Trainable params: 6,194
Non-trainable params: 0
__________________________________________________________________________________________
Epoch 1/10
30/30 [==============================] - 6s 201ms/step - loss: 0.6913
Epoch 2/10
30/30 [==============================] - 4s 137ms/step - loss: 0.6738
...
Epoch 9/10
30/30 [==============================] - 4s 136ms/step - loss: 0.1643
Epoch 10/10
30/30 [==============================] - 4s 142ms/step - loss: 0.1441
```
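As a sanity check on the generator's label construction: each label sums feature `i` of the input at lag `i`, i.e. `y[t] = x[t, 0] + x[t-1, 1] + ... + x[t-4, 4]`, skipping lags that run off the start of the sequence. A small numpy-only check (note the `.copy()`: `x_train[:, :, 0]` is a view, so without it the `+=` would also modify the input):

```
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 12, 5))  # small batch: 2 sequences, 12 timesteps, 5 features

# Vectorized construction, as in the generator above.
y = x[:, :, 0].copy()
for i in range(1, 5):
    y[:, i:] += x[:, :-i, i]

# Explicit check: y[b, t] = sum over i of x[b, t-i, i] for valid lags.
for b in range(2):
    for t in range(12):
        expected = sum(x[b, t - i, i] for i in range(5) if t - i >= 0)
        assert np.isclose(y[b, t], expected)
```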

@kbrose is correct. However, I have one concern: the example uses a rather special generator that yields infinitely, and more importantly it is designed so that every batch of 1000 samples shares one length. In practice this is hard to satisfy, if not impossible. You would need to reorganize your data so that entries with the same length are grouped together, and carefully choose the batch split positions. Moreover, you would have no way to shuffle across batches. So my opinion is: never use variable-length input in Keras unless you know exactly what you are doing. Instead, use padding and add a `Masking` layer to ignore the padded timesteps. – Bs He – 2019-04-16T23:10:10.580
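The reorganization this comment describes (grouping real variable-length samples into equal-length buckets) can be sketched in plain numpy. `bucket_batches` and its arguments are hypothetical names for this example, not a Keras API; shuffling happens within each bucket and over the batch order, never across batches — which is exactly the limitation the comment points out:

```
import random
from collections import defaultdict

import numpy as np

def bucket_batches(sequences, labels, batch_size, rng):
    """Group samples by length, then yield same-length batches in random order.

    `sequences` is a list of (timesteps_i, features) arrays with varying
    timesteps_i; `rng` is a `random.Random` instance.
    """
    buckets = defaultdict(list)
    for idx, seq in enumerate(sequences):
        buckets[len(seq)].append(idx)

    batches = []
    for idxs in buckets.values():
        rng.shuffle(idxs)                       # shuffle within each bucket
        for start in range(0, len(idxs), batch_size):
            batches.append(idxs[start:start + batch_size])
    rng.shuffle(batches)                        # shuffle the batch order only

    for batch_idxs in batches:
        x = np.stack([sequences[i] for i in batch_idxs])
        y = np.stack([labels[i] for i in batch_idxs])
        yield x, y

# Hypothetical toy data: four sequences of lengths 3 or 5.
seqs = [np.ones((3, 2)), np.ones((5, 2)), np.ones((3, 2)), np.ones((5, 2))]
labs = [np.zeros(len(s)) for s in seqs]
for x, y in bucket_batches(seqs, labs, batch_size=2, rng=random.Random(0)):
    print(x.shape)  # every yielded batch has one uniform timestep length
```

A wrapper that loops over this forever would make it usable with `fit_generator`, but the bucketing itself is the part that takes care.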