Keras LSTM with 1D time series

I'm learning how to use Keras and I've had reasonable success with my labelled dataset using the examples in Chollet's Deep Learning with Python. The dataset is ~1000 time series, each of length 3125, with 3 possible classes.

I'd like to go beyond the basic Dense layers, which give me about 70% accuracy, and the book goes on to discuss LSTM and RNN layers.

All the examples seem to use datasets with multiple features for each time series, and as a result I'm struggling to work out how to apply them to my data.

If, for example, I have 1000 time series of length 3125, how do I feed them into something like the SimpleRNN or LSTM layer? Am I missing some fundamental knowledge of what these layers do?

Current code:

import pandas as pd
import numpy as np
import os
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM, Dropout, SimpleRNN, Embedding, Reshape
from keras.utils import to_categorical
from keras import regularizers
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

def readData():
    # Get labels from the labels.txt file
    labels = pd.read_csv('labels.txt', header = None)
    labels = labels.values
    labels = labels - 1  # shift labels from 1-indexed to 0-indexed
    print('One Hot Encoding Data...')
    labels = to_categorical(labels)

    data = pd.read_csv('ts.txt', header = None)

    return data, labels

print('Reading data...')
data, labels = readData()

print('Splitting Data')
data_train, data_test, labels_train, labels_test = train_test_split(data, labels)

print('Building Model...')
#Create model
model = Sequential()
## LSTM / RNN goes here ##
model.add(Dense(3, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
 
print('Training NN...')
history = model.fit(data_train, labels_train, epochs=1000, batch_size=50,
    validation_split=0.25,verbose=2)

results = model.evaluate(data_test, labels_test)

predictions = model.predict(data_test)

print(predictions[0].shape)
print(np.sum(predictions[0]))
print(np.argmax(predictions[0]))

print(results)

acc = history.history['acc']
val_acc = history.history['val_acc']
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

user1147964

Posted 2018-02-06T16:33:32.570

Reputation: 145

Answers

LSTM layers require data of a different shape.

From your description, I understand the starting dataset to have 3125 rows and 1000 columns, where each row is one time-step. The target variable should then have 3125 rows and 1 column, where each value can be one of three possible values. So it sounds like you're doing a classification problem. To check this in code, I would do:

>>> X.shape
(3125, 1000)

>>> y.shape
(3125,)

The LSTM class requires each single sample to consist of a 'block' of time. Let's say you want to have a block of 100 time-steps. This means X[0:100] is a single input sample, which corresponds to the target variable at y[100]. This means your window size (a.k.a. the number of time-steps, or the number of lags) is equal to 100. As stated above, you have 3125 samples, so N = 3125. To form the first block, we unfortunately have to discard the first 100 values of y, as we cannot form a full block of 100 from the available data (we would end up needing the data points before X[0]).

Given all this, an LSTM requires you to deliver data of shape (N - window_size, window_size, num_features), which translates into (3125 - 100, 100, 1000) == (3025, 100, 1000).

Creating these time-blocks is a bit of a hassle, but write a good function once and reuse it :)
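
For illustration, here is one way such a windowing function might look. This is a minimal sketch, assuming X and y are already NumPy arrays; the function name make_blocks is my own invention:

import numpy as np

def make_blocks(X, y, window_size=100):
    # X: (N, num_features) array, y: (N,) array of class labels.
    # Returns blocks of shape (N - window_size, window_size, num_features)
    # together with the matching targets y[window_size:].
    blocks = [X[i - window_size:i] for i in range(window_size, len(X))]
    return np.stack(blocks), y[window_size:]

# With X of shape (3125, 1000):
# X_blocks, y_blocks = make_blocks(X, y)
# X_blocks.shape == (3025, 100, 1000), y_blocks.shape == (3025,)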

There is more work to be done; perhaps look at more in-depth examples of my explanation above here, or have a read of the LSTM documentation (or, better still, the source code!).

The final model would then be simple enough (based on your code):

#Create model
model = Sequential()
model.add(LSTM(units=32, activation='relu',
               input_shape=(100, 1000)))   # the batch size is neglected!
model.add(Dense(3, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

Have a look at the documentation regarding the input shape for the Sequential model. It basically says that we don't need to specify the batch dimension within input_shape. The batch size can be fixed, e.g. at 50, if you require it.
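
For example, a fixed batch size can be declared by giving the first layer the full batch input shape (a sketch; the numbers just mirror the example above):

model.add(LSTM(units=32, activation='relu',
               batch_input_shape=(50, 100, 1000)))   # batch size fixed to 50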

I know the input_shape argument is not in the documentation for LSTM, but the class itself inherits from RNN, which in turn inherits from Layer - so it will be able to use the info you provide.

One last tip: if you plan on adding several LSTM layers ('stacking' them), then you will need to add one more argument to all but the last LSTM layer, namely return_sequences=True.
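
A minimal sketch of stacking two LSTM layers (the layer sizes here are arbitrary):

model = Sequential()
model.add(LSTM(units=32, return_sequences=True,   # hand the full sequence to the next LSTM
               input_shape=(100, 1000)))
model.add(LSTM(units=32))                         # the last LSTM returns only its final output
model.add(Dense(3, activation='softmax'))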

n1k31t4

Posted 2018-02-06T16:33:32.570

Reputation: 12 573

Thank you for the comprehensive reply, Dexter(!). Regarding your comments on batch size: is the batch_size specified in the model.fit argument a different hyperparameter compared to making my own custom batches?

I managed to get my code to at least run by reshaping my data from a 1000x3125 matrix into a 3D matrix using data = np.reshape(data, (1000, 1, 3125)). This let me run the LSTM with input_shape=(1, 3125), but again, I'm not really sure what I'm doing.

Again, thank you very much for the reply. I'll have a look at the links you provided and study your answer some more. – user1147964 – 2018-02-07T10:29:44.070

You're welcome! Yes, you got it: if you leave out batch_size when defining the model, it will be taken from the same argument within model.fit(). You should be reshaping to get (3025, 100, 1000), which means 3025 samples, each of 100 (rows) timesteps and 1000 (columns) variables. Using np.reshape will sadly not work for this (you'll get an error), due to the fact that the windows overlap: the final shape holds more data than the input, 3025x100x1000 > 3125x1000, and np.reshape refuses because the element counts don't match. I suggest simply looping over the dataset, 1 loop = 1 sample. – n1k31t4 – 2018-02-07T11:15:16.270

I think I'm a bit confused here, and it could be because I may have inadvertently already done the batching process. I'll use specific values here. I sampled 3 different measurements at 6.25 kHz for roughly 3 minutes, resulting in 3 time series of length 1093750. This generates a 3x1093750 matrix. I then segmented each TS into 0.5-second increments, resulting in a 1050x3125 matrix. I could technically restructure this into a 3D matrix with dimensions 3x350x3125. This gives me 350 "batches", each 0.5 s long.

Your reshaping seems to generate many more values.

Thanks for the response again. Sorry – user1147964 – 2018-02-07T12:22:50.193

Just to add: reading the first link you posted makes me think I'm reshaping things correctly. Apologies if I'm missing something obvious, but here they start with a TS of length 5000 and turn it into a 3D matrix with dimensions [1 25 200].

– user1147964 – 2018-02-07T12:43:08.847

Compared to the method in your link, my way will create many more samples. This is because I am using a kind of 'rolling' window. Have a look at this depiction. They don't use a rolling window. Making 3 mins into 350x0.5s chunks is OK (maybe not needed - how often do you predict?); each chunk should be 3x3125. "I could restructure this into a 3D matrix with dimensions 3x350x3125" - this sounds better, but after making the splits I'd expect 350x3x3125 (350 chunks of 3x3125); a short NumPy sketch of that split follows below. Each of these chunks could then be processed as I described.

– n1k31t4 – 2018-02-07T13:47:09.297
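
For illustration, assuming the raw recording sits in a 3x1093750 NumPy array, the split described above could look like this (a sketch; variable names are mine):

import numpy as np

raw = np.random.randn(3, 1093750)      # 3 measurements at 6.25 kHz for ~3 minutes

# cut each series into 350 chunks of 3125 samples (0.5 s each),
# then reorder the axes so each sample is one (3, 3125) chunk
chunks = raw.reshape(3, 350, 3125).transpose(1, 0, 2)
print(chunks.shape)                    # (350, 3, 3125)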

Ah, I see, you're developing a rolling window. So essentially, for every time series, you're generating 100 different lagged time series? For each time series, instead of having 1 feature per time segment, there are now 100, with each of those 100 time series being offset by a proportional number of time samples? What is the advantage of increasing the feature count like this?

Thanks again for the reply – user1147964 – 2018-02-07T14:17:09.093

I believe we're on the same page :) The rolling window is common in time-series analysis methods, e.g. in finance. I 'speak' it like this: "If I want to predict the price of a stock tomorrow, what is relevant? Answer: the past T days." This means I can make a prediction for the next timestep N - T times. If you don't use a rolling window, you can only make N/T predictions. What's really going to bake your noodle is the idea of stateful LSTM models: whether or not the model's state should be preserved between subsequent batches! :) (A sketch of the declaration follows below.)

– n1k31t4 – 2018-02-07T14:29:14.370
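
For reference, a stateful LSTM declaration looks roughly like this sketch; stateful=True requires a fixed batch size via batch_input_shape (the numbers here are illustrative):

model.add(LSTM(units=32, stateful=True,
               batch_input_shape=(50, 100, 1000)))   # state carries over between batches
# model.reset_states() clears the carried-over state when needed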

Thanks again for the answer. I'll have a play around with time lags and see how much luck I have. I'm starting to wonder, however, if I should be using something like a CNN for this work, since I'm more interested in classifying the TS as a whole and comparing it to other TS. Does that sound sensible? – user1147964 – 2018-02-07T14:38:35.737

Best to pose that as a separate question, listing your reasons and what you have already tried. To that end, perhaps you don't need a neural net; something simpler like KNearestNeighbour might be able to classify the entire dataset (or your chunks). A CNN maps correlations, making good use of spatial correlation. As you have time-series, an RNN makes more intuitive sense to me. However, as you essentially want to classify the state of the system, you could look into regime-switching models to see when a system state changes, e.g. Markov switching models.

– n1k31t4 – 2018-02-07T15:04:21.860

Thanks again for your response Dexter. I think I've got a lot of homework! – user1147964 – 2018-02-07T15:05:47.663