I'm learning how to use Keras and I've had reasonable success with my labelled dataset using the examples in Chollet's *Deep Learning with Python*. The dataset is ~1000 time series, each of length 3125, with 3 possible classes.

I'd like to go beyond the basic *Dense* layers, which give me about 70% accuracy, and the book goes on to discuss LSTM and RNN layers.

All the examples seem to use datasets with multiple features for each time series, and as a result I'm struggling to work out how to apply them to my data.

If for example, I have 1000x3125 Time Series, how do I feed that into something like the SimpleRNN or LSTM layer? Am I missing some fundamental knowledge of what these layers do?

### Current code:

```python
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM, Dropout, SimpleRNN, Embedding, Reshape
from keras.utils import to_categorical
from keras import regularizers
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

def readData():
    # Get labels from the labels.txt file
    labels = pd.read_csv('labels.txt', header=None)
    labels = labels.values
    labels = labels - 1          # shift classes from 1..3 to 0..2
    print('One Hot Encoding Data...')
    labels = to_categorical(labels)
    data = pd.read_csv('ts.txt', header=None)
    return data, labels

print('Reading data...')
data, labels = readData()
print('Splitting Data')
data_train, data_test, labels_train, labels_test = train_test_split(data, labels)
print('Building Model...')

# Create model
model = Sequential()
## LSTM / RNN goes here ##
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
print('Training NN...')
history = model.fit(data_train, labels_train, epochs=1000, batch_size=50,
                    validation_split=0.25, verbose=2)
results = model.evaluate(data_test, labels_test)
predictions = model.predict(data_test)
print(predictions[0].shape)
print(np.sum(predictions[0]))
print(np.argmax(predictions[0]))
print(results)

acc = history.history['acc']
val_acc = history.history['val_acc']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```
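One way the `## LSTM / RNN goes here ##` placeholder could be filled, assuming each of the 1000 series is treated as a univariate sequence of 3125 timesteps (a sketch on random stand-in data, not the only valid layout; the layer size of 32 is an arbitrary choice):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Keras recurrent layers expect 3D input: (samples, timesteps, features).
# A 1000x3125 matrix of univariate series becomes (1000, 3125, 1).
data = np.random.rand(1000, 3125)                   # stand-in for the real ts.txt data
data = data.reshape((data.shape[0], data.shape[1], 1))

model = Sequential()
model.add(LSTM(32, input_shape=(3125, 1)))          # 3125 timesteps, 1 feature each
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
print(data.shape)          # (1000, 3125, 1)
print(model.output_shape)  # (None, 3)
```

The key point is that the recurrent layer consumes one feature vector per timestep, so a univariate series of length 3125 is 3125 timesteps of 1 feature, not 1 timestep of 3125 features.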

Thank you for the comprehensive reply, Dexter(!). Regarding your comments on batch size, is the `batch_size` specified in the `model.fit` argument a different hyperparameter from making my own custom batches?

I managed to get my code to at least run by reshaping my data from a 1000x3125 matrix into a 3D matrix using `data = np.reshape(data, (1000, 1, 3125))`. This let me run the LSTM with `input_shape=(1, 3125)`, but again, I'm not really sure what I'm doing.

Again, thank you very much for the reply. I'll have a look at the links you provided and study your answer some more. – user1147964 – 2018-02-07T10:29:44.070

You're welcome! Yes, you got it: if you leave out `batch_size` when defining the model, it will be taken from the same argument within `model.fit()`. You should be reshaping to get `(3025, 100, 1000)`, which means 3025 batches, each of 100 (rows) timesteps and 1000 (columns) variables. Using `np.reshape` will sadly not work for this (you'll get an error), due to the fact that you will have data overlaps... the final shape has more data than the input: 3025x100x1000 > 3125x1000. `np.reshape` doesn't like that as it's ambiguous. I suggest simply looping over the dataset, 1 loop = 1 sample. – n1k31t4 – 2018-02-07T11:15:16.270

I think I'm a bit confused here, and it could be because I may have inadvertently already done the batching process. I'll use specific values here. I sampled 3 different measurements at 6.25 kHz for roughly 3 minutes, resulting in 3 time series of length 1093750. This generates a 3x1093750 matrix. I then segmented each TS into 0.5-second increments, resulting in a 1050x3125 matrix. I could technically restructure this into a 3D matrix with dimensions 3x350x3125. This gives me 350, 0.5 s long "batches". Your reshaping seems to generate many more values. Thanks for the response again. Sorry – user1147964 – 2018-02-07T12:22:50.193
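The segmentation described in this comment could be sketched as follows (synthetic stand-in data; at 6.25 kHz, a 0.5 s chunk is 3125 samples, and 1093750 / 3125 = 350 chunks exactly; variable names are mine):

```python
import numpy as np

fs = 6250                    # sampling rate in Hz
chunk_len = int(0.5 * fs)    # 3125 samples per 0.5 s chunk
n_channels = 3

raw = np.random.rand(n_channels, 1093750)   # stand-in for the 3x1093750 recording
n_chunks = raw.shape[1] // chunk_len        # 350 non-overlapping chunks

# Trim any remainder, then reorder to (samples, timesteps, features),
# which is the layout Keras recurrent layers expect.
trimmed = raw[:, :n_chunks * chunk_len]                   # (3, 1093750)
chunks = trimmed.reshape(n_channels, n_chunks, chunk_len) # (3, 350, 3125)
chunks = chunks.transpose(1, 2, 0)                        # (350, 3125, 3)
print(chunks.shape)  # (350, 3125, 3)
```

This matches the "350 chunks of 3x3125" layout suggested further down the thread, just transposed into Keras ordering: 350 samples, each with 3125 timesteps of 3 features.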

Just to add, reading the first link you posted makes me think I'm reshaping things correctly. Apologies if I'm missing something obvious, but here they start with a TS of length 5000 and turn it into a 3D matrix with dimensions [1 25 200]. – user1147964 – 2018-02-07T12:43:08.847

Compared to the method in your link, my way will create many more samples. This is because I am using a kind of 'rolling' window. Have a look at this depiction. They don't use a rolling window. Making 3 mins into 350x0.5s chunks is ok (maybe not needed - how often do you predict?); each chunk should be 3x3125. "I could restructure this into a 3D matrix with dimensions 3x350x3125" - this sounds better, but after making the splits I'd expect 350x3x3125 (350 chunks of 3x3125). Each of these chunks could then be processed as I described. – n1k31t4 – 2018-02-07T13:47:09.297

Ah, I see, you're developing a rolling window. So essentially, for every time series, you're generating 100 different lagged time series? For each time series, instead of having 1 feature per time segment, there's now 100? With each of those 100 time series being offset by a proportional number of time samples? What is the advantage of increasing the feature count like this?

Thanks again for the reply – user1147964 – 2018-02-07T14:17:09.093
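A minimal sketch of the rolling window being discussed, assuming a step of one sample, so a length-N series yields N - T + 1 overlapping windows of length T (toy data; the function name is mine):

```python
import numpy as np

def rolling_windows(series, window):
    """Split a 1D series into overlapping windows, stepping one sample at a time."""
    n = len(series) - window + 1              # number of windows: N - T + 1
    return np.stack([series[i:i + window] for i in range(n)])

series = np.arange(10)                        # toy series, N = 10
windows = rolling_windows(series, window=4)   # T = 4
print(windows.shape)                          # (7, 4)
print(windows[0], windows[1])                 # [0 1 2 3] [1 2 3 4]
```

Each window is offset from the previous one by a single sample, which is why a rolling split produces far more samples than cutting the series into N/T disjoint chunks.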

I believe we're on the same page :) The rolling window is common in time-series analysis methods, e.g. in finance. I 'speak' it like this: "If I want to predict the price of a stock tomorrow, what is relevant? Answer: the past T days." This means I can make a prediction for the next timestep N-T times. If you don't use a rolling window, you can only make N/T predictions. What's really going to bake your noodle is the idea of stateful LSTM models: whether or not the model's state should be preserved between subsequent batches! :) – n1k31t4 – 2018-02-07T14:29:14.370

Thanks again for the answer. I'll have a play around with time lags and see how much luck I have. I'm starting to wonder if I should be using something like a CNN for this work, however, since I'm more interested in classifying the TS as a whole and comparing it to other TS. Does that sound sensible? – user1147964 – 2018-02-07T14:38:35.737

Best to pose that as a separate question, listing your reasons and what you have already tried. To that end, perhaps you don't need a neural net; something simpler like KNearestNeighbour might be able to classify the entire dataset (or your chunks). A CNN maps correlations, making good use of spatial correlation. As you have time-series, an RNN makes more intuitive sense to me. However, as you essentially want to classify the state of the system, you could look into regime switching models to see when a system state changes, e.g. Markov switching models. – n1k31t4 – 2018-02-07T15:04:21.860

Thanks again for your response Dexter. I think I've got a lot of homework! – user1147964 – 2018-02-07T15:05:47.663
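Along the lines of the KNearestNeighbour suggestion above, a simple baseline is to feed each raw series (or chunk) to scikit-learn's `KNeighborsClassifier` as a flat feature vector (a sketch on random synthetic data shaped like the question's dataset, so the score here is only chance level):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 1000 series of length 3125 with 3 classes, as in the question.
rng = np.random.RandomState(0)
X = rng.rand(1000, 3125)
y = rng.randint(0, 3, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
score = clf.score(X_test, y_test)
print(score)  # ~0.33 on random labels; a real dataset should do better
```

On real data, a baseline like this is worth checking before reaching for an LSTM: if KNN already separates the classes well, the neural net may be unnecessary.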