Multivariate Time Series Binary Classification


I have continuous (time series) data, and it is multivariate. Each feature can be represented as a time series (all of them are calculated on a daily basis). Here is an example.

Days    F1  F2  F3  F4  F5  Target
Day 1   10  1   0.1 100 -10 1
Day 2   20  2   0.2 200 -20 1
Day 3   30  3   0.3 300 -30 0
Day 4   40  4   0.4 400 -40 1
Day 5   50  5   0.5 500 -50 1
Day 6   60  6   0.6 600 -60 1
Day 7   70  7   0.7 700 -70 0
Day 8   80  8   0.8 800 -80 0

F1, F2, ..., F5 are my features and Target is my binary class. If I use a window size of 3, I can convert my features into time-series data. Then I will have [10, 20, 30] for F1, [1, 2, 3] for F2, and so on. With a window size of 3, I have 5 features * 3 time steps, for a total of 15 values if they are written into the same vector.

The problem with this method is that putting them into the same vector might cause problems, since the features have very different value ranges.

Example of multivariate time series (15 features in 1 network):

[10, 20, 30, 1, 2, 3, 0.1, 0.2, 0.3, 100, 200, 300, -10, -20, -30]
[20, 30, 40, 2, 3, 4, 0.2, 0.3, 0.4, 200, 300, 400, -20, -30, -40]
....
[60, 70, 80, 6, 7, 8, 0.6, 0.7, 0.8, 600, 700, 800, -60, -70, -80]
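
A minimal sketch of how the flattened vectors above could be built with NumPy. The names `data` and `flat_windows` are illustrative and not part of any library; the values are copied from the example table.

```python
import numpy as np

# Feature table from the question: one row per day, columns F1..F5.
data = np.array([
    [10, 1, 0.1, 100, -10],
    [20, 2, 0.2, 200, -20],
    [30, 3, 0.3, 300, -30],
    [40, 4, 0.4, 400, -40],
    [50, 5, 0.5, 500, -50],
    [60, 6, 0.6, 600, -60],
    [70, 7, 0.7, 700, -70],
    [80, 8, 0.8, 800, -80],
])

def flat_windows(x, window):
    """Return one flat row per window: all of F1's values, then F2's, and so on."""
    n_windows = x.shape[0] - window + 1
    # x[i:i + window] has shape (window, features); transposing groups values by feature.
    return np.stack([x[i:i + window].T.reshape(-1) for i in range(n_windows)])

flat = flat_windows(data, 3)
print(flat.shape)  # one row per window, 15 values per row
print(flat[0])
```

With 8 days and a window of 3 this gives 6 rows of 15 values each.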

The other option is to create a separate time series network (mostly RNNs such as LSTMs, or CNNs, or a combination of them) for each feature, all with the same target, and then combine their results. In this scenario, I have 5 different networks, and each of them does univariate time series binary classification.

Example of different networks with univariate time series data (3 inputs per network, 5 networks):

[10, 20, 30]
...                            This is for network 1
[60, 70, 80]

[1, 2, 3]
...                            This is for network 2
[6, 7, 8]

...

[-10, -20, -30]
...                            This is for network 5
[-60, -70, -80] 
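
As a rough sketch, the per-network inputs listed above can be produced by windowing each column on its own. The name `per_feature` is illustrative; the values are copied from the example table.

```python
import numpy as np

# Feature table from the question: one row per day, columns F1..F5.
data = np.array([
    [10, 1, 0.1, 100, -10],
    [20, 2, 0.2, 200, -20],
    [30, 3, 0.3, 300, -30],
    [40, 4, 0.4, 400, -40],
    [50, 5, 0.5, 500, -50],
    [60, 6, 0.6, 600, -60],
    [70, 7, 0.7, 700, -70],
    [80, 8, 0.8, 800, -80],
])

window = 3
n_windows = data.shape[0] - window + 1

# per_feature[k] holds the univariate windows that would feed network k + 1.
per_feature = np.array([
    [data[i:i + window, k] for i in range(n_windows)]
    for k in range(data.shape[1])
])

print(per_feature.shape)   # (networks, windows, window size)
print(per_feature[0][0])   # first window for network 1
print(per_feature[4][-1])  # last window for network 5
```

Each network only ever sees one column, which is why relationships between columns are not visible to it.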

The problem with this one is that I might lose information about the correlation between features, even though I feed their results into another network.

My question is: which is the best approach for multivariate time series problems? I want to use the first method, but the differences in value ranges worry me. The second method is easier, but I worry that I might lose some essential information.

iso_9001_

Posted 2019-02-16T12:12:19.497

Reputation: 163

Answers


You can feed all features (Day #, F1, F2, ..., F5) as input to an RNN/LSTM and use the binary class as output.

This article has an example of such a network.

Shamit Verma

Posted 2019-02-16T12:12:19.497

Reputation: 2 086

Thanks. I have his book about time series forecasting and am aware of this link. But what about my main problem with this approach? I mean, how will the difference in value ranges among features affect the system? – iso_9001_ – 2019-02-16T14:30:02.057

This should not matter, since values for the same feature should be on the same scale (e.g. all values for F3 seem to be between 0 and 1). – Shamit Verma – 2019-02-16T14:34:54.927

Also, most libraries have an option for normalizing values. You can try training with and without normalization; normalizing keeps the difference in scale between F3 and F4 from slowing down training. – Shamit Verma – 2019-02-16T14:36:14.693
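
One common way to normalize is per-feature standardization (subtract the mean, divide by the standard deviation). The sketch below does this by hand with NumPy rather than with any particular library's option. The train/test split (first 6 days for training) is a made-up assumption for illustration only.

```python
import numpy as np

# Feature table from the question: one row per day, columns F1..F5.
data = np.array([
    [10, 1, 0.1, 100, -10],
    [20, 2, 0.2, 200, -20],
    [30, 3, 0.3, 300, -30],
    [40, 4, 0.4, 400, -40],
    [50, 5, 0.5, 500, -50],
    [60, 6, 0.6, 600, -60],
    [70, 7, 0.7, 700, -70],
    [80, 8, 0.8, 800, -80],
])

# Hypothetical split: statistics come from the training days only,
# so nothing about the later days leaks into the scaling.
train = data[:6]
mean = train.mean(axis=0)  # one mean per feature column
std = train.std(axis=0)    # one standard deviation per feature column

scaled = (data - mean) / std

print(scaled[:, 2])  # F3 after scaling
print(scaled[:, 3])  # F4 after scaling
```

In this toy table F3 and F4 end up with identical scaled values, because they differ only by a constant factor.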

OK, got it. I should give it a try. What about method 2? Isn't it better to separate the features so that the value differences won't have an effect on the system? – iso_9001_ – 2019-02-16T14:59:45.030

My recommendation is not to use either Method 1 or 2. Just provide all features (and the last N samples in sequence) to the model. With method 2, the model will not be able to learn from feature combinations such as F2/F5 or F2^2/F1+F5. – Shamit Verma – 2019-02-16T15:20:02.057

Thanks. I'll go for option 1. My last question is: how will the network know that I'm using a multivariate system rather than a univariate one? I mean, what's the difference between a) a single feature with a rolling window size of 15 (which makes a vector of 15 values from a single feature) and b) 5 features with a rolling window size of 3 (which also makes a vector of 15 values, but from 5 features)? – iso_9001_ – 2019-02-16T15:25:09.803

Do not use option 1 if that option means providing 15 features. You will provide 5 features, with the last N samples supplied along with each sample. The network will learn to ignore previous samples (and columns of those samples) that are not relevant. – Shamit Verma – 2019-02-16T15:55:03.923

I'm not sure I understood what you mean by "N samples along with each sample". Do you mean batch size? – iso_9001_ – 2019-02-16T19:16:48.263

No, each sample will be an array (current + previous samples). Typically, each sample is an array of size M (M is the number of features). With an RNN, each sample is an array of shape (N, M), i.e. N time steps by M features. Each batch is an array of shape (Batch_Size, N, M). https://github.com/keras-team/keras/blob/master/examples/conv_lstm.py is a good example. – Shamit Verma – 2019-02-17T10:36:38.233
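
The (Batch_Size, N, M) layout described in the last comment can be sketched with NumPy as follows. Pairing each window with the target of its last day is an assumption made here for illustration, as the thread does not say which label belongs to a window. The names `X` and `y` are illustrative.

```python
import numpy as np

# Table from the question: columns F1..F5, then Target.
table = np.array([
    [10, 1, 0.1, 100, -10, 1],
    [20, 2, 0.2, 200, -20, 1],
    [30, 3, 0.3, 300, -30, 0],
    [40, 4, 0.4, 400, -40, 1],
    [50, 5, 0.5, 500, -50, 1],
    [60, 6, 0.6, 600, -60, 1],
    [70, 7, 0.7, 700, -70, 0],
    [80, 8, 0.8, 800, -80, 0],
])
features = table[:, :5]
target = table[:, 5].astype(int)

N = 3  # time steps per sample (current day plus the 2 previous days)

# Each sample keeps its (N, M) structure: rows are days, columns are features.
X = np.stack([features[i:i + N] for i in range(len(features) - N + 1)])

# Assumption: a window is labelled with the target of its last day.
y = target[N - 1:]

print(X.shape)  # (samples, N, M)
print(X[0])     # days 1..3, all five features side by side
print(y)
```

Because the time axis and the feature axis stay separate, a recurrent layer can tell 5 features over 3 steps apart from 1 feature over 15 steps, which a flat vector of 15 values cannot express.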