Classification/Prediction based on Multivariate Time Series

I have a multivariate time series with many independent variables (the X's) and a binary outcome variable Y that I want to predict (think of a two-class logistic regression whose output is either 0 or 1). A sample is shown below:

Timestamp       X1      X2      X3          X4          Y
1:00            1       0.5     23.5        0           0
1:01            1       0.8     18.7        0           0
1:02            0       0.9     4.5         1           0
….
1:30            1       1.9     5.5         1           1
1:31            0       1.7     4.3         0           1
…
…


I want to predict, or rather classify, Y as 0 (stable) or 1 (unstable). Note that once Y becomes 1 it stays 1 for some interval of time, and likewise once it becomes 0.

Y depends on a sequence of inputs rather than on a single row. This is a time series, not a standard regression problem in which every row can be fed independently to a classification algorithm. For instance, Y may become 1 when X2 starts increasing and X3 starts decreasing, and so on (there are many independent variables X1…XN).

My first idea is to extract, say, m hours of data before Y becomes 1 and compute descriptive statistics on the X's to derive new features (mean of X1, standard deviation of X2, last change point of X4, and so on), collapsing the extracted window into a single-row feature vector. The label for that row is Y = 1, because the window was taken just before Y became 1. Repeating the process for windows that precede Y = 0 gives examples of the other class. This converts the time series into a standard classification problem.
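A minimal sketch of that windowing idea, under some assumptions that are not in the question: the data sits in NumPy arrays, a window is a fixed number of rows rather than m hours, a window is taken before every row (not only before the rows where Y flips), and a crude last-minus-first slope stands in for the "last change point" feature. The function name `window_features` is hypothetical.

```python
import numpy as np

def window_features(X, y, window):
    """Summarise the `window` rows before each time t into one feature row.

    X: array of shape (n_rows, n_vars); y: array of shape (n_rows,) with 0/1 labels.
    The row for time t uses X[t-window:t] (strictly before t) and is labelled y[t].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    feats, labels = [], []
    for t in range(window, len(X)):
        w = X[t - window:t]
        slope = w[-1] - w[0]  # crude trend per variable (assumed stand-in feature)
        feats.append(np.concatenate([w.mean(axis=0), w.std(axis=0), slope]))
        labels.append(y[t])
    return np.array(feats), np.array(labels)

# Synthetic data for illustration only: 200 rows, 4 variables, Y = 1 from row 120 on.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (np.arange(200) >= 120).astype(int)

F, labels = window_features(X, y, window=30)
print(F.shape, labels.shape)  # (170, 12) (170,)
```

Each row of `F` can then go to any ordinary classifier. Because neighbouring windows overlap heavily, a chronological train/test split is probably safer than a random one.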

The other approach I considered is a sequence model such as a Hidden Markov Model, where the hidden states would be stable (Y = 0) and unstable (Y = 1), and I would estimate the emission and transition probabilities. However, the HMM would have to be multivariate, since Y depends on many X's, which seems rather complex.
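One way to see how much complexity is involved: because Y is observed in the training data, the states are not actually hidden at training time, so the transition matrix can be estimated by counting and the emissions by per-state statistics. The sketch below assumes diagonal Gaussian emissions (a simplification, since some X's in the sample are binary) and that both classes occur in the training data; the function names are hypothetical.

```python
import numpy as np

def fit_labeled_hmm(X, y, n_states=2, eps=1e-6):
    """Estimate HMM parameters from a labelled sequence (states are observed)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    A = np.ones((n_states, n_states))          # add-one smoothing
    np.add.at(A, (y[:-1], y[1:]), 1)           # count transitions y[t] -> y[t+1]
    A /= A.sum(axis=1, keepdims=True)
    pi = np.bincount(y, minlength=n_states) + 1.0
    pi /= pi.sum()
    means = np.array([X[y == s].mean(axis=0) for s in range(n_states)])
    vars_ = np.array([X[y == s].var(axis=0) + eps for s in range(n_states)])
    return pi, A, means, vars_

def log_emission(X, means, vars_):
    """Log density of each row under each state's diagonal Gaussian: (n_rows, n_states)."""
    X = np.asarray(X, dtype=float)[:, None, :]
    return -0.5 * (np.log(2 * np.pi * vars_) + (X - means) ** 2 / vars_).sum(axis=2)

def viterbi(X, pi, A, means, vars_):
    """Most likely state sequence for the observations X."""
    logB = log_emission(X, means, vars_)
    logA = np.log(A)
    n, k = logB.shape
    delta = np.log(pi) + logB[0]
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        scores = delta[:, None] + logA         # scores[i, j]: best path ending in i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[t]
    path = np.empty(n, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n - 1, 0, -1):              # follow back-pointers
        path[t - 1] = back[t, path[t]]
    return path

# Synthetic data for illustration only: the unstable state shifts every variable by 2.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 0], [100, 60, 100])
X = rng.normal(size=(260, 3)) + 2.0 * y[:, None]

params = fit_labeled_hmm(X, y)
decoded = viterbi(X, *params)
accuracy = (decoded == y).mean()
```

Viterbi decoding uses the whole sequence, including rows after time t, so for prediction as data arrives a forward (filtering) pass would be the closer fit. Emissions that depend only on the current row also cannot capture trends such as "X2 starts increasing" unless differenced or windowed features are added to X.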

Any ideas on modeling this problem would be appreciated.

Are you trying to predict binary outcomes for every second? If not, this post might be misleading. – user61762 – 2018-10-31T10:16:22.907


Train an LSTM-RNN to perform direct sequence classification. The network takes multiple inputs (a window of rows across all the X's) and produces one output, the label (0 or 1). This is straightforward to implement in Keras/Python; just make sure the last layer is a Dense layer with sigmoid activation so that the output lies between 0 and 1. You train the network on your labeled data and it then outputs the label by itself. A useful tutorial on how to do this can be found here. The most important point is that the network inherently deals with linear and nonlinear cross-correlations between the inputs, so you don't have to explore them yourself. Thanks to its internal memory, it can also learn the dynamics of the input signals.
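A minimal sketch of such a network, assuming TensorFlow's bundled Keras is installed. The window length, the layer size, the helper names `make_windows` and `build_model`, and the synthetic data are illustrative assumptions, not part of the answer.

```python
import numpy as np
from tensorflow import keras

def make_windows(X, y, timesteps):
    """Stack trailing windows into shape (samples, timesteps, n_features).

    Each window is labelled with y at its last row.
    """
    X = np.asarray(X, dtype="float32")
    y = np.asarray(y, dtype="float32")
    ends = range(timesteps, len(X) + 1)
    Xw = np.stack([X[t - timesteps:t] for t in ends])
    yw = np.array([y[t - 1] for t in ends])
    return Xw, yw

def build_model(timesteps, n_features, units=16):
    """One LSTM layer followed by a sigmoid Dense layer for a 0/1 label."""
    model = keras.Sequential([
        keras.Input(shape=(timesteps, n_features)),
        keras.layers.LSTM(units),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Synthetic data for illustration only: 260 rows, 4 variables.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 0], [100, 60, 100])
X = rng.normal(size=(260, 4)) + 2.0 * y[:, None]

Xw, yw = make_windows(X, y, timesteps=20)
model = build_model(timesteps=20, n_features=4)
model.fit(Xw, yw, epochs=2, batch_size=32, verbose=0)
probs = model.predict(Xw, verbose=0)  # probabilities in [0, 1], one per window
```

Thresholding `probs` at 0.5 gives the predicted label. On real data the model should be evaluated on a chronologically later hold-out segment rather than on the windows it was trained on, and the inputs usually benefit from scaling.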

Overall, it is a very convenient solution because it works like a black box that accepts time series and "spits out" their labels.

This approach has worked successfully for me for time series classification :)

This is perfect, @pcko1. I have already started getting my hands dirty :) Beautiful answer. Thanks a ton! – Yavar – 2018-06-19T10:22:30.530

Hi @pcko1, your answer is useful to me. I have a similar problem, but mine is not time dependent; it is more of a pattern classification problem. Here is the link. Could you shed some light on it? https://datascience.stackexchange.com/questions/71885/which-ml-algorithm-should-i-use-for-following-use-case-for-classification-and-wh – Mitesh Patel – 2020-04-08T05:27:25.867