## Relation mining of multivariate categorical time series without excluding the temporal nature

To all:

I have been racking my brain over this for a while and thought someone here might know of a package or algorithm to handle the following:

I have nominal multivariate time series that look like the following:

    Time  Var1  Var2  Var3  Var4  Var5  ...  VarN
    0     A     A     B     C     A     ...  H
    1     A     A     B     D     D     ...  H
    2     B     A     C     D     D     ...  H
    ...


And so on from times 0 to 1,000,000. What I would like to do is search the time series for rules of the type:

Given that Var3 was in state B and Var5 was in state D in the previous step, Var1 will be in state B in the current step. I want the rules to include the time interval explicitly. A simpler case of interest would be to reduce the time series to:

    Time  Var1  Var2  Var3  Var4  Var5  ...  VarN
    0     0     0     0     0     0     ...  0
    1     0     0     0     1     1     ...  0
    2     1     0     1     0     0     ...  0


where a variable is 1 if its state differs from the previous step and 0 otherwise. Then I just want rules that say something like:

If Var4 and Var5 changed in the previous step, then Var1 will change in the current step. That would be easy for a lag of one, as I could just reshape the data into something like:

   Var1 Var2 Var3 Var4 Var5 ... VarN Var1_t-1 Var2_t-1 Var3_t-1 ...


and then do sequence mining. But if I want rules that aren't limited to a single lag and could use any lag from 1 to 500, then my data set becomes difficult to work with.
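To make the lag-one reshaping concrete, here is a minimal sketch, assuming the data sits in a pandas DataFrame with one row per time step. The helper names `change_indicators` and `add_lags` are made up for illustration, and the toy frame only copies Var1 to Var5 from the first three rows of the example above.

```python
import pandas as pd

# Toy data: Var1..Var5 from the first three rows of the example table.
df = pd.DataFrame(
    {
        "Var1": ["A", "A", "B"],
        "Var2": ["A", "A", "A"],
        "Var3": ["B", "B", "C"],
        "Var4": ["C", "D", "D"],
        "Var5": ["A", "D", "D"],
    }
)


def change_indicators(states):
    """Return 1 where a variable differs from the previous step, else 0."""
    changed = states.ne(states.shift(1))
    # The first row has no predecessor, so it is treated as "no change".
    changed.iloc[0] = False
    return changed.astype(int)


def add_lags(frame, lags):
    """Append shifted copies of every column, suffixed with `_t-<lag>`."""
    parts = [frame]
    for lag in lags:
        parts.append(frame.shift(lag).add_suffix(f"_t-{lag}"))
    # Drop the leading rows that have no complete history.
    return pd.concat(parts, axis=1).iloc[max(lags):]


changes = change_indicators(df)
lagged = add_lags(changes, [1])
print(lagged)
```

With N variables and lags 1 to 500, this layout grows to roughly 500 × N extra columns, which is exactly where it becomes hard to work with.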

Any help would be greatly appreciated.

Edit to respond to comment: Each column can be in one of 7 different states. As for a target, there is no specific one; any rules between the columns would be of interest. However, predicting columns 30-40 and 62-75 would be particularly interesting.
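As a rough illustration of scanning many lags without building the lagged columns, the sketch below estimates, for each lag and each ordered pair of columns, how often the second column changes given that the first changed that many steps earlier. It assumes the 0/1 change-indicator form of the data and NumPy. The function name `lagged_confidence` is hypothetical, and it only covers single-antecedent rules, not conjunctions such as "Var4 and Var5 changed".

```python
import numpy as np


def lagged_confidence(changes, max_lag):
    """Estimate P(var j changes at t | var i changed at t - lag).

    changes: (T, N) array of 0/1 change indicators.
    Returns an array of shape (max_lag, N, N); entry [lag - 1, i, j] is NaN
    when variable i never changed early enough to be used at that lag.
    """
    x = np.asarray(changes, dtype=np.int64)
    n_steps, n_vars = x.shape
    conf = np.full((max_lag, n_vars, n_vars), np.nan)
    for lag in range(1, max_lag + 1):
        if lag >= n_steps:
            break  # not enough history for this lag
        earlier, later = x[:-lag], x[lag:]
        both = earlier.T @ later        # co-occurrence counts, shape (N, N)
        support = earlier.sum(axis=0)   # how often each antecedent changed
        seen = support > 0
        conf[lag - 1][seen] = both[seen] / support[seen][:, None]
    return conf


# Toy change indicators for Var1..Var5 from the example above.
toy = np.array(
    [
        [0, 0, 0, 0, 0],
        [0, 0, 0, 1, 1],
        [1, 0, 1, 0, 0],
    ]
)
conf = lagged_confidence(toy, max_lag=2)
print(conf[0, 3])  # lag 1, antecedent Var4, all consequents
```

To focus on particular target columns, slice the last axis of the result, for example `conf[:, :, target_idx]` with `target_idx` holding the positions of the columns of interest.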

+1 Good question and well explained. I don't think there will be a straightforward answer to this. Here are a few follow-up questions. What is the cardinality of each of the columns? Also, what is the target column here? Are you interested in predicting Var1? – Nitesh – 2014-11-21T18:38:30.060

I could see using a well-indexed MySQL database to do this. – None – 2014-11-22T19:30:43.100