I am interested in clustering N multivariate time series, each with T values (the lengths differ across series), using Python. Each variable has many trends, and its values are a mix of numeric and nominal.

A sample $T_{i}$ in the dataset has the following format:

```
TimeStamp | Sensor0 | Sensor1| Sensor2
2015-02-05 11:30|<Min | On | off
2015-02-05 11:31|<Min | on | off
2015-02-05 11:32| Action2 | 10 | 0.0001
2015-02-07 11:33| Action2 | 10 | 0.00012
2015-02-07 11:34| Action2 | 10 | 0.00012
2015-02-07 11:35| Action2 | 20 | 0.00015
```

Another sample $T_{j}$ in the dataset has the following format:

```
TimeStamp | Sensor0 | Sensor1| Sensor2
2015-10-05 11:30| Action2 | 11 | off
2015-10-05 11:31| Action1 | 11 | off
2015-10-05 11:32| Action2 | NAN | 0.0001
2015-10-07 11:33| Action3 | NAN | 0.00012
2015-10-07 11:34| <Min | 10 | 0.00012
2015-10-07 11:35| <Min | 15 | on
```

The non-numeric entries (off, on, ...) are readings that the sensors did not collect, so my idea was to replace them with the minimum value, given that all values are strictly positive. Otherwise, they would be treated as missing values, in which case the problem becomes finding a similarity measure that can compare missing values (off, on, ...) with numeric values.
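To make the replacement idea concrete, here is a minimal sketch in plain Python. It assumes that "minimum value" means the smallest numeric value observed in the same column of the same sample, and that `NAN` is handled like the nominal tokens; both are assumptions, as are the helper names `to_number` and `fill_with_column_min`. It also returns a flag per cell so the information that a value was originally nominal is not lost.

```python
import math


def to_number(cell):
    """Return the cell as a float, or None if it is nominal or NaN."""
    try:
        value = float(cell.strip())
    except ValueError:
        return None  # nominal token such as "on", "off", "<Min", "Action2"
    # float("NAN") parses successfully, so NaN has to be filtered explicitly
    return None if math.isnan(value) else value


def fill_with_column_min(column):
    """Replace non-numeric cells by the smallest numeric value in the column.

    Returns (filled_values, was_numeric_flags).
    """
    parsed = [to_number(cell) for cell in column]
    observed = [v for v in parsed if v is not None]
    if not observed:
        raise ValueError("column has no numeric values to take a minimum from")
    floor = min(observed)
    filled = [floor if v is None else v for v in parsed]
    flags = [v is not None for v in parsed]
    return filled, flags


# Sensor2 column of the first sample shown above
sensor2 = ["off", "off", "0.0001", "0.00012", "0.00012", "0.00015"]
filled, flags = fill_with_column_min(sensor2)
print(filled)
print(flags)
```

A column such as Sensor0, which contains only nominal tokens, raises an error here, so it would need a different encoding.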

I am wondering whether a similarity/distance measure already exists in the literature for comparing such multivariate time series with heterogeneous lengths, and whether this kind of problem has already been formulated in papers, books, or elsewhere, for R and Python.

Thanks for your advice.

Fit the time series to a model, and cluster the model parameters. – Emre – 2016-08-16T19:45:10.313
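As a rough illustration of this suggestion, the sketch below fits a deliberately simple model, a least-squares straight line, to each series and keeps the fitted parameters as a feature vector. The linear trend is only a stand-in chosen for brevity (the comment does not name a model), and `fit_linear_trend` and the toy series are hypothetical. The point is that series of different lengths all map to vectors of the same size, which can then be handed to any clustering algorithm.

```python
def fit_linear_trend(values):
    """Least-squares fit of y = intercept + slope * t, with t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


# Toy univariate series of different lengths (made-up values)
series = {
    "rising_short": [1.0, 2.0, 3.0],
    "rising_long": [0.0, 1.0, 2.0, 3.0, 4.0],
    "flat": [5.0, 5.0, 5.0, 5.0],
}

# Every series becomes a (slope, intercept) pair regardless of its length
features = {name: fit_linear_trend(vals) for name, vals in series.items()}
for name, params in features.items():
    print(name, params)
```

For multivariate series, one option is to fit each variable separately and concatenate the parameters into a single vector per series.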

@Emre Thank you for your response. I just can't seem to find the right modeling framework for this context. Do you have a specific method in mind? – user23440 – 2016-08-16T21:48:50.593

I'd use a neural network. – Emre – 2016-08-16T21:51:15.613

That's indeed a very good paper. I am a newbie to deep learning, so it will be difficult for me to implement it in Python. Do you know of good resources for a beginner (books, MOOCs, GitHub, ...)? – user23440 – 2016-08-16T22:24:44.630

Start here, then try these time series tutorials. – Emre – 2016-08-16T23:03:46.060

Thank you so much for your advice. So once the model parameters are learned, is it possible to cluster them using a similarity measure like the Euclidean distance? Is it still a valid metric for such a feature representation? – user23440 – 2016-08-16T23:48:31.010

That's where things get hairy, because clustering is subjective, and rescaling the features will change the clusters with a metric like the Euclidean distance. I suggest just trying the various clustering algorithms and looking into metric learning. But finding this embedding (time series representation) to make clustering feasible in the first place is the hard part, so don't fret! – Emre – 2016-08-17T17:44:46.833
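The point about rescaling can be checked with a tiny example. The sketch below uses made-up two-dimensional feature vectors and hypothetical helpers `nearest` and `rescale`; it shows that multiplying one feature by a constant changes which point is the Euclidean nearest neighbour, which is what would shift cluster assignments.

```python
import math


def nearest(target, others):
    """Name of the point in `others` closest to `target` in Euclidean distance."""
    return min(others, key=lambda name: math.dist(target, others[name]))


def rescale(point, factors):
    """Multiply each feature of `point` by the matching factor."""
    return tuple(x * f for x, f in zip(point, factors))


a = (0.0, 0.0)
others = {"b": (3.0, 0.0), "c": (0.0, 1.0)}

# Original units: c is at distance 1, b at distance 3
before = nearest(a, others)

# Same data with the second feature expressed in units ten times smaller
factors = (1.0, 10.0)
scaled = {name: rescale(p, factors) for name, p in others.items()}
after = nearest(rescale(a, factors), scaled)

print(before, after)  # prints "c b"
```

This is why features are usually standardised, or a metric is learned, before relying on Euclidean distance.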