## Difference between time series clustering and time series segmentation

In the context of time series data mining, I have read about time series segmentation and time series clustering, but I can't tell the two apart. If they are different, how are these methods related to each other?

From my understanding (please correct me if I am wrong), segmentation is a preprocessing step for the clustering phase. That is, the segmentation step partitions the time series into segments, which can be thought of as states. After that, a conventional clustering algorithm can be applied to group these segments into clusters, so that similar segments belong to the same cluster.

As an example, suppose the segmentation process splits a given time series into the segments (S1, S2, S3, S4, S5, S6). A conventional clustering method is then applied to the extracted segments. With k = 3, we might end up with something like this: K1 = {S1, S5}, K2 = {S3, S6}, K3 = {S2, S4}.
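To make the example concrete, here is a minimal sketch of the second step only: clustering segments that have already been extracted. The segment values, the choice of mean and standard deviation as features, and the small deterministic k-means routine are all assumptions made for illustration, not a reference implementation.

```python
from math import dist
from statistics import mean, pstdev


def features(segment):
    """Summarize a segment by its mean and standard deviation (assumed features)."""
    return (mean(segment), pstdev(segment))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iterations=20):
    # Deterministic farthest-first initialization instead of random seeds.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iterations):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(mean(axis) for axis in zip(*members))
    return assign(points, centroids)


# Hypothetical segments S1..S6, as if produced by an earlier segmentation step.
segments = {
    "S1": [0.1, 0.0, -0.1, 0.0],
    "S2": [10.0, 10.2, 9.8, 10.0],
    "S3": [5.0, 5.1, 4.9, 5.0],
    "S4": [9.9, 10.1, 10.0, 10.0],
    "S5": [0.0, 0.2, -0.2, 0.0],
    "S6": [5.2, 5.0, 4.8, 5.0],
}

labels = kmeans([features(s) for s in segments.values()], k=3)
clusters = {}
for name, label in zip(segments, labels):
    clusters.setdefault(label, []).append(name)
print(sorted(clusters.values()))  # [['S1', 'S5'], ['S2', 'S4'], ['S3', 'S6']]
```

With this toy data the grouping matches the K1, K2, K3 example above. Real segments usually have different lengths, which is one reason to cluster on summary features or with an elastic distance rather than on raw values.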

Please correct me if I am mistaken, and provide links for more clarification if you have any.

Actually, there is no fixed terminology: these two terms are sometimes used interchangeably and sometimes with different meanings. I would suggest adopting the terminology below for yourself, so you can then classify methods accordingly:

• Time-Series Segmentation means partitioning an individual time series into internally similar segments, i.e. clustering within a single time series (e.g. I have a video in which someone reads a book for a while, then starts walking, and then starts cycling. Now I want to segment these three actions).

Suggestion: state-space reconstruction, moving autocorrelation, moving DTW (dynamic time warping), Fourier analysis, visibility graphs, or any other method that can measure the similarity of a time series with itself.

• Time-Series Clustering means finding similar time series within a dataset of time series (e.g. I have 10 brain signals, 5 from healthy subjects and 5 from patients, without knowing which is which. Now I want to cluster this dataset into two clusters).

Suggestion: build a similarity matrix between the time series using e.g. DTW, and then apply spectral clustering (just improvised; if you search the literature, there should be more mature solutions).
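The two definitions can be contrasted in a rough sketch. Everything in it is an assumption chosen to keep the example short: the synthetic data, the window-mean change score used for segmentation (a much simpler stand-in for the methods suggested above), and the two-representative assignment used in place of spectral clustering. Only the pairwise DTW distance matrix follows the suggestion directly.

```python
from itertools import combinations
from statistics import mean


def segment(series, window=5, threshold=2.0):
    """Segmentation: split ONE series where its local mean jumps."""
    # Score each candidate split t by the gap between the means of the
    # windows just before and just after t.
    scores = {
        t: abs(mean(series[t:t + window]) - mean(series[t - window:t]))
        for t in range(window, len(series) - window + 1)
    }
    boundaries = [0]
    for t, s in scores.items():
        nearby = [scores[u] for u in range(t - window, t + window + 1)
                  if u in scores and u != t]
        # Keep t only if it is a strict local peak above the threshold.
        if s > threshold and all(s > other for other in nearby):
            boundaries.append(t)
    boundaries.append(len(series))
    return list(zip(boundaries[:-1], boundaries[1:]))


def dtw_distance(a, b):
    """Classic dynamic time warping cost with absolute difference."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def two_clusters(dataset):
    """Clustering: group SEVERAL series using pairwise DTW distances."""
    n = len(dataset)
    d = [[dtw_distance(dataset[i], dataset[j]) for j in range(n)]
         for i in range(n)]
    # Take the two most dissimilar series as cluster representatives,
    # then assign every series to the nearer one.
    first, second = max(combinations(range(n), 2),
                        key=lambda pair: d[pair[0]][pair[1]])
    return [0 if d[k][first] <= d[k][second] else 1 for k in range(n)]


# One series with three regimes (think: reading, walking, cycling).
activity = [0.0] * 20 + [5.0] * 20 + [10.0] * 20
print(segment(activity))  # [(0, 20), (20, 40), (40, 60)]

# Six series: three small bumps and three large bumps, shifted in time.
signals = [
    [0, 0, 1, 2, 1, 0, 0, 0], [0, 1, 2, 1, 0, 0, 0, 0], [0, 0, 0, 1, 2, 1, 0, 0],
    [0, 0, 3, 5, 3, 0, 0, 0], [0, 3, 5, 3, 0, 0, 0, 0], [0, 0, 0, 3, 5, 3, 0, 0],
]
print(two_clusters(signals))  # [0, 0, 0, 1, 1, 1]
```

The point of the contrast: `segment` takes a single series and returns index ranges inside it, while `two_clusters` takes a whole dataset and returns one label per series. DTW is used in the second case because it ignores the time shifts between bumps of the same size.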

Hope it helped :)