I am trying to do anomaly detection across time series using Python and sklearn (but other package suggestions are definitely welcome!).

I have a set of 10 time series; each one consists of torque values collected from a tire (so 10 tires in total), and the series **may not contain the same number of data points (the set sizes differ)**. Each time series is pretty much just the tire_id, the timestamp, and the sig_value (the value from the signal, i.e. the sensor). Sample data for one time series looks like this:

```
tire_id timestamp sig_value
tire_1 23:06.1 12.75
tire_1 23:07.5 0
tire_1 23:09.0 -10.5
```

Now I have 10 of them, and 2 of them behave strangely. I understand that this is an anomaly detection problem, but most of the articles I have read online detect anomalous points within a single time series (i.e. whether the torque values are abnormal at some points for that tire).

To detect which 2 tires are behaving abnormally, I tried a clustering method, namely k-means clustering (since it's unsupervised).

To prepare the data for k-means clustering, for each time series (i.e. for each tire), I calculated:

- The top 3 pairs of adjacent local maximum and local minimum with the highest amplitude (difference)
- Mean of the torque values
- Standard deviation of the torque values
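The three features listed above can be sketched roughly as follows. This is only a minimal illustration, not my exact code: the function name `tire_features`, the parameter `top_k`, and the toy signal are made up for the example, and it assumes the signal has no flat segments (consecutive equal values are simply skipped when looking for turning points).

```python
import numpy as np

def tire_features(sig, top_k=3):
    """Return up to top_k rows of (amplitude, local_max, local_min, std, mean)."""
    sig = np.asarray(sig, dtype=float)
    slope = np.sign(np.diff(sig))
    # A turning point is an interior sample where the slope changes sign.
    turning = np.where(slope[:-1] * slope[1:] < 0)[0] + 1
    extrema = sig[turning]
    # Without flat segments, adjacent turning points alternate max/min,
    # so each neighbouring pair is one (local max, local min) pair.
    pairs = []
    for a, b in zip(extrema[:-1], extrema[1:]):
        hi, lo = max(a, b), min(a, b)
        pairs.append((hi - lo, hi, lo))
    pairs.sort(reverse=True)  # largest amplitude first
    mean, std = sig.mean(), sig.std()
    return [(amp, hi, lo, std, mean) for amp, hi, lo in pairs[:top_k]]

# Toy signal (hypothetical values, not real torque data).
rows = tire_features([0, 5, -3, 4, -1, 2, 0])
for row in rows:
    print(row)
```

Note that the mean and standard deviation are per-tire values, so they are repeated on each of the rows for that tire.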

I also set the number of clusters to 2, so every row is assigned to one of only two clusters.
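A minimal sketch of this clustering step, using the five sample rows that appear in the result output in this post (values copied as printed). The scaling with `StandardScaler` and the `n_init` and `random_state` settings are assumptions added for the example, not necessarily what I ran.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# One row per (tire, max/min pair); columns as printed in the post:
# amplitude, local maxima, local minima, sig_value_std, sig_value_average
tire_ids = ["tire_0", "tire_0", "tire_0", "tire_1", "tire_1"]
X = np.array([
    [558.50, 437.75, -120.75, 77.538645, 12.816990],
    [532.75, 433.75,  -99.00, 77.538645, 12.816990],
    [526.25, 438.00,  -88.25, 77.538645, 12.816990],
    [552.50, -116.50, 436.00, 71.125912, 11.588038],
    [542.75, 439.25, -103.50, 71.125912, 11.588038],
])

# Scale the features so no single column dominates the distance (assumption).
X_scaled = StandardScaler().fit_transform(X)

# Two clusters, fixed seed for repeatability (assumption).
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X_scaled)

# Labels are per row, so one tire can end up with rows in both clusters.
for tire, label in zip(tire_ids, labels):
    print(tire, label)
```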

So my end result (after assigning clusters) looks like the following:

```
        amplitude  local maxima  local minima  sig_value_std  sig_value_average  cluster
tire_0     558.50        437.75       -120.75      77.538645          12.816990        0
tire_0     532.75        433.75        -99.00      77.538645          12.816990        0
tire_0     526.25        438.00        -88.25      77.538645          12.816990        0
tire_1     552.50       -116.50        436.00      71.125912          11.588038        1
tire_1     542.75        439.25       -103.50      71.125912          11.588038        0
```

Now my question is what to do with this result. Each tire has 3 rows of data, because I picked the top 3 local max/min pairs with the largest amplitudes. That means each row is assigned to a cluster separately, and sometimes the rows for a single tire are even assigned to different clusters. Also, the cluster size is normally larger than just 2.

My questions are:

- How can I do anomaly detection on a set of time series, rather than on individual data points?
- Is my approach reasonable/logical? If it is, how can I clean up my result to get what I want? And if not, what can I do to improve it?

Hi Kasra! Thank you SOOO much for trying to help! I'm trying out your method and I noticed one shortcoming/limitation: your approach assumes that every time series data set I'm using has the same number of data points, which is not the case here... any other suggestions? :( – alwaysaskingquestions – 2018-03-04T03:34:56.330

Sure, if you upvote/accept the answer if it worked. As a very first step, just clip the time series: cut them to the same length as the shortest one. If that does not help, drop another comment here. – Kasra Manshaei – 2018-03-04T03:45:43.087

Hi Kasra, I don't want to lose the data; is it possible to not cut the data? I want to use all of it. – alwaysaskingquestions – 2018-03-04T03:47:18.770

So pad the tail of the shorter time series with their last value. Just try it and let me know if it worked. – Kasra Manshaei – 2018-03-04T03:54:46.477

Doesn't that basically change the data? Because now I'm adding values to the shorter data sets... so that is changing my results, right? (Thank you so much for being patient with me!) – alwaysaskingquestions – 2018-03-04T03:56:06.743

It does but what can we do? Either cut the long ones or replicate short ones or "extract a set of good representative features" from every time-series. – Kasra Manshaei – 2018-03-04T13:39:49.117
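For reference, the two length-equalizing options mentioned in these comments (cutting to the shortest series, or repeating the last value of the shorter ones) can be sketched as below. The helper names `clip_to_shortest` and `pad_with_last` and the toy values are hypothetical and only for illustration; both helpers assume every series is non-empty.

```python
import numpy as np

def clip_to_shortest(series_list):
    """Cut every series to the length of the shortest one."""
    n = min(len(s) for s in series_list)
    return np.array([np.asarray(s, dtype=float)[:n] for s in series_list])

def pad_with_last(series_list):
    """Extend shorter series by repeating their last value."""
    n = max(len(s) for s in series_list)
    return np.array([
        # mode="edge" repeats the boundary value; pad only on the right.
        np.pad(np.asarray(s, dtype=float), (0, n - len(s)), mode="edge")
        for s in series_list
    ])

# Toy series of different lengths (hypothetical values).
a = [1, 2, 3, 4]
b = [5, 6]
clipped = clip_to_shortest([a, b])
padded = pad_with_last([a, b])
print(clipped)
print(padded)
```

Clipping throws away the tail of the longer series, while padding adds values that were never measured, which is the trade-off discussed above.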

@KasraManshaei May I kindly draw your attention to my question in this regard? It will be highly appreciated. – Mario – 2020-12-03T15:42:13.180

@Mario Sure my friend. As soon as I find a bit of time I will get back to your question :) – Kasra Manshaei – 2020-12-09T11:13:31.740

Thanks for your consideration. I also reformulated the question in another community, if you don't mind. – Mario – 2020-12-09T12:40:27.753