I am trying to do anomaly detection across time-series using Python and sklearn (other package suggestions are definitely welcome!).
I have a set of 10 time-series. Each one consists of torque values collected from a tire (so 10 tires in total), and the series may not contain the same number of data points (the set sizes differ). Each time-series is essentially just the tire_id, the timestamp, and the sig_value (the value from the signal, i.e. the sensor). Sample data for one time-series looks like this:
    tire_id  timestamp  sig_value
    tire_1   23:06.1        12.75
    tire_1   23:07.5         0
    tire_1   23:09.0       -10.5
Now I have 10 of them, and 2 of them behave strangely. I understand that this is an anomaly detection problem, but most of the articles I have read online detect anomalous points within a single time-series (i.e. whether the torque values at some points are abnormal for that tire).
To detect which 2 tires are behaving abnormally, I tried a clustering method, namely k-means clustering (since it's unsupervised).
To prepare the data for k-means, for each time-series (i.e. for each tire), I calculated:
- The top 3 pairs of adjacent local maximum and local minimum with the highest amplitude (difference between the two)
- Mean of the torque values
- Standard deviation of the torque values
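For reference, the feature computation above can be sketched roughly as follows. This is only a minimal sketch, not my exact code: the function names `top_amplitude_pairs` and `tire_features` and the toy series are made up for illustration, it only counts strict local extrema (endpoints and flat plateaus are ignored), and it uses the population standard deviation, which is an assumption.

```python
import numpy as np

def top_amplitude_pairs(values, k=3):
    """Return up to k (amplitude, local_max, local_min) tuples, largest amplitude first."""
    values = np.asarray(values, dtype=float)
    inner = values[1:-1]
    # Strict local extrema only; endpoints and flat plateaus are ignored (assumption).
    max_idx = np.flatnonzero((inner > values[:-2]) & (inner > values[2:])) + 1
    min_idx = np.flatnonzero((inner < values[:-2]) & (inner < values[2:])) + 1
    # Merge maxima and minima into one list ordered by position in the series.
    extrema = sorted([(int(i), "max") for i in max_idx] +
                     [(int(i), "min") for i in min_idx])
    pairs = []
    for (i, kind_i), (j, kind_j) in zip(extrema, extrema[1:]):
        if kind_i == kind_j:
            continue  # only pair a maximum with a neighbouring minimum
        hi, lo = (values[i], values[j]) if kind_i == "max" else (values[j], values[i])
        pairs.append((float(hi - lo), float(hi), float(lo)))
    return sorted(pairs, reverse=True)[:k]

def tire_features(values, k=3):
    """Per-tire features: top-k extrema pairs, mean, and standard deviation."""
    values = np.asarray(values, dtype=float)
    return {
        "pairs": top_amplitude_pairs(values, k),
        "mean": float(values.mean()),
        "std": float(values.std()),  # population std (ddof=0); an assumption
    }

# Toy series, not real torque data.
toy = [0, 5, 0, 3, -2, 4, 0]
features = tire_features(toy)
```

Each tuple in `features["pairs"]` becomes one row for that tire, with the mean and standard deviation repeated on every row.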
I also set the number of clusters to only 2, so every row ends up in one of two clusters.
So my end result (after assigning clusters) looks like the following:
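The clustering step is roughly the following. Again, this is a minimal sketch under assumptions: the feature rows in `X` are made-up numbers in the column order amplitude, local max, local min, std, mean, and the `StandardScaler` step is optional and shown only as one possible choice, since the columns are on very different scales.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature rows: [amplitude, local max, local min, std, mean].
# These values are invented for illustration, not taken from the real tires.
X = np.array([
    [550.0, 440.0, -110.0, 77.0, 12.8],
    [545.0, 438.0, -107.0, 76.0, 12.5],
    [548.0, 439.0, -109.0, 78.0, 12.9],
    [900.0, 700.0, -200.0, 150.0, 30.0],
    [910.0, 705.0, -205.0, 155.0, 31.0],
])

# Optional: put every column on a comparable scale before computing distances.
X_scaled = StandardScaler().fit_transform(X)

# Two clusters, fixed seed so the assignment is reproducible.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)
```

`labels` holds one cluster index (0 or 1) per row of `X`, so a tire that contributes 3 rows gets 3 labels.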
            amplitude  local maxima  local minima  sig_value_std  sig_value_average  cluster
    tire_0     558.50        437.75       -120.75      77.538645          12.816990        0
    tire_0     532.75        433.75        -99.00      77.538645          12.816990        0
    tire_0     526.25        438.00        -88.25      77.538645          12.816990        0
    tire_1     552.50       -116.50        436.00      71.125912          11.588038        1
    tire_1     542.75        439.25       -103.50      71.125912          11.588038        0
Now I am not sure what to do with this result. Each tire has 3 rows of data, because I picked the 3 local max/min pairs with the largest amplitudes. That means each row is assigned to a cluster on its own, and sometimes the rows of a single tire end up in different clusters. Also, the cluster sizes are normally larger than just 2.
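To make the inconsistency concrete, here is a minimal sketch that counts the cluster labels per tire, using only the (tire, cluster) pairs from the rows shown above. The names `rows`, `labels_per_tire`, and `split_tires` are just for illustration.

```python
from collections import Counter, defaultdict

# (tire_id, cluster) pairs copied from the result rows shown above.
rows = [
    ("tire_0", 0), ("tire_0", 0), ("tire_0", 0),
    ("tire_1", 1), ("tire_1", 0),
]

# Count how often each cluster label was assigned to each tire.
labels_per_tire = defaultdict(Counter)
for tire_id, cluster in rows:
    labels_per_tire[tire_id][cluster] += 1

# Tires whose rows landed in more than one cluster.
split_tires = [tire for tire, counts in labels_per_tire.items() if len(counts) > 1]
print(split_tires)
```

Here tire_0 is consistently in cluster 0, while tire_1 has one row in each cluster, so there is no single label for that tire.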
My questions are:
- How can I do anomaly detection across a set of time-series, rather than on individual data points?
- Is my approach reasonable/logical? If it is, how can I clean up my result to get what I want? And if not, what can I do to improve?