I have a set of devices, each producing time series data for one continuous variable. I need to evaluate the relationship between the profile of that variable on each device and "events".
Those events are given as a number of occurrences over a time period.
My first idea is to cluster devices by similar behavior of that variable and compare those clusters with low/medium/high event rates.
I was thinking about running K-means with the min, max, quartiles, mean, normal Q-Q p-value, kurtosis, etc. as dimensions, but I don't think it's a good idea because:
- Those dimensions are not independent
- Summarizing each series this way "loses" data, and so potentially loses classification power
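To make the idea concrete, here is a minimal sketch of the feature-based approach I was considering. The function names (`summarize`, `cluster_devices`), the value of `k`, and the synthetic data are placeholders for illustration, not my real setup. It uses `scipy.cluster.vq.kmeans2` and leaves out the normality p-value for brevity.

```python
import numpy as np
from scipy import stats
from scipy.cluster.vq import kmeans2


def summarize(series):
    """Reduce one device's time series to a fixed-length feature vector."""
    q1, median, q3 = np.percentile(series, [25, 50, 75])
    return np.array([
        series.min(), q1, median, q3, series.max(),
        series.mean(),
        stats.kurtosis(series),  # excess kurtosis (0 for a normal distribution)
    ])


def cluster_devices(series_by_device, k=3, seed=0):
    """Cluster devices on standardized summary features with K-means."""
    features = np.vstack([summarize(s) for s in series_by_device])
    # Z-score each column so that no feature dominates purely by scale.
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    scaled = (features - features.mean(axis=0)) / std
    # The `seed` keyword requires SciPy >= 1.7.
    _, labels = kmeans2(scaled, k, minit="++", seed=seed)
    return features, labels


# Synthetic stand-in data: six hypothetical devices in two obvious groups.
rng = np.random.default_rng(0)
series_by_device = [rng.normal(loc, 1.0, 1000) for loc in (0, 0, 0, 5, 5, 5)]
features, labels = cluster_devices(series_by_device, k=2)
print(labels)
```

Z-scoring puts the features on a common scale, but it does not remove the correlation between them (min, quartiles, and mean all move together), which is exactly my first concern above.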
Do you have any suggestions for grouping similar devices together?
Also, do you have other ideas for establishing that relationship?
Technical details:
- Python 3 with the SciPy stack
- ~3000 devices and hundreds of thousands of data points per day; 5 months of data to consider