### Background

I'm working on a time series data set of energy meter readings. The length of the series varies by meter - for some I have several years, others only a few months, etc. Many display significant seasonality, and often multiple layers - within the day, week, or year.

One of the things I've been working on is clustering of these time series. My work is academic for the moment, and while I'm doing other analysis of the data as well, I have a specific goal to carry out some clustering.

I did some initial work where I calculated various features (percentage used on weekends vs. weekdays, percentage used in different time blocks, etc.). I then moved on to using Dynamic Time Warping (DTW) to obtain the distance between different series and clustering based on those distance values, and I've found several papers related to this.
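
For readers unfamiliar with DTW, here is a minimal sketch of the textbook dynamic-programming version. It is not the code used for the runs described in this question; the function name `dtw_distance`, the absolute-difference local cost, and the toy series are all assumptions made up for illustration.

```python
import math

def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW; local cost is the absolute difference."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical toy series: y is x shifted by one step, z is flat.
x = [0, 0, 1, 2, 1, 0]
y = [0, 1, 2, 1, 0, 0]
z = [1, 1, 1, 1, 1, 1]

shifted = dtw_distance(x, y)  # warping absorbs the one-step shift
flat = dtw_distance(x, z)     # a different shape stays far away
```

The point of the toy data is that DTW treats a shifted copy of a pattern as close, while a genuinely different shape remains distant. A series whose pattern changes partway through contains both shapes, which is the situation the question below is about.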

### Question

If the seasonality within a specific series changes over time, will that cause my clustering to be incorrect? And if so, how do I deal with it?

My concern is that the distances obtained by DTW could be misleading in the cases where the pattern in a time series has changed. This could lead to incorrect clustering.

In case the above is unclear, consider these examples:

### Example 1

A meter has low readings from midnight until 8AM, the readings then increase sharply for the next hour and stay high from 9AM until 5PM, then decrease sharply over the next hour and then stay low from 6PM until midnight. The meter continues this pattern consistently every day for several months, but then changes to a pattern where readings simply stay at a consistent level throughout the day.

### Example 2

A meter shows approximately the same amount of energy being consumed each month. After several years, it changes to a pattern where energy usage is higher during the summer months before returning to the usual amount.

### Possible Directions

- I've wondered whether I can continue to compare whole time series, but split them and consider them as separate series if the pattern changes considerably. However, to do this I'd need to be able to detect such changes. Also, I just don't know if this is a suitable way of working with the data.
- I've also considered splitting the data and considering it as many separate time series. For instance, I could consider every day/meter combination as a separate series. However, I'd then need to do similarly if I wanted to consider the weekly/monthly/yearly patterns. I *think* this would work, but it's potentially quite onerous and I'd hate to go down this path if there's a better way that I'm missing.
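
To make the two directions above concrete, here is a rough sketch on a synthetic version of Example 1. Everything in it is an assumption for illustration: hourly readings, the helper names `split_into_days` and `first_change_day`, the 7-day window, and the threshold of 5.0 are hypothetical choices, not a recommended change-detection method.

```python
def split_into_days(readings, per_day=24):
    """Cut a flat list of hourly readings into complete daily profiles."""
    n_days = len(readings) // per_day
    return [readings[d * per_day:(d + 1) * per_day] for d in range(n_days)]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length profiles."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def first_change_day(days, window=7, threshold=5.0):
    """Index of the first day whose profile is farther than `threshold`
    from the mean profile of the previous `window` days, else None."""
    for d in range(window, len(days)):
        recent = days[d - window:d]
        mean_profile = [sum(hour) / window for hour in zip(*recent)]
        if euclidean(days[d], mean_profile) > threshold:
            return d
    return None

# Synthetic Example 1: low until 8AM, ramp, high 9AM-5PM, ramp, low from 6PM.
office_day = [1.0] * 8 + [5.5] + [10.0] * 8 + [5.5] + [1.0] * 6
flat_day = [4.0] * 24
readings = office_day * 30 + flat_day * 30  # pattern changes after 30 days

days = split_into_days(readings)
change = first_change_day(days)
# Treat each side of the detected change as a separate series for clustering.
segments = [days] if change is None else [days[:change], days[change:]]
```

The daily split corresponds to the second bullet (every day/meter combination as its own series), and cutting at `change` corresponds to the first bullet. Real readings would be noisier than this, so the fixed threshold here is only a placeholder.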

### Further Notes

These are things that have come up in comments, or things I've thought of due to comments, which might be relevant. I'm putting them here so people don't have to read through everything to get relevant information.

- I'm working in Python, but have rpy for those places where R is more suitable. I'm not necessarily looking for a Python answer though - if someone has a practical answer of what should be done I'm happy to figure out implementation details myself.
- I have a lot of working "rough draft" code - I've done some DTW runs, I've done a couple of different types of clustering, etc. I think I largely understand the direction I'm taking, and what I'm really looking for is related to how I process my data before finding distances, running clustering, etc. Given this, I suspect the answer would be the same whether the distances between series are calculated via DTW or a simpler Euclidean Distance (ED).
- I have found these papers especially informative on time series and DTW and they may be helpful if some background is needed to the topic area: http://www.cs.ucr.edu/~eamonn/selected_publications.htm

Jo, did you find the right answer to your question? I am in the same situation and I need help. Thank you – LSola – 2017-04-10T16:12:59.903

+1 Very nice question, and it is great to see so much enthusiasm! I think you could nail down your question a little bit, so it's more inviting for others to read, and then give you an answer. – Rubens – 2014-12-22T09:56:32.873

@Rubens Thanks! I'll re-work it when I'm home this evening, I can see where it'd be useful to include some more information about how I've gotten to this point and why. I was worried about it getting too long, but I'll separate out the background and question a bit more to avoid it getting unreadable. – Jo Douglass – 2014-12-22T10:04:44.587

It may not be a "pure statistics" question but it needs a pure statistics answer. You will struggle until you can think about it in pure statistics terms. – Spacedman – 2014-12-26T11:02:20.280

@Spacedman - I welcome answers in whatever manner people feel is the best way to answer it, with the caveat that I may have further questions if the answer is heavy on formulas or references to statistical concepts that I don't understand yet. – Jo Douglass – 2014-12-27T13:55:40.887