Feature Extraction Technique - Summarizing a Sequence of Data



I often build models (classification or regression) in which some of the predictor variables are sequences, and I am looking for recommended techniques for summarizing them as well as possible for inclusion as predictors in the model.

As a concrete example, say a model is being built to predict whether a customer will leave the company in the next 90 days (anytime between t and t+90; thus a binary outcome). One of the predictors available is the level of the customer's financial balance for periods t_0 to t-1. Maybe this represents monthly observations for the prior 12 months (i.e. 12 measurements).

I am looking for ways to construct features from this series. I currently use descriptive statistics of each customer's series, such as the mean, high, low and standard deviation, and I fit an OLS regression to get the trend. Are there other methods of calculating features? Other measures of change or volatility?
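For reference, the descriptive features listed above can be sketched in plain Python. This is only an illustration: the function name `summary_features` and the two extra change measures (`net_change`, `mean_abs_change`) are assumptions added for the example, not an established recipe.

```python
from statistics import mean, pstdev


def summary_features(balances):
    """Summarize one customer's balance series (oldest observation first).

    Assumes at least two observations.
    """
    n = len(balances)
    t = range(n)
    t_bar, y_bar = mean(t), mean(balances)

    # OLS slope of balance on the time index: cov(t, y) / var(t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, balances))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    slope = sxy / sxx

    # Period-to-period changes, used for simple change measures
    diffs = [b - a for a, b in zip(balances, balances[1:])]

    return {
        "mean": y_bar,
        "min": min(balances),
        "max": max(balances),
        "std": pstdev(balances),
        "trend_slope": slope,
        "net_change": balances[-1] - balances[0],          # illustrative extra
        "mean_abs_change": mean(abs(d) for d in diffs),    # illustrative extra
    }


features = summary_features([100 + 10 * i for i in range(12)])
```

Each customer's series becomes one row of such features, which can then be joined to the other predictors.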


As mentioned in a response below, I also considered (but forgot to add here) using Dynamic Time Warping (DTW) and then hierarchical clustering on the resulting distance matrix, creating some number of clusters and then using the cluster membership as a feature. Scoring test data would likely have to follow a process where DTW is computed between each new case and the cluster centroids, matching each new series to its closest centroid...
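A rough sketch of the scoring step described above, under some assumptions: hierarchical clustering does not produce centroids by itself, so each cluster is represented here by its medoid (the member with the smallest total DTW distance to the other members). The helper names `dtw_distance`, `medoid` and `nearest_prototype` are hypothetical, and the DTW is the basic unconstrained version with absolute-difference cost.

```python
def dtw_distance(a, b):
    """Basic DTW distance between two numeric sequences (absolute-difference cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def medoid(cluster):
    """Member of the cluster with the smallest total DTW distance to the others."""
    return min(cluster, key=lambda s: sum(dtw_distance(s, other) for other in cluster))


def nearest_prototype(series, prototypes):
    """Index of the prototype closest to `series` under DTW."""
    distances = [dtw_distance(series, p) for p in prototypes]
    return min(range(len(prototypes)), key=distances.__getitem__)


# Assumed toy clusters, as if produced by an earlier hierarchical clustering step
clusters = [[[1, 2, 3], [1, 2, 4]], [[9, 8, 7], [9, 9, 7]]]
prototypes = [medoid(c) for c in clusters]
new_case_cluster = nearest_prototype([1, 1, 2, 3], prototypes)
```

The index returned by `nearest_prototype` would then serve as the categorical cluster-membership feature for a new case.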


Posted 2014-06-23T23:20:36.180





Please let me know if these references help:

  1. Discretization of Time Series Data http://arxiv.org/ftp/q-bio/papers/0505/0505028.pdf

  2. Optimizing Time Series Discretization for Knowledge Discovery https://www.uni-marburg.de/fb12/datenbionik/pdf/pubs/2005/moerchen05optimizing

  3. Experiencing SAX: a Novel Symbolic Representation of Time Series http://cs.gmu.edu/~jessica/SAX_DAMI_preprint.pdf

  4. Indexing for Interactive Exploration of Big Data Series http://acs.ict.ac.cn/storage/slides/Indexing_for_Interactive_Exploration_of_Big_Data_Series.pdf

  5. Generalized Feature Extraction for Structural Pattern Recognition in Time-series Data http://www.semanticscholar.org/paper/Generalized-Feature-Extraction-for-Structural-Olszewski-Maxion/7838bcd87bb6616e9fd3ffd92d4676a7082da34c

  6. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package https://cran.r-project.org/web/packages/dtw/vignettes/dtw.pdf
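To illustrate the discretization idea behind the SAX paper (item 3), here is a minimal sketch, assuming the usual three steps: z-normalize the series, average it over equal-width segments (PAA), and map each segment mean to a letter using equiprobable breakpoints of the standard normal distribution. The function name `sax_word` and its parameters are hypothetical, and the sketch assumes the series length is divisible by the number of segments and that the series is not constant.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments, alphabet="abcd"):
    """Convert a numeric series to a short symbolic word (SAX-style sketch)."""
    # 1. z-normalize (assumes the series is not constant, so std > 0)
    m, s = mean(series), pstdev(series)
    z = [(x - m) / s for x in series]

    # 2. Piecewise Aggregate Approximation: mean of each equal-width segment
    width = len(z) // n_segments  # assumes len(series) is divisible by n_segments
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(n_segments)]

    # 3. Breakpoints that split the standard normal into equiprobable regions
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]

    # Map each segment mean to the letter of the region it falls in
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


word = sax_word([1, 1, 1, 1, 5, 5, 5, 5, 9, 9, 9, 9], n_segments=3, alphabet="abc")
```

The resulting word, or counts of its letters and sub-words, could be used as categorical features for each customer.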





What you're trying to do here is reduce the dimensionality of your features. You can search for dimensionality reduction to get several options, but one very popular technique is principal components analysis (PCA). Principal components are not as interpretable as the options you've mentioned, but the first few components typically capture most of the variance in the series.
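A minimal sketch of the PCA idea, assuming each customer's balances form one row and all rows have equal length. It uses power iteration in plain Python to find only the first principal component; `first_principal_component` is a hypothetical helper, and in practice a library routine such as scikit-learn's `PCA` would be the usual choice.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (component, scores) for the leading principal component.

    rows: list of equal-length numeric series, one per customer (at least two rows).
    """
    n, p = len(rows), len(rows[0])

    # Center each column (time period) at zero
    col_means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - col_means[j] for j in range(p)] for r in rows]

    # Sample covariance matrix of the p time periods
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]

    # Power iteration: repeatedly apply cov and renormalize.
    # Assumes the start vector is not orthogonal to the leading component.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("no variance along the current direction")
        v = [x / norm for x in w]

    # Score = projection of each centered row onto the component
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


component, scores = first_principal_component(
    [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]]
)
```

The `scores` list gives one number per customer and would replace (or supplement) the raw 12 balances as a predictor.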




If the dependency between t and t+1 is a trend or seasonality, consider extracting it and treating the remainder as independent variables. – Diego – 2016-03-25T17:20:21.107

My concern with this answer is that PCA doesn't recognize the clear dependency between the series t and t+1. – B_Miner – 2014-06-24T01:50:57.443


Feature extraction is always a challenge and one of the less addressed topics in the literature, since it is highly application dependent.

Some ideas you can try:

  • Raw data, measured day-by-day. This is the obvious choice, but it has implications and needs extra preprocessing (normalisation) to make timelines of different lengths comparable.
  • Higher moments: skewness, kurtosis, etc.
  • Derivative(s): speed of evolution.
  • The time span is not that large, but it may be worth trying some time series analysis features, for example autocorrelation.
  • Some customised features, such as breaking the timeline into weeks and measuring the quantities you already measure in each week separately. A non-linear classifier would then be able to combine, e.g., first-week features with last-week features to gain insight into the evolution over time.
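A few of the ideas above can be sketched in plain Python. This is an illustration under assumptions: the "derivative" is approximated by first differences between consecutive observations, and the helper names (`first_differences`, `skewness`, `autocorrelation`, `window_means`) are hypothetical.

```python
from statistics import mean, pstdev


def first_differences(series):
    """Discrete first derivative: change between consecutive observations."""
    return [b - a for a, b in zip(series, series[1:])]


def skewness(series):
    """Population skewness (third standardized moment); 0.0 for a constant series."""
    m, s = mean(series), pstdev(series)
    if s == 0:
        return 0.0
    return mean(((x - m) / s) ** 3 for x in series)


def autocorrelation(series, lag=1):
    """Sample autocorrelation at the given lag; 0.0 for a constant series."""
    m = mean(series)
    den = sum((x - m) ** 2 for x in series)
    if den == 0:
        return 0.0
    num = sum((series[i] - m) * (series[i + lag] - m)
              for i in range(len(series) - lag))
    return num / den


def window_means(series, width):
    """Mean of each consecutive block of `width` observations."""
    return [mean(series[i:i + width]) for i in range(0, len(series), width)]


balances = [100, 104, 103, 110, 120, 118, 90, 85, 80, 82, 70, 60]
speed = first_differences(balances)          # first derivative
acceleration = first_differences(speed)      # second derivative
early_vs_late = window_means(balances, 6)    # first-half vs. second-half mean
```

Summaries of `speed` (for example its mean, minimum, or the number of negative changes) can then be used as features alongside the window means, so that a classifier can compare early and late behaviour.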




Nice suggestions! Can you flesh out the use of derivatives more? – B_Miner – 2014-06-24T14:04:44.013

I agree completely with your first statement. I would LOVE to see a book written which collected case studies on feature engineering / extraction. The adage is that feature creation is much more important to predictive model performance than the latest and greatest algorithm. – B_Miner – 2014-06-24T14:07:06.180


At first glance, you need to extract features from your time series over the window from x - 12 to x. One possible approach is to compute summary metrics: average, dispersion, etc. But in doing so, you will lose all of the time-series related information, and data extracted from the shape of the curve may be quite useful. I recommend looking through this article, where the authors propose an algorithm for time series clustering. I hope it is useful. In addition to such clustering, you can add summary statistics to your feature list.




Thanks for the link. I had also considered using DTW and hierarchical clustering. I have experimented with the R package for DTW. http://www.jstatsoft.org/v31/i07/paper – B_Miner – 2014-06-24T13:58:49.467

I specifically considered creating n clusters and using the cluster membership as a feature. – B_Miner – 2014-06-24T13:59:40.057