Time series feature extraction from raw sensor data for classification?



I have tabular raw data from sensors with an associated label. I want to extract features such as mean, max, min, and std from all of the sensor columns, then put them in another table or export them to a CSV file so that I can run a classification task on that data.

[Image: data table]


Posted 2018-05-28T20:38:42.057

Reputation: 143

Could you be more precise about what your problem is and what tools/languages you are planning to use? – El Burro – 2018-05-29T11:40:32.793

I am using Python and want a data table with columns like data1_mean, data1_max, data1_min, etc. Also, how should I transform the label? – rosy – 2018-05-29T19:07:26.870

You can look at the tsfresh repository on GitHub. It extracts time series features from sensor logs. – Fahad Ali Sarwar – 2019-02-23T23:49:06.907



For clarification: mean, max, min, and std are not "time series features" specifically; they are features of data in general.

Assuming that you want to do it in Python, you should take a look at the pandas.DataFrame class. Once you initialize a DataFrame object with your tabular data, you can call its methods DataFrame.min(), DataFrame.max(), DataFrame.mean(), and DataFrame.std() for your purpose.

You can insert all of these calculated statistics into a new DataFrame and then call DataFrame.to_csv() to export them to a CSV file.
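As a minimal sketch of the steps described above: the table contents and the column names `data1`, `data2`, and `label` are made up for illustration and are not taken from the question's data. It computes the four statistics per sensor column, flattens them into names like `data1_mean`, and serializes the result as CSV.

```python
import pandas as pd

# Hypothetical sensor table; column names and values are placeholders.
raw = pd.DataFrame({
    "data1": [1.0, 2.0, 3.0, 4.0],
    "data2": [10.0, 20.0, 30.0, 40.0],
    "label": [0, 0, 1, 1],
})

sensor_cols = ["data1", "data2"]

# agg returns a table with one row per statistic and one column per sensor.
# Note that pandas' std is the sample standard deviation (ddof=1) by default.
stats = raw[sensor_cols].agg(["mean", "max", "min", "std"])

# Flatten into a single row with column names like data1_mean, data1_max, ...
features = {
    f"{col}_{stat}": stats.loc[stat, col]
    for col in sensor_cols
    for stat in stats.index
}
feature_table = pd.DataFrame([features])

# With no path argument, to_csv returns the CSV text;
# pass a filename instead to write the file to disk.
csv_text = feature_table.to_csv(index=False)
```

This summarizes the whole table into one row, so it is only a starting point; per-window features need a loop or a rolling computation over the rows.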



Reputation: 3 275

Thanks, but I want a rolling window of 10 seconds with 50% overlap. Also, what about the label? – rosy – 2018-05-29T19:05:48.663

This was not in the original question. Anyway, you can always write a for-loop that goes through all samples in batches of 10 s with 50% overlap, assigns each batch to a DataFrame object (inside the loop), and then calls the appropriate methods on that batch. It will work smoothly. – pcko1 – 2018-05-29T19:17:35.600

Also check this: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html :) – pcko1 – 2018-05-29T19:37:14.050

Thanks, but if I loop across the dataset, what about the label corresponding to each window? Should I use the label with the highest frequency in that window? – rosy – 2018-05-29T19:41:13.727

Since the label is binary, you cannot "average" it into a real number because it would not make sense. Your best choice is to use the "majority vote" of the window, which is the label with the highest frequency as you mentioned. – pcko1 – 2018-05-29T19:44:10.747
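The loop and the majority vote discussed in these comments could look roughly like the sketch below. The helper `window_features`, the column names, and the toy data are hypothetical. It assumes uniformly sampled rows, so a 10-second window would be `window_size = 10 * sampling_rate` samples; the sampling rate is not given in the question.

```python
import pandas as pd

def window_features(df, sensor_cols, label_col, window_size, overlap=0.5):
    """Per-window statistics plus a majority-vote label (illustrative helper)."""
    # With 50% overlap, each window starts half a window after the previous one.
    step = max(1, int(window_size * (1 - overlap)))
    rows = []
    # Only full windows are used; a trailing partial window is dropped.
    for start in range(0, len(df) - window_size + 1, step):
        window = df.iloc[start:start + window_size]
        row = {}
        for col in sensor_cols:
            row[f"{col}_mean"] = window[col].mean()
            row[f"{col}_max"] = window[col].max()
            row[f"{col}_min"] = window[col].min()
            row[f"{col}_std"] = window[col].std()
        # Majority vote: mode() returns the most frequent value(s) in sorted
        # order, so a tie resolves to the smallest label.
        row[label_col] = window[label_col].mode().iloc[0]
        rows.append(row)
    return pd.DataFrame(rows)

# Toy data: 8 samples, window of 4 samples, 50% overlap -> windows start at 0, 2, 4.
raw = pd.DataFrame({
    "data1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "label": [0, 0, 0, 1, 1, 1, 1, 1],
})
windowed = window_features(raw, ["data1"], "label", window_size=4)
```

The resulting table has one row per window and can be exported with `windowed.to_csv(...)`. If the timestamps are irregular, a time-based grouping would be a better fit than counting samples.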


You may want to look at the self-contained blog post "Machine Learning with Signal Processing Techniques", which covers how to prepare time series data and extract useful statistical estimates and features for machine learning models. It ends with a classification example. I found it very useful and straightforward.

Somewhere in the middle of the post, this great method for the Detection of peaks in data is introduced as well.



Reputation: 3 728


You can also use an open-source Python library called tsfresh (https://tsfresh.readthedocs.io/en/latest/) to extract time series features.



Reputation: 21


I don't have enough reputation to leave a comment, but could you please provide some sample data so that we can help you better?

When you say mean, max, and min, are you trying to aggregate multiple rows of data on a date column with these functions? Or do you have a timespan/datetime/timestamp column that you want to use?

The Lyrist


Reputation: 454