Learning with groups of sequential data


Say I have a data set such as the following:

person, Time, Value, Event
person1, 2010-07-02 00:00:00, 5.4, 0
person2, 2010-07-02 10:00:00, 12.7, 0

We have a model in place at work that doesn't take the temporal aspect of our data into account. That model was trained with only one row per 'person', and it throws away the time variable. However, it has come to our attention that we can look at our data as a sequence instead. The starting time is unique to each person and clearly associated with only that person, so ignoring the grouping by person and treating each row as an independent data point wouldn't make any sense. This is how I've restructured the data:

person, Time, Value, Event
person1, 2010-07-02 00:00:00, 5.4, 0
person1, 2010-07-02 00:00:15, 3.6, 0
person1, 2010-07-02 00:00:30, 2.4, 0
person2, 2010-07-02 10:00:00, 12.7, 0
person2, 2010-07-02 10:01:15, 12.8, 0
person2, 2010-07-02 10:01:30, 13.1, 1

The sequence for each person would continue until an 'event' or 'non-event'. I'm totally unfamiliar with machine learning on time series data. All of the examples I've read, across different models, treat the data as one big sequence corresponding to one entity, while our data clearly doesn't work like that. Is the way I've structured the data the right way to approach a time series model? And if so, what would be an appropriate model to consider?

Boudewijn Aasman

Posted 2017-02-06T01:57:55.603

Reputation: 133

Does every single person have the same time period? For example, does each person have exactly 3 different Time values from 2010-07-02 00:00:00 to 2010-07-02 00:00:30? – Icyblade – 2017-02-06T03:37:45.497

No, that is just coincidence from when I was creating the fake data too quickly. – Boudewijn Aasman – 2017-02-06T14:36:58.030



The data you are showing is typical survival data. If you want to model the event depending on time and value, you should look into survival models with time-dependent covariates. If you want to do this without any assumptions about the distribution, you can start with a Kaplan-Meier estimate as explained here. If you want to use parametric models, you can look at Weibull or Gamma regression.
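To make the Kaplan-Meier idea concrete, here is a minimal sketch in plain Python. It assumes each person has been reduced to one (duration, event) pair, with the duration measured from that person's own starting time and event = 0 meaning the person was censored (no event observed). The function name `kaplan_meier` and the sample numbers are made up for illustration and are not taken from any package.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    durations: follow-up time per person, measured from their own start
    events:    1 if the event was observed at that time, 0 if censored
    """
    observed = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # events and censored exits at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # shrink the risk set after time t
    return curve


# Hypothetical data: seconds of follow-up and event flag for five people
durations = [30, 90, 90, 120, 150]
events = [0, 1, 1, 0, 1]
curve = kaplan_meier(durations, events)
```

Note that this estimate only uses the duration and the event flag. It ignores Value, which is why it is a starting point rather than a full model.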

If you are new to this topic, I highly recommend browsing the examples of the packages in the CRAN Survival Task View.
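Models with time-dependent covariates usually expect one row per person and time interval (often called start-stop or counting-process format). As a rough sketch of how the rows from the question could be reshaped, assuming that Value stays constant from one measurement until the next and that the Event flag applies to the end of each interval, something like the following would do. The helper name `to_start_stop` is hypothetical.

```python
from datetime import datetime
from itertools import groupby

# Rows as in the question: (person, Time, Value, Event)
rows = [
    ("person1", "2010-07-02 00:00:00", 5.4, 0),
    ("person1", "2010-07-02 00:00:15", 3.6, 0),
    ("person1", "2010-07-02 00:00:30", 2.4, 0),
    ("person2", "2010-07-02 10:00:00", 12.7, 0),
    ("person2", "2010-07-02 10:01:15", 12.8, 0),
    ("person2", "2010-07-02 10:01:30", 13.1, 1),
]


def to_start_stop(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Turn long-format rows into (person, start, stop, value, event) intervals.

    start and stop are seconds since that person's first observation.
    Assumption: the value measured at the start of an interval holds until
    the next measurement, and the event flag belongs to the interval's end.
    """
    intervals = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for person, group in groupby(ordered, key=lambda r: r[0]):
        obs = [(datetime.strptime(t, fmt), v, e) for _, t, v, e in group]
        t0 = obs[0][0]  # each person's clock starts at their own first row
        for (t_a, value, _), (t_b, _, event) in zip(obs, obs[1:]):
            start = (t_a - t0).total_seconds()
            stop = (t_b - t0).total_seconds()
            intervals.append((person, start, stop, value, event))
    return intervals


intervals = to_start_stop(rows)
```

With this layout each person keeps their own time axis, so the different starting times in the raw data no longer matter. The last measured Value of each person is not used, because no follow-up interval comes after it.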


Posted 2017-02-06T01:57:55.603

Reputation: 1 333

That makes total sense. One thing though, the outcome in the training dataset would always be known. Does that change anything? – Boudewijn Aasman – 2017-02-07T14:00:36.720

No. If I understand you correctly, you face a supervised learning problem. Survival analysis is mostly, if not completely, supervised. Feel free to vote for my answer if it helped. – Stereo – 2017-02-09T03:40:35.797