Can we predict when an event will occur in the future from time series data?


I would like to predict a few possible times when a particular event may occur. For instance, I have daily activity data for a person, recording what the person is doing and when, over a period of time. Currently, my data are in the following format:

Date        Time      Activity
01-01-2017  08:23:30  Breakfast
01-01-2017  16:20:30  Reading
01-01-2017  19:00:00  Dinner
02-01-2017  08:00:10  Breakfast
02-01-2017  17:40:30  Reading
02-01-2017  19:30:00  Dinner
03-01-2017  08:15:30  Breakfast
03-01-2017  16:20:30  Reading
03-01-2017  20:30:00  Dinner

Let's say that in most cases the person has dinner around 7-7:30 pm and sometimes at 8-8:30 pm, and reads mostly around 4 pm but sometimes at 5 or 6 pm. From this type of data, I would like to predict when he will have breakfast, have dinner, or read tomorrow. For instance: dinner at 8 pm (confidence/probability 65%), 8:30 pm (20%), 9 pm (10%), or 9:30 pm (5%).
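One simple baseline that produces exactly this kind of output is to bin the historical times of an activity into fixed slots and use each slot's relative frequency as its probability. The sketch below assumes 30-minute slots and uses a hypothetical helper, slot_distribution; it ignores weekday and seasonal effects, so it is only a reference point to compare more elaborate models against.

```python
from collections import Counter
from datetime import time


def slot_distribution(times, slot_minutes=30):
    """Return {slot start: relative frequency}, most frequent slot first."""
    counts = Counter()
    for t in times:
        minutes = t.hour * 60 + t.minute
        start = minutes - minutes % slot_minutes  # round down to the slot boundary
        counts[time(start // 60, start % 60)] += 1
    total = sum(counts.values())
    return {slot: n / total for slot, n in counts.most_common()}


# Dinner times from the three days in the log above.
dinner = [time(19, 0), time(19, 30), time(20, 30)]
for slot, p in slot_distribution(dinner).items():
    print(f"dinner at {slot:%H:%M} ({p:.0%})")
```

With only three dinners, each of 19:00, 19:30 and 20:30 gets one third; the percentages only become meaningful with many weeks of history.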

The challenge I am facing is that I can't identify which technique to use to achieve this goal. Could you please give me a few hints?

Shamsur Rahim Hemel

Posted 2018-12-19T01:32:05.513

Reputation: 41



I would suggest creating a custom ontology by capturing the time range of each activity. For example, add an extra parameter for breakfast to your training data that holds the range between the minimum and maximum observed times, 08:00:10 to 08:23:30.
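A minimal sketch of capturing those ranges, assuming the dates are DD-MM-YYYY (the rows look like consecutive days in January 2017). The helper name activity_ranges is hypothetical; it groups the times of day by activity and keeps the earliest and latest.

```python
from collections import defaultdict
from datetime import datetime

# Rows from the question's activity log: (date, time, activity).
ROWS = [
    ("01-01-2017", "08:23:30", "Breakfast"),
    ("01-01-2017", "16:20:30", "Reading"),
    ("01-01-2017", "19:00:00", "Dinner"),
    ("02-01-2017", "08:00:10", "Breakfast"),
    ("02-01-2017", "17:40:30", "Reading"),
    ("02-01-2017", "19:30:00", "Dinner"),
    ("03-01-2017", "08:15:30", "Breakfast"),
    ("03-01-2017", "16:20:30", "Reading"),
    ("03-01-2017", "20:30:00", "Dinner"),
]


def activity_ranges(rows):
    """Return {activity: (earliest time of day, latest time of day)}."""
    times = defaultdict(list)
    for date_str, time_str, activity in rows:
        # Assumes DD-MM-YYYY dates.
        ts = datetime.strptime(f"{date_str} {time_str}", "%d-%m-%Y %H:%M:%S")
        times[activity].append(ts.time())
    return {activity: (min(values), max(values)) for activity, values in times.items()}


for activity, (earliest, latest) in activity_ranges(ROWS).items():
    print(f"{activity}: {earliest} - {latest}")
```

For breakfast this yields 08:00:10 - 08:23:30, the range quoted above; the two bounds can then be stored as extra columns next to each record.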

Feature engineering is a must if you plan to use RandomForest, LinearRegression, etc.

Additionally, the timestamp itself carries a lot of information. You can extract the day of the week, the day of the month, the day of the year, and the week of the year. From these you can tell whether a day is a weekend or a weekday (Mon-Fri), and you can see how the timing shifts over time (maybe the person eats later or earlier during the summer). You can also look at hourly changes. I would also strongly suggest plotting the data; this way you will see how the "Breakfast" label changes over time (during different periods of the year and different parts of the week or month). This will give you a better understanding of what is happening in your data.
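The calendar features listed above can be derived with Python's standard datetime module. This is a sketch: the helper timestamp_features and its feature names are illustrative, and the example assumes the first log entry means 1 January 2017.

```python
from datetime import datetime


def timestamp_features(ts):
    """Derive calendar features from one timestamp (names are illustrative)."""
    return {
        "hour": ts.hour,
        "minute_of_day": ts.hour * 60 + ts.minute,
        "day_of_week": ts.weekday(),            # Monday=0 ... Sunday=6
        "day_of_month": ts.day,
        "day_of_year": ts.timetuple().tm_yday,
        "week_of_year": ts.isocalendar()[1],    # ISO 8601 week number
        "month": ts.month,
        "is_weekend": ts.weekday() >= 5,        # Saturday or Sunday
    }


# First breakfast in the question's log, read as 1 January 2017.
print(timestamp_features(datetime(2017, 1, 1, 8, 23, 30)))
```

Note that 1 January 2017 was a Sunday and belongs to ISO week 52 of 2016, so week_of_year is 52 here; decide which week convention you want before using it as a feature.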

Overall, I would suggest trying RandomForest (or another bagging approach). It can work well on time-series data, especially when you can define good features in which the differences are clearly visible. Tuning the RandomForest is a must. In problems similar to this one, the min_samples_split and max_depth parameters are usually the most helpful.
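A minimal sketch of that tuning with scikit-learn, assuming it is installed. It predicts the minute of the day from an integer activity code, the weekday and the day of the month (a hypothetical feature set), grid-searches max_depth and min_samples_split, and uses TimeSeriesSplit so that every validation fold comes after its training rows. With only the nine rows from the question it demonstrates the mechanics, not a useful model.

```python
from datetime import datetime

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# The question's log in chronological order; dates assumed to be DD-MM-YYYY.
ROWS = [
    ("01-01-2017", "08:23:30", "Breakfast"), ("01-01-2017", "16:20:30", "Reading"),
    ("01-01-2017", "19:00:00", "Dinner"), ("02-01-2017", "08:00:10", "Breakfast"),
    ("02-01-2017", "17:40:30", "Reading"), ("02-01-2017", "19:30:00", "Dinner"),
    ("03-01-2017", "08:15:30", "Breakfast"), ("03-01-2017", "16:20:30", "Reading"),
    ("03-01-2017", "20:30:00", "Dinner"),
]
ACTIVITY_CODE = {"Breakfast": 0, "Reading": 1, "Dinner": 2}  # hypothetical encoding


def features(ts, activity):
    """Feature vector: activity code, weekday (Monday=0) and day of month."""
    return [ACTIVITY_CODE[activity], ts.weekday(), ts.day]


X, y = [], []
for date_str, time_str, activity in ROWS:
    ts = datetime.strptime(f"{date_str} {time_str}", "%d-%m-%Y %H:%M:%S")
    X.append(features(ts, activity))
    y.append(ts.hour * 60 + ts.minute)  # target: minute of the day

search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid={"max_depth": [2, None], "min_samples_split": [2, 4]},
    cv=TimeSeriesSplit(n_splits=2),  # validation rows always follow training rows
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)  # also refits the best configuration on all rows
print("best parameters:", search.best_params_)

# Predict the dinner time for the following day, 4 January 2017.
minutes = search.predict([features(datetime(2017, 1, 4), "Dinner")])[0]
print(f"predicted dinner time: {int(minutes) // 60:02d}:{int(minutes) % 60:02d}")
```

A random shuffle (plain k-fold) would let the model validate on days that precede its training data, which is why a time-ordered splitter is used here.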

Ridge or Lasso regression is also a good way to try out your newly constructed features.

Finally, as was pointed out, LSTMs can capture time dependencies in the data. However, without any practical experience with them, I would think twice before using one.


Posted 2018-12-19T01:32:05.513

Reputation: 151


Yes, you can. Try an LSTM: you input a time sequence and get a new time sequence back, conditioned on the input data.

Andreas Look

Posted 2018-12-19T01:32:05.513

Reputation: 863