## How can we convert time series data to a supervised learning problem?


I am preparing data for a machine learning model. I want to treat time series data as an ordinary supervised learning prediction problem. Let's say I have car speed data for several car models, such as:

+-----+---------+-------------+
| day |  Model  |   Speed     |
+-----+---------+-------------+
|   1 | Bentley | 20.47 km/h  |
|   2 | Bentley | 32.22 km/h  |
|   3 | Bentley | 23.11 km/h  |
|   1 | BMW     | 37.60 km/h  |
|   2 | BMW     | 27.90 km/h  |
|   3 | BMW     | 40.47 km/h  |
+-----+---------+-------------+

I want to train on several car models together so that I can predict the speed for both Bentley and BMW.

I have converted the data for training like this:

+---------+------------+------------+-------------------+
|  Model  |   day_1    |     day_2  |    label == day_3 |
+---------+------------+------------+-------------------+
| Bentley | 20.47 km/h | 32.22 km/h | 23.11 km/h        |
| BMW     | 37.60 km/h | 27.90 km/h | 40.47 km/h        |
+---------+------------+------------+-------------------+
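A minimal sketch of this reshaping, assuming pandas is available and that the `km/h` unit has been stripped so the speeds are plain floats. The names `long_df`, `wide_df`, `X`, and `y` are illustrative only:

```python
import pandas as pd

# Long-format data from the table above (units dropped so speeds are numeric).
long_df = pd.DataFrame({
    "day":   [1, 2, 3, 1, 2, 3],
    "Model": ["Bentley"] * 3 + ["BMW"] * 3,
    "Speed": [20.47, 32.22, 23.11, 37.60, 27.90, 40.47],
})

# One row per car model, one column per day.
wide_df = long_df.pivot(index="Model", columns="day", values="Speed")
wide_df.columns = [f"day_{d}" for d in wide_df.columns]

# The last day is the label; the earlier days are the features.
X = wide_df[["day_1", "day_2"]]
y = wide_df["day_3"].rename("label")
```

Note that `pivot` requires each (Model, day) pair to appear only once; if the same model can appear several times, an extra identifier column would be needed in the index.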

Is it a correct approach?

Do you always have the same number of days, like 3 in your example? And I assume that your training set would have several instances with the same car model right? – Erwan – 2019-12-03T01:28:14.887

@Erwan Yes, I always have the same days for all cars, and yes, I have several other instances, like mode_year, model_type and so on. But I'm not sure whether my approach above is correct. – angela – 2019-12-03T06:26:58.253

Do you have any duplication, such as data for two different BMWs? Also, do you have access to other possible features, such as engine size, driver age, etc.? – Donald S – 2020-06-14T05:10:10.643

1

Since you always have a fixed number of days, I think your approach is good. In order to help the learning algorithm you might consider adding some statistics as features for every instance, for example:

• mean of the last N days
• difference dayN-day(N-1) (evolution)
• ...

Of course this can work only if there is actually a dependency between the features and the predicted speed.
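As a rough sketch of those two statistics, assuming pandas and the wide table from the question with `day_3` as the label. The column names `mean_last_2` and `diff_day_2` are made up for illustration:

```python
import pandas as pd

# Wide-format table from the question: day_1 and day_2 are features, day_3 is the label.
wide_df = pd.DataFrame(
    {"day_1": [20.47, 37.60], "day_2": [32.22, 27.90], "day_3": [23.11, 40.47]},
    index=["Bentley", "BMW"],
)

feature_days = ["day_1", "day_2"]  # never include the label day here
X = wide_df[feature_days].copy()

# Mean of the last N feature days (N = 2 here, i.e. all available days).
X["mean_last_2"] = wide_df[feature_days].mean(axis=1)

# Day-to-day change: day_k minus day_(k-1), one column per consecutive pair.
# The first column of diff() is all NaN (no previous day), so it is dropped.
diffs = wide_df[feature_days].diff(axis=1).iloc[:, 1:]
diffs.columns = [f"diff_{c}" for c in diffs.columns]
X = X.join(diffs)

y = wide_df["day_3"]
```

With T feature days this produces T-1 difference columns. The statistics are computed from the feature days only, so the label does not leak into the inputs.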

I have a question about the difference dayN-day(N-1). So basically you mean I have to subtract the day_1 speed from the day_2 speed and then add it as a feature? But what if I have 10 days? In that case, do I have to add 10 extra variables? – angela – 2019-12-04T14:22:06.403

@angela Yes, I would try it this way. It might or might not improve the performance; it's usually good to run a few experiments with different options for the features. – Erwan – 2019-12-04T14:24:28.067

0

0

Yes it's correct. You can do it in two ways:

1. Classical regression approach: you feed sequence [ A B C D ] to predict [ E ], or [ E F G ] in case of multistep prediction.

2. Seq2seq approach: you feed sequence [ A B C D ] to predict sequence [ B C D E ] - i.e. the same input sequence shifted forward by one step.

Both approaches can work. If you are working with short series (such as length 3), I'd suggest going with the first method.
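To make the two framings concrete, here is a small pure-Python sketch that builds the input/target pairs for each. The helper names `make_regression_pairs` and `make_shifted_pairs` are hypothetical, and the letters stand in for real values:

```python
def make_regression_pairs(series, n_in, n_out=1):
    """Approach 1: n_in past values -> the next n_out values."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        pairs.append((x, y))
    return pairs


def make_shifted_pairs(series, n_in):
    """Approach 2: n_in values -> the same window shifted one step forward."""
    pairs = []
    for start in range(len(series) - n_in):
        x = series[start:start + n_in]
        y = series[start + 1:start + n_in + 1]
        pairs.append((x, y))
    return pairs


series = list("ABCDEFG")
single_step = make_regression_pairs(series, n_in=4)          # [A B C D] -> [E], ...
multi_step = make_regression_pairs(series, n_in=4, n_out=3)  # [A B C D] -> [E F G]
shifted = make_shifted_pairs(series, n_in=4)                 # [A B C D] -> [B C D E], ...
```

With only three days per car, as in the question, the first framing with `n_in=2` yields exactly one pair per car, which matches the table the asker built.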