I keep reading articles about time series forecasting.
They all start from the same assumption: time series forecasting can't be treated as an ordinary regression/classification problem because the data is time dependent, meaning the target y at time t depends on the value of y at time t-1.
The motivation usually comes with example data showing some trend or seasonality. Other supporting points are:
- The data distribution (mean, variance) changes over time.
- Traditional train/test splits don't make sense: what is the point of forecasting January data with September data?
Fair enough. But consider this example. Let's say we have a simple dataset with a timestamp and a value column, and we are trying to predict the value at t+1.
| timestamp  | value |
|------------|-------|
| 01/01/2019 | 10    |
| 01/02/2019 | 12    |
| 31/12/2019 | ???   |
What we know is that there is no trend; instead, the series is strongly weekly-cyclic, which means the value at t will probably depend on the value at t-7 days. We also know that the values differ depending on whether the day falls on a weekday or on the weekend.
What prevents me from using some basic feature engineering and transforming the example data as follows?
| timestamp  | value_at_t_minus_7 | day_of_week | value |
|------------|--------------------|-------------|-------|
| 01/01/2019 | 11                 | 02          | 10    |
| 01/02/2019 | 12                 | 03          | 12    |
| 31/12/2019 | 10                 | 02          | ???   |
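For concreteness, here is a minimal sketch of that transformation in pandas. The series is synthetic (hypothetical values with a weekly pattern plus noise, not my real data), and it assumes one row per day with no gaps, so that shifting by 7 rows is the same as looking back 7 days. The column names mirror the table above.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with a weekly pattern plus noise
# (synthetic data, not the values shown in the tables).
rng = np.random.default_rng(0)
timestamps = pd.date_range("2019-01-01", "2019-12-30", freq="D")
weekly_pattern = np.array([10, 12, 11, 11, 12, 6, 5])  # Monday=0 ... Sunday=6
dow = timestamps.dayofweek.to_numpy()
values = weekly_pattern[dow] + rng.normal(0, 0.5, len(timestamps))

df = pd.DataFrame({"timestamp": timestamps, "value": values})

# Lag feature: the value observed 7 rows (= 7 days, given no gaps) earlier.
df["value_at_t_minus_7"] = df["value"].shift(7)

# Calendar feature. pandas counts Monday=0 ... Sunday=6, so the "02" in the
# table (Monday=01 convention) corresponds to 1 here.
df["day_of_week"] = df["timestamp"].dt.dayofweek

# The first 7 rows have no lagged value available, so drop them.
features = df.dropna(subset=["value_at_t_minus_7"]).reset_index(drop=True)
print(features.head())
```

If the real data had missing days, a plain row shift would silently misalign the lag, so a join on `timestamp - 7 days` would probably be the safer choice.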
The model is not time dependent in a formal sense, but the correlation with the lagged values, together with the day-of-week information, should get me where I want to go. I could then use classic, flexible methods such as Random Forest or XGBoost, and split the data into train and test sets (leaving a holdout set aside for validation, of course) to get a good sense of how my model is performing.
Could anyone offer their input, backed by some solid reasoning?
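To make the idea testable, here is a rough sketch of what I have in mind, using scikit-learn's `RandomForestRegressor` on the same kind of synthetic weekly data (hypothetical values, not my real series). One assumption that goes beyond what I wrote above: the split is chronological (first 80% of days for training, last 20% for testing) rather than random, so the model is never evaluated on days that precede its training data. The seasonal-naive baseline ("same value as 7 days ago") is only there for comparison.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic daily series with a weekly pattern plus noise (hypothetical data).
rng = np.random.default_rng(0)
timestamps = pd.date_range("2019-01-01", "2019-12-30", freq="D")
weekly_pattern = np.array([10, 12, 11, 11, 12, 6, 5])  # Monday=0 ... Sunday=6
dow = timestamps.dayofweek.to_numpy()
df = pd.DataFrame({
    "timestamp": timestamps,
    "value": weekly_pattern[dow] + rng.normal(0, 0.5, len(timestamps)),
    "day_of_week": dow,
})
df["value_at_t_minus_7"] = df["value"].shift(7)
df = df.dropna().reset_index(drop=True)

X = df[["value_at_t_minus_7", "day_of_week"]]
y = df["value"]

# Chronological split: earliest 80% of rows for training, latest 20% for testing.
split = int(len(df) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))

# Seasonal-naive baseline: predict the value observed 7 days earlier.
naive_mae = mean_absolute_error(y_test, X_test["value_at_t_minus_7"])

print(f"Random Forest MAE: {mae:.3f}")
print(f"Seasonal-naive MAE: {naive_mae:.3f}")
```

On data this clean the comparison says little about real series; it only shows that the tabular setup runs end to end.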