Time series forecasting using multiple time series as training data



I am trying to forecast the total attendance (i.e., the number of entrances, which is also the number of tickets bought) of a festival just two days after it has started. That is, knowing how many people went to the event during the first two days, how can I predict the total number of people who will have visited the festival by the end?

I know this seems difficult at first glance, and that in theory I could only make pretty bad forecasts, but here's the deal: I have organized more than thirty festivals in the past and have collected data on each of them. Specifically, for each of these festivals, I have a daily time series where I know:

  • the number of tickets bought daily
  • whether the day was a weekday or fell on a weekend
  • whether the day was a day of public and school holidays
  • the daily weather

I have observed that all of these time series share several patterns. For instance, attendance is always highest on Saturdays and lowest on Tuesdays. Likewise, more people always seem to come in the very last days of the event than at the beginning. These patterns are the same for almost all the festivals. When decomposing the time series, I observe similar trends and similar seasonal components.

Another thing, which I guess is not good news, is that the events had different durations: some lasted 4 days, others 5, 6, 7 or even 8 days. Some started on a Monday, others on a Saturday.

So my question is: how could I use these time series as training data to forecast total attendance at the event, knowing the attendance of the very first days? That is to say, which model could I use to predict the total attendance, given that I have all of this data? I was of course thinking of machine learning (or deep learning) since I have a lot of training data, but I'm unsure whether it can be easily implemented in R or Python...

To make the forecasts, I do of course know, for the ongoing festival, how long it will last, whether it will take place during public and school holidays, and whether it will span a weekend, and I have the weather forecast for each day.

The Half-Blood Prince

Posted 2018-07-09T15:07:29.503




First, cluster the events that are most similar to each other. Then use one or more comparable events to forecast the sales of the new event, for which you have no historical data. Use all the other information you have as regressors. The R code below (using the forecast package) produces the forecasts and lets you combine different forecasting models.
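The "comparable events" step can be sketched in a few lines. This is only a minimal illustration in Python (which the question mentions as an option), not the clustering method the answer has in mind: it ranks past festivals with a crude hand-made distance and scales the first days' attendance by the average ratio of total to early attendance among the nearest ones. The data, the `distance` weights and the function names are all hypothetical.

```python
from statistics import mean

# Hypothetical past festivals: duration in days, start weekday (0 = Monday),
# holiday flag, and daily ticket counts. All numbers are made up.
past_events = [
    {"days": 4, "start": 3, "holiday": 0, "tickets": [100, 150, 300, 250]},
    {"days": 5, "start": 2, "holiday": 1, "tickets": [120, 130, 200, 400, 350]},
    {"days": 7, "start": 0, "holiday": 0, "tickets": [80, 60, 90, 110, 200, 380, 300]},
    {"days": 5, "start": 2, "holiday": 0, "tickets": [90, 110, 160, 330, 310]},
]

def distance(a, b):
    """Crude dissimilarity between two events (an assumption, not a tuned metric)."""
    # Weekdays wrap around, so Sunday (6) and Monday (0) are one day apart.
    start_gap = abs(a["start"] - b["start"])
    start_gap = min(start_gap, 7 - start_gap)
    return abs(a["days"] - b["days"]) + start_gap + 2 * abs(a["holiday"] - b["holiday"])

def forecast_total(new_event, first_days, k=2):
    """Scale the observed early attendance by the average ratio
    (total attendance / early attendance) of the k most similar past events."""
    n = len(first_days)
    nearest = sorted(past_events, key=lambda e: distance(e, new_event))[:k]
    ratios = [sum(e["tickets"]) / sum(e["tickets"][:n]) for e in nearest]
    return sum(first_days) * mean(ratios)

# Ongoing festival: 5 days, starts on a Wednesday, no holidays,
# with 100 and 120 tickets sold on the first two days.
new_event = {"days": 5, "start": 2, "holiday": 0}
estimate = forecast_total(new_event, first_days=[100, 120])
print(round(estimate))
```

With thirty-odd festivals, a simple baseline like this is also a useful yardstick: a more elaborate model should at least beat it on held-out festivals.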


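  library(forecast)   # stlf(), auto.arima(), nnetar(), forecast()
  library(magrittr)   # %>% pipe

  # Assumed to exist already: x (ts of ticket sales), reg (regressors aligned
  # with x), new_reg (future regressors), end_train, start_test,
  # h1 (length of the test window) and h (final forecast horizon).

  # Train/test split of the series
  x_train <- window(x, end = end_train)
  x_test  <- window(x, start = start_test)

  # Train/test split of the regressors
  reg_train <- window(reg, end = end_train)
  reg_test  <- window(reg, start = start_test)

  # Fit four candidate models on the training window and forecast the test window.
  # length() is used instead of nrow(), which returns NULL for a univariate ts.
  fc_stlf_xreg <- stlf(x_train, method = "arima", s.window = length(x_train),
                       xreg = reg_train, newxreg = reg_test, h = h1)
  fc_arima_xreg <- auto.arima(x_train, stepwise = FALSE, approximation = FALSE,
                              xreg = reg_train) %>% forecast(h = h1, xreg = reg_test)
  set.seed(12345)  # nnetar uses random starting weights
  fc_nnetar_xreg <- nnetar(x_train, MaxNWts = length(x), xreg = reg_train) %>%
    forecast(h = h1, xreg = reg_test)
  fc_stlf_ets <- stlf(x_train, method = "ets", s.window = 12, h = h1)

  # Combination weights: regress the test actuals on the four forecasts
  # (no intercept), then rescale so that the weights sum to 1.
  mod1 <- lm(x_test ~ 0 + fc_stlf_xreg$mean + fc_arima_xreg$mean + fc_nnetar_xreg$mean + fc_stlf_ets$mean)
  mod2 <- lm(x_test/I(sum(coef(mod1))) ~ 0 + fc_stlf_xreg$mean + fc_arima_xreg$mean + fc_nnetar_xreg$mean + fc_stlf_ets$mean)

  # Refit the same four models on the full series and forecast h steps ahead
  fc_stlf <- stlf(x, method = "arima", s.window = 12, xreg = reg, newxreg = new_reg, h = h)
  fc_arima <- auto.arima(x, stepwise = FALSE, approximation = FALSE, xreg = reg) %>%
    forecast(h = h, xreg = new_reg)
  set.seed(12345)  # nnetar uses random starting weights
  fc_nnetar <- nnetar(x, MaxNWts = length(x), xreg = reg) %>% forecast(h = h, xreg = new_reg)
  fc_stlf_e <- stlf(x, method = "ets", s.window = 12, h = h)

  # Weighted combination of the four point forecasts
  w <- coef(mod2)
  Combi <- w[[1]] * fc_stlf$mean + w[[2]] * fc_arima$mean +
    w[[3]] * fc_nnetar$mean + w[[4]] * fc_stlf_e$mean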
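To make the weighting step in `mod1` and `mod2` concrete, here is a small standalone sketch in Python with made-up numbers and only two models. It assumes the same idea as the R code: regress the hold-out actuals on the forecasts without an intercept, then divide the coefficients by their sum so the weights add up to 1. Dividing the response by a constant divides every least-squares coefficient by that same constant, which is why `mod2` gives the rescaled weights. The helper `ols_two` and all series here are hypothetical.

```python
def ols_two(y, f1, f2):
    """No-intercept least squares of y on two forecast series (normal equations)."""
    a11 = sum(u * u for u in f1)
    a12 = sum(u * v for u, v in zip(f1, f2))
    a22 = sum(v * v for v in f2)
    b1 = sum(u * t for u, t in zip(f1, y))
    b2 = sum(v * t for v, t in zip(f2, y))
    det = a11 * a22 - a12 * a12
    return [(b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det]

# Made-up hold-out actuals and two made-up model forecasts for the same days.
actual = [100.0, 200.0, 300.0]
fc_a = [110.0, 190.0, 310.0]
fc_b = [90.0, 220.0, 280.0]

raw = ols_two(actual, fc_a, fc_b)        # analogue of coef(mod1)
weights = [w / sum(raw) for w in raw]    # analogue of coef(mod2): sums to 1

# Apply the weights to (made-up) forecasts from models refitted on all data.
future_a = [150.0, 250.0]
future_b = [140.0, 270.0]
combined = [weights[0] * a + weights[1] * b for a, b in zip(future_a, future_b)]
print(weights, combined)
```

Keep in mind that weights estimated on a short test window rest on very few observations, so it is worth comparing the result against a plain average of the forecasts.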



Posted 2018-07-09T15:07:29.503
