I have fairly complicated data on online movie sales. Each data entry has a key that is a combination of five fields (territory, day, etc.). For each key I have the sales over a period of time, plus other information such as the movie's box office and genre.

Each day, the data is loaded into the database with a delay of around ten hours. I am trying to fill that gap by extrapolating the data.

For each movie we sell, sales decay after the movie's release, i.e. each movie usually follows some sales decay pattern.

For a recent day, I pulled some data and found the following decay patterns:

decay curve 1

decay curve 2

decay curve 3

For that day, the sales for each key range from around $150000 down to $0. The picture is as follows:

one day sales curve

In the picture, the 15000 means there are around 15000 keys for each day.

Found this article.

I am trying to predict the sales amount for each key, i.e. for each combination of movie, territory, day, etc., how many dollars we get from selling that movie online in that territory on that day. I tried an ARIMA time series model, but I have some concerns about it. As the pictures show, there is both a seasonal effect and a decay trend for each movie, so the predicted sales cannot always be flat: there may be a bump after a decline, for example on a weekend. How can I capture these effects? Thank you for your reply!

I am not sure whether the article's approach can be applied here, or how to apply it.

Thanks a lot in advance.


I think you should clarify your question. What are you trying to predict exactly, what are your constraints, what have you tried, what are your concerns about the approach you cite? – Sean Owen – 2014-10-12T10:19:04.723

I edited the question and added some clarification. Thank you! – user3634601 – 2014-10-12T21:55:40.043



I am not sure I understood the problem. However, if you are trying to predict the sales amount, my guess is that a plain ARIMA model might not be the right choice, as it will not consider external variables. My suggestion is to gather related features, such as how other movies of the same genre did at that time of year in that region, the weather, star presence, and the presence of other popular matches or tournaments at that time. Let me know your thoughts.
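
To make the feature idea concrete, here is a minimal sketch of a regression on log sales that uses days since release (for the decay), a weekend flag (for the weekly bump) and one external feature (log box office). Everything in it is an assumption for illustration: the data is synthetic and noise-free, the coefficients are made up, and the names `features`, `fit_least_squares` and `predict_sales` are hypothetical. It only shows the shape of the approach, not a tuned model for your data.

```python
import math

def features(days_since_release, is_weekend, log_box_office):
    # Hypothetical feature vector: intercept, decay term, weekend flag, external variable.
    return [1.0, float(days_since_release), float(is_weekend), log_box_office]

def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry of this column to the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def predict_sales(beta, days_since_release, is_weekend, log_box_office):
    x = features(days_since_release, is_weekend, log_box_office)
    # The model is fitted on log sales, so exponentiate to get back to dollars.
    return math.exp(sum(b * v for b, v in zip(beta, x)))

# Synthetic data (assumption): three movies, 28 days each, generated from
# log(sales) = 2.0 - 0.05 * day + 0.4 * weekend + 0.5 * log(box_office).
rows, targets = [], []
for box_office in (1e6, 1e7, 5e7):
    for day in range(28):
        weekend = 1 if day % 7 in (5, 6) else 0
        x = features(day, weekend, math.log(box_office))
        rows.append(x)
        targets.append(2.0 - 0.05 * day + 0.4 * weekend + 0.5 * math.log(box_office))

beta = fit_least_squares(rows, targets)
print("fitted coefficients:", [round(b, 4) for b in beta])
print("weekday vs weekend, day 5:",
      round(predict_sales(beta, 5, 0, math.log(1e7)), 2),
      round(predict_sales(beta, 5, 1, math.log(1e7)), 2))
```

With real data you would add more columns (genre, territory, holidays, competing events) and probably move to a library model, but the structure stays the same: the decay and the weekend effect become explicit features rather than something a plain ARIMA model has to infer.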


