Deseasonalize traffic growth data over the already observed (past) range

5

2

In another question (Using TimeSeriesForecast for forecasting the traffic growth) I asked how to use TimeSeriesForecast on this data set. The answer provided a plot of the future values with and without seasonality.

Is it possible to get the same plot (without seasonality) for the data that are already available? At the moment I only have it for the forecasted values.

EDIT

The SARIMA model reproduces the seasonal peaks. Is it possible to evaluate the same model over the range of dates for which I provided the data?

If that is possible, maybe I could use the model's periodic peaks to modulate the real data and remove its seasonality.

What I would like to get is not a regression (which would be a flat line) but the deseasonalized data.


Revious

Posted 2015-03-26T11:33:48.137

Reputation: 376

What is stagionalization? – C. E. – 2015-03-26T12:05:57.623

@Pickett: sorry, my English is terrible. I meant seasonalization; I will correct it. Thanks for telling me. – Revious – 2015-03-26T12:28:00.907

1

I'm not sure what you mean by deseasonalized without making a regression. Do you mean filtering or smoothing? Or something like this, this or this, which are all different kinds of regressions?

– Karsten 7. – 2015-03-26T12:38:19.750

@Karsten7. I am not sure whether what I want can be achieved. I will edit the question. PS: how did you manage to get the 3 images? – Revious – 2015-03-26T13:44:55.683

The last one was created using something like what I show in my answer. The other two were produced by fitting a function that is not a straight line to the data (but this is still a linear regression). – Karsten 7. – 2015-03-26T15:31:25.137

Answers

3

Removing seasonality by using MeanFilter:

data = MapAt[DateString[{#, {"Day", "/", "Month", "/", "Year"}}] &,
 Import["D:\\Analytics www.superinformati.com Panoramica del pubblico 20141201-20150303 - Sheet 1.tsv"][[3 ;;, {1, 2}]]
 , {All, 1}]

DateListPlot[{data, 
 Transpose[{data[[All, 1]], MeanFilter[data[[All, 2]], 7]}]}]
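
A note on the window size, as a minimal sketch rather than part of the original answer: MeanFilter[list, r] averages over a range-r neighborhood, that is 2 r + 1 samples, so r = 7 gives a 15-day window, while r = 3 gives a window of exactly 7 days. The series below is synthetic and purely hypothetical (a linear trend plus a 7-day sine pattern); it is only meant to show how the two choices differ.

```mathematica
(* Hypothetical synthetic series: linear trend plus a pattern repeating every 7 days *)
n = 70;
trend = N@Range[n];
weekly = Table[10 Sin[2 Pi t/7], {t, n}];
series = trend + weekly;

(* Range 3 -> window of 2*3 + 1 = 7 samples, exactly one weekly cycle *)
smooth7 = MeanFilter[series, 3];

(* Range 7 -> window of 15 samples, two cycles plus one extra day *)
smooth15 = MeanFilter[series, 7];

(* Away from the ends, the 7-sample window averages the weekly term to zero,
   so this difference should contain only floating-point rounding *)
Max@Abs[(smooth7 - trend)[[4 ;; -4]]]

ListLinePlot[{series, smooth7, smooth15},
 PlotLegends -> {"raw", "r = 3", "r = 7"}]
```

With real traffic data the seasonal pattern is not a clean sine, so either window will only approximately remove it. The first and last few points are averaged over an incomplete window and should be read with care.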

MeanFiltered

Karsten 7.

Posted 2015-03-26T11:33:48.137

Reputation: 26 728

I used a range of 7 for the MeanFilter because the SARIMA model revealed a seasonal order of 7 and because assuming a weekly seasonality makes sense. – Karsten 7. – 2015-03-26T16:30:04.797

Thanks a lot, and really great job. If you like, there is also one last question: http://mathematica.stackexchange.com/questions/78311/how-to-investigate-the-seasonality-of-the-data

– Revious – 2015-03-26T17:19:24.087

How could you see that the seasonality is 7 days? – Revious – 2015-05-07T20:06:31.060

3

Your description of what you want to do is quite vague, and the "not a regression" part is somewhat contradictory. I'll therefore base my answer on this part of your question:

"Is it possible to calculate the same model also for the range of dates on which I provided the data?"

Importing your data saved in TSV format

data = MapAt[DateString[{#, {"Day", "/", "Month", "/", "Year"}}] &,
 Import["D:\\Analytics www.superinformati.com Panoramica del pubblico 20141201-20150303 - Sheet 1.tsv"][[3 ;;, {1, 2}]]
 , {All, 1}]

Finding the SARIMA process

tsm = TimeSeriesModelFit[data]

One can use RandomFunction to create multiple simulations of the fitted random process. The following code produces 5 simulations. I use Length@data - 30 because, in your data, the real trend appears to start only after about 30 days.

rf1 = RandomFunction[tsm["BestFit"], {Length@data - 30}, 5]

Creating a plot of these simulations and of their mean

randomP1 = 
 DateListPlot[Transpose[{data[[30 ;;, 1]], #}] & /@ rf1["States"], 
  PlotStyle -> Opacity[1/2], 
  PlotRange -> {{data[[1, 1]], data[[-1, 1]]}, Automatic}]

meanP1 = DateListPlot[
  Transpose[{data[[30 ;;, 1]], TimeSeriesThread[Mean, rf1]["PathStates"]}], 
  PlotStyle -> Red, PlotRange -> {{data[[1, 1]], data[[-1, 1]]}, Automatic}]

Putting everything into one plot

Show[{randomP1,
 DateListPlot[data, PlotStyle -> Directive[Black, Thick]],
 meanP1}]
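
The red mean curve above is averaged over only 5 simulated paths, so it is still noisy. The following sketch illustrates how the number of paths affects the averaged curve. It does not use the fitted model; ARProcess[{0.8}, 1] is a hypothetical stand-in process chosen only so that the example runs without the TSV file.

```mathematica
(* Hypothetical stand-in for the fitted model: a zero-mean AR(1) process *)
proc = ARProcess[{0.8}, 1];

SeedRandom[1]; (* fixed seed so that repeated runs give the same paths *)

(* Simulate the same time range with few and with many paths *)
few = RandomFunction[proc, {0, 100}, 5];
many = RandomFunction[proc, {0, 100}, 500];

(* Average across paths at every time step *)
meanFew = TimeSeriesThread[Mean, few];
meanMany = TimeSeriesThread[Mean, many];

ListLinePlot[{meanFew, meanMany},
 PlotLegends -> {"mean of 5 paths", "mean of 500 paths"}]
```

For the real data, the same effect would be obtained by raising the last argument of RandomFunction in rf1.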

SARIMA

Doing the same using an ARIMA model

tsm2 = TimeSeriesModelFit[data, "ARIMA"]    

rf2 = RandomFunction[tsm2["BestFit"], {Length@data - 30}, 5]

randomP2 = 
 DateListPlot[Transpose[{data[[30 ;;, 1]], #}] & /@ rf2["States"], 
  PlotStyle -> Opacity[1/2], 
  PlotRange -> {{data[[1, 1]], data[[-1, 1]]}, Automatic}]

meanP2 = DateListPlot[
  Transpose[{data[[30 ;;, 1]], TimeSeriesThread[Mean, rf2]["PathStates"]}], 
  PlotStyle -> Red, PlotRange -> {{data[[1, 1]], data[[-1, 1]]}, Automatic}]

Show[{randomP2,
 DateListPlot[data, PlotStyle -> Directive[Black, Thick]],
 meanP2}]

ARIMA

Karsten 7.

Posted 2015-03-26T11:33:48.137

Reputation: 26 728

1

Increasing the number of simulations from 5 to a much higher number will create a much smoother line for the mean.

– Karsten 7. – 2015-03-26T15:53:45.957