Time-series decomposition in Mathematica

26

17

I'm studying time-series in R with this book, and there is a nice command in R that creates decompositions. Inside Mathematica 9 the command can be executed as:

Needs["RLink`"]
InstallR[];
REvaluate["{
    url <- \"http://www.massey.ac.nz/~pscowper/ts/cbe.dat\"
    CBE <- read.table(url, header = T)
    Elec.ts <- ts(CBE[, 3], start = 1958, freq = 12)
    plot(decompose(Elec.ts, type = \"multi\"))
 }"]

and creates this plot:

enter image description here

The first part of my attempt to reproduce this in Mathematica is:

url = "http://www.massey.ac.nz/~pscowper/ts/cbe.dat";
CBE = Import[url];
elecTS = TemporalData[CBE[[2 ;; -1, 3]], {{1958}, Automatic, "Month"}];
DateListPlot[elecTS["Path"], Joined -> True, AspectRatio -> 0.2]

enter image description here

How do I continue this to perform the decomposition wholly within Mathematica?

Murta

Posted 2012-12-21T00:17:19.317

Reputation: 23 859

I would also really like to how to do this solely in Mathematica. My guess is that some of the Version 9 built in time-series functions will make it relatively simple. – Cameron Murray – 2012-12-21T03:38:21.170

1While I have no doubt that Mathematica is meta tool and all that, the fact remains that specialized software do the job better-like for decomposition and econometric analysis, nothing beats Eviews – rselva – 2012-12-21T08:31:06.250

2@rselva Asking in the most constructive spirit, could you provide some examples illustrating your statement ? – b.gates.you.know.what – 2012-12-21T10:54:25.747

@rselva Agree on Eviews (I wish I could afford a personal license). FWIW, as much as I love MMA, using it for econometrics is at best a challenge, and at worse a time killer. I wish MMA's design could be made more flexible for econometrics, but I understand why this would be a messy chore-and-a-half. I plan to learn some R myself (and am pleased they've built in this new capability). For those interested in a version of R that is automatically linked up to Excel, check out: http://rcom.univie.ac.at/

– telefunkenvf14 – 2012-12-21T20:57:35.353

1This question would be easier to answer if you could find a reference to the algorithm that R's time series decomposition is using. It might even be described in the documentation. Asking to reverse-engineer the functionality of a magical third-party black box is still a valid question, but a much harder one (as evidenced by your comment on b.gatessucks's answer). – None – 2012-12-23T01:30:13.780

Well, I went ahead and found the documentation: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/decompose.html. It seems pretty easy to implement: just a bunch of moving averages. The period of seasonality is assumed to be 1 unit (12 data points in this case), so no need for a heavy-duty NonlinearModelFit.

– None – 2012-12-23T01:36:21.590

@RahulNarain, very nice!.. I'll study it and see how to implement it in Mathematica. Much more simple then my fourier assumption. – Murta – 2012-12-23T01:58:50.037

@b.gatessucks Pl check out Economics top cited papers using Thompson Reuters database or even Google Scholar.Check out top Economics books- all of them have reference to Eviews- typically under 'Methods'.Search for Granger, casualty - all typical terms for econometrics to see EViews influence on Econometric analysis. Better yet try EViews if your university / Office has access to the s/w – rselva – 2012-12-23T03:27:02.863

Answers

23

Based on @b.gatessucks answer and on @RahulNarain comment tip, I created this functions for the multiplicative decompose case. I changed @b.gatessucks method for seasonality to keep it closer from R method, and used TemporalData to easily handle time interval.

decompose[data_,startDate_]:=Module[{dateRange,plot,plotOptions,observedData,observedPlot,trendData,trendPlot,dataDetrended,seasonalData,seasonalPlot,randomData,randomPlot},

    dateRange={{startDate,1},Automatic,"Month"};

    (*Setting Plot Options*)
    plotOptions={AspectRatio -> 0.2,ImageSize-> 600,ImagePadding->{{30,10},{10,5}},Joined->True};

    (*Observed data*)
    observedData=TemporalData[data,dateRange]["Path"];
    observedPlot=DateListPlot[observedData,Sequence@plotOptions];

    (*Extracting trend component*)
    trendData=Interpolation[MovingAverage[observedData,12],InterpolationOrder->1];
    trendPlot=DateListPlot[{#,trendData[#]}&/@observedData[[All,1]],Sequence@plotOptions]//Quiet;
    dataDetrended=N@{#[[1]],#[[2]]/trendData[#[[1]]]}&/@observedData//Quiet;

    (*Extracting seasonal component*)
    seasonalData=Mean/@Flatten[Partition[N@dataDetrended[[All,2]],12,12,1,{}],{2}];
    seasonalData=TemporalData[PadRight[seasonalData,Length[data],seasonalData],dateRange]["Path"];
    seasonalPlot=DateListPlot[seasonalData,Sequence@plotOptions];

    (*Extracting random component*)
    randomData=TemporalData[dataDetrended[[All,2]]/seasonalData[[All,2]],dateRange]["Path"];
    randomPlot=DateListPlot[randomData,Sequence@plotOptions];


    (*Plotting data*)
    plot=Labeled[
            Grid[Transpose[{Rotate[Style[#,15,Bold],90\[Degree]]&/@{"observed","trend","seasonal","random"}
                          ,{observedPlot,trendPlot,seasonalPlot,randomPlot}}
                          ]
                ]
            ,Style["Decomposition of multiplicative time series",Bold,17]
            ,Top
            ]

]

Using the functions like this:

rawData = Import["http://www.massey.ac.nz/~pscowper/ts/cbe.dat"][[2 ;;, 3]];
decompose[rawData, 1958]

We get:

enter image description here

Almost exactly as in R!

I say "almost" because R don't use interpolation, so the MovingAverage lost 12 point in R that we don't lose in this function due to interpolation method. I prefer to keep the ticks in each plot, I find it's better to read. It's a question of personal options.

Murta

Posted 2012-12-21T00:17:19.317

Reputation: 23 859

11

While you come back with a version 9 solution here is an old school approach :

The first entry is labels so I removed it :

rawData = Import["http://www.massey.ac.nz/~pscowper/ts/cbe.dat"][[2 ;;]];

Added the dates to the imported data : they are monthly dates starting {1958, 1, 1} :

data = Transpose[{NestList[DatePlus[#, {1, "Month"}] &, {1958, 1, 1}, Length[rawData] - 1], 
                  rawData[[All, 3]]}];

plotData = DateListPlot[data, Joined -> True, PlotLabel -> "Data"];

The trend is a 12 - point moving average, found by trial and error and common sense :

mA = 12;
trend = Interpolation[Transpose[{AbsoluteTime[#] & /@ data[[1 ;; -mA, 1]], 
                      MovingAverage[data[[All, 2]], mA]}], InterpolationOrder -> 0];

plotTrend =  DateListPlot[{#, trend[AbsoluteTime[#]]} & /@ data[[1 ;; -mA, 1]], PlotLabel -> "Trend"];

Since we're doing a multiplicative decomposition we divide the starting data by its trend :

dataDetrended = {#[[1]], #[[2]]/trend[AbsoluteTime[#[[1]]]]} & /@ data[[1 ;; -mA]];

Seasonality is just a sine with a 1-year period :

seasonality = NonlinearModelFit[{AbsoluteTime[#[[1]]], #[[2]]} & /@ dataDetrended, 
                 a + b Sin[2 \[Pi] t/(365.25 86400 ) + f], {a, b, f}, t];

plotSeasonality = DateListPlot[{#, seasonality[AbsoluteTime[#]]} & /@ data[[1 ;; -mA, 1]], Joined -> True, PlotLabel -> "Seasonality"];

The random part is what is left after diving the de-trended data by the seasonal factor.

random = {#[[1]], #[[2]]/seasonality[AbsoluteTime[#[[1]]]]} & /@ dataDetrended[[1 ;; -mA]];

plotRandom = DateListPlot[random, Joined -> True, PlotLabel -> "Random"];

Final result :

GraphicsColumn[{plotData, plotTrend, plotSeasonality, plotRandom}, ImageSize -> 350]

enter image description here

b.gates.you.know.what

Posted 2012-12-21T00:17:19.317

Reputation: 18 845

Very cool! My only question is about the seasonal part. I believe that the Sin assumption is too specific, but I don't know yet a more general way to do that, maybe with more terms, like a fourrier approach. – Murta – 2012-12-22T23:09:53.967

6

In my experience economists tend not to write time-series decomposition algorithms of the their own but rather use well established ones.

Among them two most well known are ARIMA-X13 and TRAMO-SEATS. Both are implemented by the US Census Bureau and executables are available here.

I've tried and implemented a simple package that calls the CB's files in the background and returns seasonally adjusted series. It very much follows the strategy that I learned by reading this answer on exporting multipage pdfs in mma on SE.

To use ARIMA-X13 in mma you should:

  1. Download Economica package here and put it into your $UserBaseDirectory.
  2. Download X13AS.EXE file from the CB's site and put it into "C:\Windows\System32"

Then you might extract the trend component of your series:

unemp = Rest@
 QuandlData[
  "ILOSTAT/UNE_TUNE_NB_SEX_T_AGE_AGGREGATE_TOTAL_Q_RUS", {{2001, 1, 
 1}, {2014, 1, 1}}]

unempSA = SeasonalAdjustmentX13[Sort@unemp]

DateListPlot[{unemp, unempSA}]

Trend and original series

ARIMA-X13 allows also to extract cyclical component and noise, however I've yet to add them to that package.

iav

Posted 2012-12-21T00:17:19.317

Reputation: 1 875

5

This looks like very useful function for e.g. economic time series. If this can be done easily in R does it need to be done in Mathematica? With the qualifier that this is my first day testing 9 he is my attempt at the alternative:

Needs["RLink`"]
InstallR[];
data = REvaluate["{
      url <- \"http://www.massey.ac.nz/~pscowper/ts/cbe.dat\"
      CBE <- read.table(url, header = T)
      Elec.ts <- ts(CBE[, 3], start = 1958, freq = 12)
      decompose(Elec.ts, type = \"multi\")
   }"]

From here you can extract the decomposed data:

Normal@TemporalData[data[[1, x, 1]], {{1958}, Automatic, "Month"}]

where x=1, 2, 3, 4 give the observed, seasonal, trend, and random data. A solely Mathematica based approach would presumably require EstimatedProcess and SARIMAProcess as per some of the docs examples but I haven't got that far in learning 9.

Mike Honeychurch

Posted 2012-12-21T00:17:19.317

Reputation: 36 211