## Function to calculate and plot sample variogram

7

I'm trying to create a function that calculates and plots the sample variogram for a time series. It should take two arguments: data, a time series, essentially a list of reals; k, an integer greater than 0. It should produce a ListLinePlot with $k$ on the horizontal axis and $\hat{G_k}$ on the vertical axis.

The plotting and such isn't really the problem. The problem is getting the mathematical formulas, which are below, into Mathematica formulas.

$\hat{G_k}$ is essentially a ratio of variance at lag $k$ over the variance at lag $1$:

$\hat{G_k}=\frac{s^2_k}{s^2_1},k=1,2,...$

$s^2_k=\frac{\sum_{t=1}^{n-k}(d^k_t-\bar{d}^k)^2}{n-k-1}$

$\bar{d}^k=(n-k)^{-1}\sum{}d^k_t$

$d^k_t=z_{t+k}-z_t$

$z_t$ is the time series of the length $n$.

I've been beating my head against this for a while now. Any help would be appreciated.

Edit: A variogram is statistical tool to assess whether a time series is stationary or not. For a stationary it starts out at $1$ for $k=1$ and then increases until it starts to hover around a certain value. The theory and formulas I have here is from Time Series Analysis and Forecasting by Example. You might be able to use the Search Inside at Amazon to have a look at it.

Edit2: Here is a demoplot based on dummydata as to how the end result should look for a stationary time series. 1What's a variogram? Can you provide a link? – rcollyer – 2012-02-28T19:02:52.733

rcollyer: It's a version of a correlogram and can be directly computed with ListConvolve, ListCorrelate, et al. – whuber – 2012-02-28T19:39:55.843

As ListPlot can do the same thing as ListLinePlot, I removed the version-8 tag. – rcollyer – 2012-02-28T19:42:45.603

@rcollyer ListLinePlot was "New in 6" -- not a problem even for me ;-) – Mr.Wizard – 2012-02-28T20:38:11.433

@Mr.Wizard I swore it was new in 8. – rcollyer – 2012-02-28T22:33:38.060

@rcollyer That's OK, even I have trouble keeping versions straight sometimes... – Brett Champion – 2012-02-29T04:29:39.957

@BrettChampion that's actually comforting to know. Either way, the v.8 tag did not belong. – rcollyer – 2012-02-29T04:40:26.900

This question is practically a duplicate of http://mathematica.stackexchange.com/questions/2197/function-for-autocorrelation, which has much simpler answers.

– whuber – 2012-03-06T00:08:10.900

You have some programs to calculate variogram of a time eerie or an image, in Fractal Geography by A. Dauphiné, 2012, Wiley.

– Dauphine – 2012-02-29T07:04:55.020

4

Lag differences are not the same as second-differencing etc, so Differences is not the right approach.

test2 = Array[f, 10]

In:= Differences[test2, 3]


Out= {-f + 3 f - 3 f + f, -f + 3 f - 3 f + f, -f + 3 f - 3 f + f, -f + 3 f - 3 f + f, -f + 3 f - 3 f + f, -f + 3 f - 3 f + f, -f + 3 f - 3 f + f}

Building on kguler's answer, here is an alternative way of getting lag-differences. I have written it up as a separate function because it has more general utility than this specific application.

lagdif[l_List, k_Integer?Positive] /; k < Length[l] :=
Drop[l, k] - Drop[l, -k]


We can then use a similar approach:

dif[list_] := Table[lagdif[list, k], {k, Length[list] - 1}];
dbar[list_] := Mean /@ dif[list];
var[list_] := Variance /@ (Most@(dif[list]));
variogram[list_] := ((Rest@#)/First@#) &@var[list];


Testing with I(1) data (i.e. non-stationary with a single unit root):

$y_t = y_{t-1} + \epsilon_t$

data = Accumulate[RandomReal[{-1, 1}, 20]];

GraphicsRow[{ListLinePlot[data], ListLinePlot[variogram[data]]}] This makes me wonder how powerful the variogram is for data with shocks that have bounded support.

4

Does the following direct implementation of your expressions work for you?

  ClearAll[data, dif, dbar, var, variogram];
dif[list_] :=  Table[(list[[k + 1 ;;]] - list[[;; -(k+1)]]),
{k, Length[list] - 1}]
var[list_] := Variance /@ (Most@(dif[list]));
variogram[list_] := (#/First@#) &@var[list];


usage

 data = Accumulate[RandomReal[{-1, 1}, 100]];
GraphicsRow[{ListLinePlot[data], ListLinePlot[variogram[data]]}] Stationary data:

 data2 = RandomReal[{-1, 1}, 100];
GraphicsRow[{ListLinePlot[data2],
ListLinePlot[Reverse@variogram[data2]]}] This is not quite right. I don't think Differences works correctly for this application for differences bigger than 1. – Mr Alpha – 2012-02-28T21:18:23.160

@MrAlpha, you are right. I changed the function to calculate the differences. Hope this one works. – kglr – 2012-02-28T23:00:58.120

2

My literal implementation taken directly from the equations is below:

ClearAll[gk];
gk[z_,k_]:=Module[{n,s2k,dk,s21,d1},
n = Length@z;
dk[t_] := z[[t+k]]-z[[t]];
s2k = Sum[(dk[t]-Sum[dk[t],{t,n-k}]/(n-k))^2,{t,n-k}]/(n-k-1);
d1[t_] := z[[t+1]]-z[[t]];
s21 = Sum[(d1[t]-Sum[d1[t],{t,n-1}]/(n-1))^2,{t,n-1}]/(n-2);
s2k/s21
]


and this will generate an example plot

With[{z = RandomReal[{0, 1}, 100]},
ListLinePlot[{gk[z, #] & /@ Range, z}]
] Note that in the third equation, I am assuming the sum is from 1 to n - k.

1

I don't know if it is OK to answer your own question, but I thought I would give an answer as to what I ultimately ended up with. This is essentially based on Verbeia's answer.

I've added a option for how many lags to calculate since calculating the variance for lags up to $n-1$ doesn't make much sense, since the number of data points you have decrease with the size of the lag. This leads to the estimate of the variance being uncertain.

lagdif[l_List, k_Integer?Positive] /; k < Length[l] :=
Drop[l, k] - Drop[l, -k]
variogram[list_,
k_] := (#/First@#) &@(Variance /@ Table[lagdif[list, i], {i, k}]);
plotvariogram[list_, k_] :=
GraphicsRow[{ListPlot[list, Joined -> True,
AxesLabel -> {"t", "\!$$\*SubscriptBox[\(X$$, $$t$$]\)"}],
ListPlot[variogram[list, k], AxesOrigin -> {0, 0},
AxesLabel -> {"k",
"\!$$\*SubscriptBox[OverscriptBox[\(G$$, $$^$$], $$k$$]\)"},
Joined -> True]}]


Here are some example outputs:

With NIID data With AR(2) data With a random walk data 