Test if data is regularly sampled


I want to determine whether I have regularly sampled data or not, à la RegularlySampledQ. The data itself might be huge. I tried a bunch of different approaches and the best I could come up with was this:

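(* check the data in blocks of window + 1 samples; adjacent blocks share one point *)
regSampQ[d_, window_: 100] :=
  If[Length@d > window,
   Module[{eq = True},
    Do[
     (* stop at the first block whose steps are not all equal *)
     If[! Equal @@ Differences[d[[n - window ;; n]]], 
       eq = False; Break[]
      ], {n, window + 1, Length@d, window}];
    eq
    ],
   (* short input: compare all steps at once *)
   Equal @@ Differences[d]
   ];
(* hold the argument so the data is passed by name *)
regSampQ~SetAttributes~HoldFirst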

It turns out this is a somewhat efficient way to do this, beating out my other best attempt, which was just

regSampNaive = Function[Null, Equal @@ Differences[#], HoldFirst];
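A variant of the same idea, sketched here as an assumption rather than something benchmarked in this post: all steps are equal exactly when the smallest and the largest difference coincide, so Equal only has to compare two numbers. The name regSampMinMaxQ is made up for illustration.

```mathematica
(* Hypothetical helper, not part of the timings in this post. *)
(* Inputs with fewer than three samples are treated as regular. *)
regSampMinMaxQ = Function[Null,
   Length[#] < 3 || Equal @@ MinMax[Differences[#]],
   HoldFirst];

(* usage: pass a symbol holding the data, as with regSampNaive *)
data = Range[0, 10, 2];
regSampMinMaxQ[data]
```

Whether this is actually faster than regSampNaive would have to be measured on the data in question.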

The problem is that the data can be incredibly large. Here are some of my test cases:

test1 = RandomReal[1, 50000];

regSampNaive@test1 // RepeatedTiming

{0.0037, False}

regSampQ@test1 // RepeatedTiming

{0.000097, False}

test2 = ConstantArray[0, 50000];

regSampNaive@test2 // RepeatedTiming

{0.0025, True}

regSampQ@test2 // RepeatedTiming

{0.0026, True}

test3 = Join[ConstantArray[0, 25000], ConstantArray[1, 25000]];

regSampNaive@test3 // RepeatedTiming

{0.0023, False}

regSampQ@test3 // RepeatedTiming

{0.0014, False}
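One caveat about the windowed check, offered as a hedged note: each block is only tested for internal consistency, so a step that changes exactly at a block boundary, or samples past the last full block, would go unnoticed. The sketch below compares every block against the first step and clips the final block to the end of the data. The name regSampAllQ and its details are assumptions for illustration, not the code that was timed above.

```mathematica
(* Hypothetical variant, not the function timed in this post. *)
ClearAll[regSampAllQ];
SetAttributes[regSampAllQ, HoldFirst];  (* pass a symbol holding the data *)
regSampAllQ[d_, window_Integer: 100] :=
  Module[{len = Length[d], step, ok = True},
   If[len > 2,
    step = d[[2]] - d[[1]];
    Do[
     (* every block must reproduce the first step; the last block is clipped *)
     If[! TrueQ[
        Equal @@ Prepend[Differences[d[[n ;; Min[n + window, len]]]], step]],
      ok = False; Break[]],
     {n, 1, len - 1, window}]];
   ok];

(* the step changes from 1 to 2 exactly at sample 101 *)
data = Join[Range[101], 101 + 2 Range[100]];
regSampAllQ[data]
```

The extra Prepend per block costs something, so the timings of this version will not match the numbers quoted above.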

But I feel like there has to be a really clean, fast way to do this... am I wrong?

b3m2a1

Posted 2018-10-18T07:10:45.647

Reputation: 42 610

On my machine (OS X V11.3) Length@DeleteDuplicates@Differences[d[[n - window ;; n]]] != 1 cuts the timings in half for True – Mike Honeychurch – 2018-10-18T08:10:46.370
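Spelled out, that suggestion might look like the sketch below. It keeps the loop from the question and swaps only the per-block test; the name regSampDDQ is hypothetical. Note that DeleteDuplicates compares with SameQ, which is stricter than Equal for approximate numbers, so the two versions can disagree on floating-point data with rounding noise.

```mathematica
(* Hypothetical name; same loop as regSampQ in the question. *)
ClearAll[regSampDDQ];
regSampDDQ[d_, window_: 100] :=
  If[Length@d > window,
   Module[{eq = True},
    Do[
     (* a block is regular when its differences collapse to one value *)
     If[Length@DeleteDuplicates@Differences[d[[n - window ;; n]]] != 1,
      eq = False; Break[]],
     {n, window + 1, Length@d, window}];
    eq],
   (* short input: at most one distinct difference *)
   Length@DeleteDuplicates@Differences[d] <= 1];
SetAttributes[regSampDDQ, HoldFirst];

data = ConstantArray[0, 50000];
regSampDDQ[data]
```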

@MikeHoneychurch oh wow you're right. That's...embarrassing for Equal. But a very nice optimization. It cuts all my timings by a small factor at least. Actually, thinking about it the overhead might be in Apply instead. It might just be a data-copying issue. – b3m2a1 – 2018-10-18T08:14:18.823

See if your method is faster if you widen the window to e.g. 300. I don't recall exactly whether Apply gets autocompiled but if it does it would be for 200-250+ list length. ps. DeleteDuplicates was a first "easy" thing that came to mind but there are probably other ways if you got creative – Mike Honeychurch – 2018-10-18T23:13:43.317

@MikeHoneychurch compilation didn’t actually help when I used it procedurally. The data transfer step is the killer. – b3m2a1 – 2018-10-18T23:35:35.070

I'm talking about autocompilation. https://mathematica.stackexchange.com/questions/311/how-do-you-determine-the-optimal-autocompilation-length-on-your-system – Mike Honeychurch – 2018-10-18T23:47:52.087

@MikeHoneychurch I know, but using explicit compilation of the whole process or at least important parts should outperform or perform just as well as the autocompilation. Compiling my code in any way shape or form actually just makes it run slower. – b3m2a1 – 2018-10-18T23:50:33.203

ok. 👍 – Mike Honeychurch – 2018-10-18T23:51:11.957

No answers