## Elementwise, compilable min function

10

3

I am trying to implement efficiently a transfer-matrix like algorithm. On each iteration, I have two vectors $x=\{x_1,\dots,x_n\}$, $y=\{y_1,\dots,y_n\}$ with real numbers and I need to compute the vector $\{\min(x_1,y_1),\dots,\min(x_n,y_n)\}$. I tried four approaches for computing it:

1. Uncompiled MapThread[Min,{listX,listY}] call
2. Compiled MapThread[Min,{listX,listY}] call
3. Uncompiled RandomPrivateMapThreadMin[{listX,ListY}] call
4. Compiled RandomPrivateMapThreadMin[{listX,ListY}] call

(Code see below). The resulting timings were: 4.5s (for 1), 3.5s (for 2), 1.5s (for 3) and 4 reverted to uncompiled evaluation, giving 6.3s.

So my questions are:

1. Is the uncompiled RandomPrivateMapThreadMin[{listX, ListY}] call the fastest way to evaluate the element-wise minimum of two lists, or does anybody have a better idea?
2. Why does the example using RandomPrivateMapThreadMin[{listX, ListY}] fail to compile?

My code examples are:

it1[wd_, len_] :=
Module[{pot1, fval},
pot1 = RandomVariate[NormalDistribution[], {len, wd}];
fval = ConstantArray[0., wd];
Do[fval = MapThread[Min, {RotateLeft[fval], fval}] + pot1[[k]];, {k, 1, len}];
Return[fval]];

it2 := Compile[{{wd, _Integer}, {len, _Integer}},
Module[{pot1, fval},
pot1 = RandomVariate[NormalDistribution[], {len, wd}];
fval = ConstantArray[0., wd];
Do[fval = MapThread[Min, {RotateLeft[fval], fval}] + pot1[[k]];, {k, 1, len}];
Return[fval]]];

it3[wd_, len_] :=
Module[{pot1, fval},
pot1 = RandomVariate[NormalDistribution[], {len, wd}];
fval = ConstantArray[0., wd];
Do[fval = RandomPrivateMapThreadMin[ {RotateLeft[fval], fval}] + pot1[[k]];, {k, 1, len}];
Return[fval]];

it4 := Compile[{{wd, _Integer}, {len, _Integer}},
Module[{pot1, fval},
pot1 = RandomVariate[NormalDistribution[], {len, wd}];
fval = ConstantArray[0., wd];
Do[fval = RandomPrivateMapThreadMin[ {RotateLeft[fval], fval}] + pot1[[k]];, {k, 1, len}];
Return[fval]]];


And to obtain the timing values, I used

Table[it1[20, 10] // First, {10000}]; // AbsoluteTiming
Table[it2[20, 10] // First, {10000}]; // AbsoluteTiming
Table[it3[20, 10] // First, {10000}]; // AbsoluteTiming
Table[it4[20, 10] // First, {10000}]; // AbsoluteTiming


"Why does the example using RandomPrivateMapThreadMin[{listX, ListY}] fail to compile?" - it's not in the list here.

– J. M.'s ennui – 2012-09-28T13:14:35.760

2For your consideration: minNew = Compile[{{x, _Real}, {y, _Real}}, Min[x, y], RuntimeAttributes -> {Listable}]. Now, try giving minNew[] two lists as arguments... – J. M.'s ennui – 2012-09-28T13:22:47.857

Thanks for the suggestion! I tried your minNew function, it works faster than the pure MapThread call but about 20% slower than RandomPrivateMapThreadMin, unfortunately... – Alex D. – 2012-09-28T13:57:47.953

6

I don't seem to have RandomPrivateMapThreadMin in version 7.

For Integer data you may wish to try:

a = RandomInteger[1*^6 {-1, 1}, 5*^6];
b = RandomInteger[1*^6 {-1, 1}, 5*^6];

Timing[r2 = a # + b (1 - #) &@UnitStep[b - a];]
r1 === r2


{2.028, Null}

{0.171, Null}

True

For machine-size Real data this compiles nicely:

cf = Compile[{{a, _Real, 1}, {b, _Real, 1}}, a # + b (1 - #) & @ UnitStep[b - a]];

a = N@a;
b = N@b;

Timing[r2 = cf[a, b];]
r1 === r2


{2.137, Null}

{0.109, Null}

True

The UnitStep approach is quite fast! But it only works for 2 lists, can it be extended to any number of lists? – xzczd – 2013-10-22T12:49:26.210

1@xzczd You could Fold the operation quite quickly. If your have many lists (more lists than the length of each list, for example) then you would do better to Transpose, e.g. Min /@ Transpose[lists]. – Mr.Wizard – 2013-10-22T14:57:14.720

@brama RuntimeAttributes was not available in version 7 which I used when I wrote this answer. From the test I just ran cf appears to still be a bit faster, but yours is nice clean code. You need to use AbsoluteTiming to get a proper timing because of the parallelism. – Mr.Wizard – 2014-08-13T00:54:40.270

+1, I was going to suggest something similar. – Leonid Shifrin – 2012-09-28T13:22:05.237

@Leonid good, that means this might actually work. :^) – Mr.Wizard – 2012-09-28T13:30:24.393

1

Indeed, this works perfectly, thanks very much! I wrote a compiled version of your suggestion and it runs about 20% faster than my fastest version above (option 3, the RandomPrivateMapThreadMin code. I learned of the existence of this function in Mathematica 8 from link). Thanks again!

– Alex D. – 2012-09-28T13:56:09.063

@Alex thanks for the Accept, but I encourage all users to wait 24 hours before Accepting an answer. Who knows what methods may be posted if people are not discouraged from reading your question! – Mr.Wizard – 2012-09-28T14:15:31.390

@AlexD. Keep in mind that your timings may get dominated by other code you have in your tests, since it may well be that other operations such as random number generation can take more time than the computation of the element-wize Min. In that sense, the tests of Mr.Wizard are cleaner. – Leonid Shifrin – 2012-09-28T14:41:24.917

@Mr.Wizard: Thanks for the comment, you're right - I'll leave it open longer next time! – Alex D. – 2012-09-28T20:07:37.263