Bad performance of LengthWhile?

The performance of LengthWhile was improved in v11.1; the hand-written lengthwhile below is no longer faster.


A friend of mine showed me this example. It is a test comparing LengthWhile to a self-made lengthwhile written in a direct and conventional way:

lengthwhile[x_, t_] := Module[{i = 0, l = Length@x}, While[i < l && t@x[[i + 1]], i++]; i]

lst = RandomInteger[{-2, 2}, {10^4, 10}];
rst1 = LengthWhile[#, # >= 0 &] & /@ lst; // AbsoluteTiming
rst2 = lengthwhile[#, # >= 0 &] & /@ lst; // AbsoluteTiming
rst1 == rst2
(* {3.941000, Null} *)
(* {0.474000, Null} *)
(* True *)

LengthWhile is much slower than the reinvented wheel! Why? Is this simply poor performance on the part of LengthWhile, or am I not using LengthWhile properly?

xzczd

Posted 2014-10-07T13:22:42.190

Reputation: 44 878

Answers

14

There are several reasons. First, the built-in function has some minor overhead for checking its arguments and calling the appropriate internal function, depending on whether the first argument is a list, a sparse array or an association.

Secondly, with a packed array, LengthWhile uses compilation in an attempt to increase performance. There is some overhead in evaluating Compile, which is especially noticeable for your example with many small lists. (Note that if you do lst2 = Developer`FromPackedArray[lst] the built-in LengthWhile is faster than it is on the packed list.)
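The packed versus unpacked comparison mentioned in the note can be tried with a sketch like the following. It reuses lst from the question and lst2 from the note; rst1u is a name introduced here only for illustration. Timings depend on the machine and version (per the question's edit note, v11.1 and later behave differently), so no expected output is shown.

```mathematica
(* Sketch: time LengthWhile on packed rows vs. the same rows unpacked. *)
lst = RandomInteger[{-2, 2}, {10^4, 10}];   (* packed, as in the question *)
lst2 = Developer`FromPackedArray[lst];      (* same data, unpacked *)

(* Check the packing status of both arrays *)
Developer`PackedArrayQ /@ {lst, lst2}

rst1 = LengthWhile[#, # >= 0 &] & /@ lst; // AbsoluteTiming
rst1u = LengthWhile[#, # >= 0 &] & /@ lst2; // AbsoluteTiming

(* Both runs should agree on the results *)
rst1 == rst1u
```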

Finally, there appears to be a bug in the implementation of the compilation, such that the compiled function calls back to the main evaluator for the predicate function. You can see this by capturing the CompiledFunction from a Trace and examining it with CompilePrint:

Needs["CompiledFunctionTools`"];

CompilePrint @@ Cases[Trace[LengthWhile[lst[[1]], # >= 0 &]], _CompiledFunction, -1, 1]
blah...
7 B2 = MainEvaluate[ Hold[Statistics`TakeWhileDump`predfun$42706][I5]]
blah...

The internal function calling Compile is Statistics`TakeWhileDump`findLastPosition. It appears that the predicate function is not being inlined as we would desire (despite "InlineExternalDefinitions" being used). I'm not sure what the rules are about inlining external definitions, so I'm not sure if this is due to a change in Compile or bad code in Statistics`TakeWhileDump`findLastPosition.
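One way to probe the inlining question outside of the internal code is a small experiment along these lines. It is only a sketch: pred, predfun, cfPattern and cfPure are names defined here for illustration and are not the definitions inside Statistics`TakeWhileDump`findLastPosition. It compiles the same loop once around a pattern-based definition and once around a symbol holding a pure function, then checks whether the printed bytecode contains a MainEvaluate call.

```mathematica
Needs["CompiledFunctionTools`"];

pred = # >= 0 &;               (* pure function stored in a symbol *)
predfun[arg_] := pred[arg];    (* pattern-based wrapper around it *)

(* Loop around the pattern-based definition; the third argument
   tells Compile that predfun[...] yields a Boolean. *)
cfPattern = Compile[{{x, _Integer, 1}},
   Module[{i = 0, l = Length@x},
    While[i < l && predfun[x[[i + 1]]], i++]; i],
   {{predfun[_], True | False}},
   CompilationOptions -> {"InlineExternalDefinitions" -> True}];

(* Same loop, calling the pure function directly *)
cfPure = Compile[{{x, _Integer, 1}},
   Module[{i = 0, l = Length@x},
    While[i < l && pred[x[[i + 1]]], i++]; i],
   CompilationOptions -> {"InlineExternalDefinitions" -> True}];

(* False for an entry means its bytecode calls back to the main evaluator *)
StringFreeQ[CompilePrint[#], "MainEvaluate"] & /@ {cfPattern, cfPure}

(* Both versions should still return the same count *)
cfPattern[{1, 2, -1, 3}] == cfPure[{1, 2, -1, 3}]
```

If the two entries of the StringFreeQ result differ, that would suggest the kind of definition (pattern-based versus pure function) decides whether the predicate is inlined.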

Simon Woods

Posted 2014-10-07T13:22:42.190

Reputation: 81 905

There is a line predfun[arg_] := pred[arg]; in findLastPosition, and Compile is then called with predfun. This causes uncompiled evaluation (why?). If I change ...predfun[Compile`GetElement[... to ...pred[Compile`GetElement[... it works as desired. – ybeltukov – 2014-10-07T16:30:23.017

@ybeltukov I just dug out the definition of Statistics`TakeWhileDump`findLastPosition with ?? and modified all the predfun parts; the AbsoluteTiming changed from 3.7 s to 2.7 s on my computer. – xzczd – 2014-10-08T03:59:25.733

@xzczd Which test did you try? Your test has a big overhead due to the compilation. – ybeltukov – 2014-10-08T08:48:00.153

@ybeltukov I tested the code in my question. After deleting the definition of predfun and replacing every predfun with pred, I got a speedup of about 1 second. – xzczd – 2014-10-08T09:42:35.333

@xzczd You will obtain a bigger speedup with my test. When you apply /@ to a big set of short lists, you compile over and over again. – ybeltukov – 2014-10-08T10:48:22.853

@ybeltukov BTW, it is indeed strange that predfun is defined inside findLastPosition. Its only effect is a side effect: function definitions based on pattern matching can't be inlined. (There seems to be no specific post on this issue; this is a related one. Also note the comments below it.)

– xzczd – 2014-10-08T11:28:24.737

Response from Wolfram company: ……Thank you for your message and the link of the post. I have filed a report on this performance issue of LengthWhile and thank you for bringing it to our attention.…… – xzczd – 2014-10-10T06:39:28.610

14

Your test is quite synthetic: it only ever examines the first few elements of each list. If you have a longer run of non-negative elements, the built-in LengthWhile is faster:

lst = RandomInteger[{-1, 30000}, 100000];
rst1 = LengthWhile[lst, # >= 0 &]; // AbsoluteTiming
rst2 = lengthwhile[lst, # >= 0 &]; // AbsoluteTiming
rst1 == rst2
(* {0.096340, Null} *)
(* {0.166603, Null} *)
(* True *)

Update:

Amazingly, the compiled version is considerably faster than LengthWhile.

cLengthWhile = Compile[{{x, _Integer, 1}, {thr, _Integer}}, 
   Module[{i = 0, l = Length@x}, 
    While[i < l && (x[[i + 1]] >= thr), i++]; i], 
   CompilationTarget -> "C", RuntimeAttributes -> {Listable}, 
   RuntimeOptions -> "Speed"];

rst3 = cLengthWhile[lst, 0]; // AbsoluteTiming
rst1 == rst3
(* {0.000138, Null} *)
(* True *)

Update 2:

For your set of short lists, there is a quite fast uncompiled function:

lengthwhile[x_, t_] := 
 Module[{i = 0, l = Length@x}, While[i < l && t@x[[i + 1]], i++]; i]
lengthWhile2[x_, thr_] := 
 Dimensions[x][[2]] - Total@Unitize@Accumulate[Transpose@UnitStep[x - thr] - 1]

lst = RandomInteger[{-2, 2}, {10^4, 10}];
rst1 = LengthWhile[#, # >= 0 &] & /@ lst; // AbsoluteTiming
rst2 = lengthwhile[#, # >= 0 &] & /@ lst; // AbsoluteTiming
rst3 = lengthWhile2[lst, 0]; // AbsoluteTiming
rst4 = cLengthWhile[lst, 0]; // AbsoluteTiming
rst1 == rst2 == rst3 == rst4
(* {3.990231, Null} *)
(* {0.307152, Null} *)
(* {0.004986, Null} *)
(* {0.001347, Null} *)
(* True *)
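To see why lengthWhile2 works, its steps can be traced on a small input. This is only an illustrative sketch: toy, ok, fail and seen are names introduced here, and the threshold is fixed at 0.

```mathematica
(* Hypothetical toy input: three lists of equal length *)
toy = {{1, 2, -1, 3}, {-1, 0, 0, 0}, {0, 1, 2, 3}};

ok = Transpose@UnitStep[toy - 0];   (* 1 where element >= 0; one column per list *)
fail = ok - 1;                      (* 0 where the test passes, -1 where it fails *)
seen = Unitize@Accumulate[fail];    (* 1 from the first failure onward, 0 before it *)

(* Row length minus the number of positions at or after the first failure *)
Dimensions[toy][[2]] - Total[seen]
(* {2, 0, 4} *)
```

Note that this approach assumes a rectangular array and a predicate of the form # >= thr; it is not a drop-in replacement for an arbitrary test function.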

ybeltukov

Posted 2014-10-07T13:22:42.190

Reputation: 41 907

+1. Have a look here as well; it is closely related to your comment on the compiled version (see also my comment on that answer).

– Leonid Shifrin – 2014-10-07T14:59:11.490

@xzczd, see my recent update, which addresses exactly your case. – ybeltukov – 2014-10-08T11:08:15.607