Performance regression for Fold?

I noticed a more than 10x performance drop in Mathematica v9.0.1 (and, as Oleksandr R. commented, also in v8) compared with v7.01 for this code:

SetSystemOptions["CatchMachineUnderflow" -> False];
AbsoluteTiming@ 
 Fold[{(#[[1]] Sin[10.5])/(#2 + 1), #[[2]] + Tan[#[[1]]]} &,
   {Sin[10.5], 1.0}, N@Range[10^6]] 

I got {0.184287, {0., 0.105747}} on v7, but {2.311755, {0., 0.105747}} on v9. Note that the Do[...] version of the above code takes about 2 seconds, so I suspect that on v9 the Fold code is not properly auto-compiled.
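The Do[...] version is not shown in the post. A plausible equivalent, offered as a reconstruction rather than the exact code that was timed, is sketched below; the local names x, y and k are arbitrary.

```mathematica
(* same setting as above: let machine underflow produce denormals/zero
   instead of switching to arbitrary-precision arithmetic *)
SetSystemOptions["CatchMachineUnderflow" -> False];

AbsoluteTiming@Module[{x = Sin[10.5], y = 1.0},
  (* simultaneous assignment: both right-hand sides use the old x,
     exactly like the pure function passed to Fold *)
  Do[{x, y} = {(x Sin[10.5])/(k + 1), y + Tan[x]}, {k, N@Range[10^6]}];
  {x, y}]
```

The simultaneous assignment {x, y} = {...} evaluates the whole right-hand side before assigning, which is what keeps it equivalent to the Fold step.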

Is this a regression? (Also, is there a way to see whether auto-compilation worked?)

Additional information

I checked that the CompileOptions are the same in v7 and v9 (except that v9 has some options that are not present in v7):

SystemOptions["CompileOptions"]

{"CompileOptions" -> {"ApplyCompileLength" -> ∞, "ArrayCompileLength" -> 250, "AutoCompileAllowCoercion" -> False, "AutoCompileProtectValues" -> False, "AutomaticCompile" -> False, "BinaryTensorArithmetic" -> False, "CompileAllowCoercion" -> True, "CompileConfirmInitializedVariables" -> True, "CompiledFunctionArgumentCoercionTolerance" -> 2.10721, "CompiledFunctionMaxFailures" -> 3, "CompileDynamicScoping" -> False, "CompileEvaluateConstants" -> True, "CompileOptimizeRegisters" -> False, "CompileParallelizationThreshold" -> 10, "CompileReportCoercion" -> False, "CompileReportExternal" -> False, "CompileReportFailure" -> False, "CompileValuesLast" -> True, "FoldCompileLength" -> 100, "InternalCompileMessages" -> False, "ListableFunctionCompileLength" -> 250, "MapCompileLength" -> 100, "NestCompileLength" -> 100, "NumericalAllowExternal" -> False, "ProductCompileLength" -> 250, "ReuseTensorRegisters" -> True, "SumCompileLength" -> 250, "SystemCompileOptimizations" -> All, "TableCompileLength" -> 250}}

For comparison, I also tried the following code, whose performance is roughly the same on v7 and v9 (although v9 is on average about 5% slower).

AbsoluteTiming@Fold[# + Sin[#2] &, 1.0, Range[10^6]]

One difference is that here the first argument of the function in Fold is a number, whereas in the problematic example it is a list.

PS: I noticed this issue during the discussion in "About auto-compiling and performance between Do and Fold". Since this regression is a different question, I am asking about it here separately.

Yi Wang

Posted 2014-02-17T04:39:37.683

Reputation: 6 937

Note that it is not customary to use the "bugs" tag until we have confirmation from WRI or consensus in the community that it is really a bug. However, in this particular case, the regression is so obvious that tagging it as a bug right away seems justified. – Oleksandr R. – 2014-02-17T08:26:45.767

Incidentally, version 8 is somehow intermediate between 7 and 9. Its performance is closer to that of version 9 than to 7, but about 10% faster than the former. So I think it would be fair to say that the major regression occurred in version 8, but nobody noticed until now. – Oleksandr R. – 2014-02-17T08:29:37.530

@OleksandrR. Thanks for the comment! I will be more careful about using the bugs tag in the future, but will keep it for now as you advised. – Yi Wang – 2014-02-17T10:36:10.337

I get {5.068002, {0., 0.105747}} on version 10. It seems it wasn't fixed. – Jacob Akkerboom – 2014-07-09T21:26:37.707

This bug is still present in 10.2 – RunnyKine – 2015-08-04T09:56:01.343

After seeing @ilian's answer, I have to reconsider the status of this as a bug, because it is clearly caused by numerical underflow in user code. I am removing the tag accordingly. If anyone disagrees, please roll back and let's discuss it. – Oleksandr R. – 2015-09-06T20:36:02.587

Answers

The reason for this behavior is that autocompilation always uses settings equivalent to "RuntimeOptions" -> {"Quality", "WarningMessages" -> False}.

As previously noted in Silvia's answer, the automatic compilation is invoked for input exceeding

SystemOptions["CompileOptions" -> "FoldCompileLength"]

(* {"CompileOptions" -> {"FoldCompileLength" -> 100}} *)

However, when $n$ (the length of the list N[Range[n]]) becomes greater than 165, the compiled-function interpreter switches to uncompiled evaluation because of machine underflow. This can be seen if we compile the function ourselves:

cf0 = Compile[{{arg, _Real, 1}}, Fold[{(#[[1]] Sin[10.5])/(#2 + 1), 
        #[[2]] + Tan[#[[1]]]} &, {Sin[10.5], 1.0}, arg], "RuntimeOptions" -> "Quality"];

cf0[N[Range[165]]]

(* {6.37932*10^-308, 0.105747} *)

cf0[N[Range[166]]]

(* CompiledFunction::cfn: Numerical error encountered at instruction 12; 
proceeding with uncompiled evaluation. >>

  {-3.360393999426673*10^-310, 0.105747} 

*)

where the first element in the last result is not a machine number.
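The boundary between 165 and 166 follows directly from the recurrence in the first component. As an illustrative derivation (not part of the original answer), write $s=\sin(10.5)\approx-0.8797$ and let $x_n$ be the first component after folding over $1.,2.,\dots,n$:

$$
x_0 = s,\qquad x_n = \frac{s\,x_{n-1}}{n+1}
\quad\Longrightarrow\quad
x_n = \frac{s^{n+1}}{(n+1)!},
$$

$$
|x_{165}| = \frac{|s|^{166}}{166!} \approx 6.38\times10^{-308} > 2^{-1022} \approx 2.23\times10^{-308},
\qquad
|x_{166}| = |x_{165}|\,\frac{|s|}{167} \approx 3.36\times10^{-310} < 2^{-1022}.
$$

Since $2^{-1022}$ is the smallest positive normalized double, $x_{166}$ is the first value that falls into the subnormal range, which matches the two outputs shown above (including the negative sign, since $s^{167}<0$).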

The speed can be recovered by turning off underflow checking, for example with "RuntimeOptions" -> "Speed":

cf = Compile[{{arg, _Real, 1}}, Fold[{(#[[1]] Sin[10.5])/(#2 + 1), 
       #[[2]] + Tan[#[[1]]]} &, {Sin[10.5], 1.0}, arg], "RuntimeOptions" -> "Speed"];

AbsoluteTiming[cf[N[Range[10^6]]]]

(* {0.202218, {0., 0.105747}} *)
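As far as I know, "Speed" relaxes other runtime checks as well, not only underflow. If underflow is the only problem, a more targeted variant keeps "Quality" and switches off just that check. This is a sketch assuming the "CatchMachineUnderflow" sub-option of "RuntimeOptions" behaves as documented in your version; the name cfU is arbitrary.

```mathematica
(* keep the "Quality" defaults, but let underflow produce denormals/zero
   instead of aborting to uncompiled evaluation *)
cfU = Compile[{{arg, _Real, 1}},
   Fold[{(#[[1]] Sin[10.5])/(#2 + 1), #[[2]] + Tan[#[[1]]]} &,
    {Sin[10.5], 1.0}, arg],
   "RuntimeOptions" -> {"Quality", "CatchMachineUnderflow" -> False}];

(* length 166 is the first one that triggered CompiledFunction::cfn above *)
cfU[N[Range[166]]]

AbsoluteTiming[cfU[N[Range[10^6]]]]
```

If the sub-option is honored, the length-166 call should return without the CompiledFunction::cfn message, and the timing should be close to that of cf.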

As for why version 7 is different, I am not completely sure. The more fine-grained control allowed by "RuntimeOptions" did not exist before the compiler overhaul in version 8. Before that, it is possible that all machine arithmetic exception handling was controlled by the legacy "CatchMachineUnderflow" system option (first implemented in 1999).

Given that automatic compilation is invisible to the user, I am inclined to believe that always using the safer "RuntimeOptions" -> "Quality" was a conscious design choice. Explicit compilation should be used if other exception handling scenarios are desired.

ilian

Posted 2014-02-17T04:39:37.683

Reputation: 24 492

Based on your answer I'm strongly inclined to remove the "bugs" tag. Thanks for looking into this issue. – Oleksandr R. – 2015-09-06T20:26:55.793

Thanks for your informative answer! I didn't realize there is an underflow. – Silvia – 2015-09-07T02:20:47.647

(This is not an answer, but only a phenomenon I observed. And I think it might be a bug.)

I think in version 9, Mathematica fails to compile the Fold for $n\geq 166$, where $n$ is the argument of Range. The precise threshold may differ from OS to OS, but I suspect this phenomenon exists in all version 9 installations.

Note that the default "FoldCompileLength" is 100:

"CompileOptions" /. SystemOptions["CompileOptions"] // "FoldCompileLength" /. # &

100

Now look at the number of steps in the whole evaluation chain for different $n$:

evalSteps = Map[
   (steps = 0;
     TraceScan[steps++ &,
      Fold[{(#1[[1]] Sin[10.5])/(#2 + 1), #1[[2]] + Tan[#1[[1]]]} &,
       {Sin[10.5], 1.0}, N[Range[#]]],
      _,
      TraceInternal -> True, TraceOff -> _Message];
     {#, steps}) &,
   Range[1, 500]];

ListLinePlot[evalSteps, PlotStyle -> Red, Frame -> True, PlotRange -> All]

[Plot: number of evaluation steps versus $n$, titled "strange failure of compile?"]

This seems to suggest that auto-compilation occurs correctly at $n=100$, but strangely fails for $n\geq 166$.

However, a close inspection of the evaluation chains returned by Trace (using the levelIndentFunc I described in this question) reveals that auto-compilation might occur correctly for $n\geq 166$ too, but Mathematica somehow falls back to the uncompiled version afterwards. The following comparison is between $n=166$ and $n=165$:

[Image: comparison between the succeeded and failed cases]
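In light of ilian's explanation (underflow inside the compiled code), the 165/166 boundary can also be checked directly against the machine-number range, which is fixed by IEEE double precision rather than by the OS. A minimal sketch using the closed form of the first component, $x_n = \sin(10.5)^{n+1}/(n+1)!$, derived from the Fold step; the helper name firstComponent is hypothetical.

```mathematica
(* closed form of the first component after folding over N[Range[n]] *)
firstComponent[n_Integer] := Sin[10.5]^(n + 1)/(n + 1)!

(* first n whose first component drops below the smallest normalized
   machine number, i.e. the first list length that underflows *)
Select[Range[500], Abs[firstComponent[#]] < $MinMachineNumber &, 1]

(* {166} *)
```

This uses Select with a third argument instead of SelectFirst so that it also runs in version 9.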

Silvia

Posted 2014-02-17T04:39:37.683

Reputation: 25 336

I can confirm the exact same behaviour on Linux Mint Debian with both Mathematica 8 and 9. – sebhofer – 2014-04-23T08:37:41.043

@sebhofer Thanks for checking it! – Silvia – 2014-04-23T08:40:23.333

I confirm this behavior with Mathematica 8.0.4 under Windows. – Alexey Popkov – 2014-04-23T10:12:20.750