## How does Plus work on machine precision Real arguments?


I thought Kahan's summation method would make a nice example for students to use to think about round-off error [W. Kahan, Pracniques: Further Remarks on Reducing Truncation Errors, Commun. ACM 8 (1965), 40]. The method is available (I surmise) through the option Method -> "CompensatedSummation" of Total, but it's also an easy program to write, laid out nicely in the half-page article.
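For readers without the article at hand, the algorithm is only a few lines. Here is a sketch in Python (an illustration of Kahan's method, not of Mathematica's implementation):

```python
def kahan_sum(xs):
    """Kahan's compensated summation (Commun. ACM 8 (1965), 40)."""
    s = 0.0  # running sum
    c = 0.0  # compensation: the low-order bits lost from s so far
    for x in xs:
        y = x - c        # next term, corrected by the previous loss
        t = s + y        # add it; low-order bits of y may be lost here
        c = (t - s) - y  # algebraically zero; numerically, the bits just lost
        s = t
    return s

# 1.0 followed by a thousand terms of 1e-16: plain left-to-right
# summation drops every tiny term, compensated summation keeps them.
xs = [1.0] + [1e-16] * 1000
print(sum(xs))        # 1.0
print(kahan_sum(xs))  # ~1.0 + 1e-13
```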

Below the vector x0 is a list whose elements are to be added. Note that s0 is the "exact" sum, that is, the sum of the exact numbers represented by the floating-point numbers in x0, rounded to machine precision. I consider it the target of the summation methods below.

SeedRandom[2];
x0 = 2 + RandomReal[1, 1000000];
s0 = N@Total[SetPrecision[x0, Infinity]];


Here is a signed norm that measures the relative error of x with respect to x0, in units of $MachineEpsilon.

meps[x_, x0_, eps_ : $MachineEpsilon] := ((x - x0)/x0)/eps;
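For experiments outside Mathematica, the same error measure can be transcribed to Python (scalar-valued here, whereas the Mathematica definition threads over lists; 2^-52 plays the role of $MachineEpsilon):

```python
EPS = 2.0 ** -52  # machine epsilon for IEEE doubles, Mathematica's $MachineEpsilon

def meps(x, x0, eps=EPS):
    """Signed relative error of x against the reference x0, in units of eps."""
    return ((x - x0) / x0) / eps

# An error of exactly one ulp near 1.0 reads as 1.0:
print(meps(1.0 + EPS, 1.0))  # 1.0
```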


Then here are some ways to add up the vector. The timings are not particularly important, but they're somewhat interesting. I unpack x0 before applying Plus, since Apply (@@) would unpack it anyway, so that the unpacking is not included in the timing; the sum is the same as we would get with Plus @@ x0.

{With[{x0 = Developer`FromPackedArray[x0]},
sP = Plus @@ x0; // AbsoluteTiming],
sF = Fold[Plus, x0]; // AbsoluteTiming,
sT = Total[x0]; // AbsoluteTiming,
sC = Total[x0, Method -> "CompensatedSummation"]; // AbsoluteTiming
}[[All, 1]]
(*  {0.131215, 0.120635, 0.000571, 0.01399}  *)


The example was chosen to show that Plus does something nearly like compensated summation but not exactly the same.

meps[{sP, sF, sT, sC}, s0]
(*  Plus       Fold     Total     Compensated  Version
{-0.838972, 149.337, -56.2111, 0.}          10.4.1
{-0.838972, 149.337, -2.51692, 0.}          11.1.1  *)


Aside from being surprised that Plus is such a slow method, I got to wondering what Plus is doing. With some effort, one can find that the order of the elements in x0 can affect the result of Plus. The only results I can find have errors of either 1 or 0 in the last bit. Below they are compared with the results of iterative addition using Fold.

meps[{Plus @@ Sort@x0, Fold[Plus, Sort@x0]}, s0]
(*  {-0.838972, -33.5589}  *)

meps[{Plus @@ -Sort[-x0], Fold[Plus, -Sort[-x0]]}, s0]
(*  {-0.838972, 182.057}  *)

SeedRandom[0];
With[{x0 = RandomSample[x0]}, meps[{Plus @@ x0, Fold[Plus, x0]}, s0]]
(*  {0., -158.566}  *)

SeedRandom[1];
With[{x0 = RandomSample[x0]}, meps[{Plus @@ x0, Fold[Plus, x0]}, s0]]
(*  {-0.838972, -93.9648}  *)
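The order dependence of plain sequential accumulation is easy to reproduce in any IEEE-double environment. A Python sketch at a smaller size (this models Fold[Plus, ...], i.e. naive left-to-right summation, not whatever Plus itself does):

```python
import math
import random

# Roughly the setup from the question, at a smaller size:
# positive doubles near 2, summed in several orders.
random.seed(2)
xs = [2.0 + random.random() for _ in range(100_000)]

exact = math.fsum(xs)  # correctly rounded sum of the exact double values
eps = 2.0 ** -52

def left_to_right(v):
    """Plain sequential accumulation, like Fold[Plus, v]."""
    s = 0.0
    for x in v:
        s += x
    return s

for order in (xs, sorted(xs), sorted(xs, reverse=True)):
    err = ((left_to_right(order) - exact) / exact) / eps
    print(err)  # error in units of eps; varies with the summation order
```

Note that math.fsum is order-independent (it is exact before the final rounding), which is what makes it usable as the reference here.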


Does anyone know how Plus works? It would be nice to be able to explain it.

Might want to check on progressively larger examples to see if it scales as n log(n); that might indicate sorting under the hood. I remark that I do not know the answer offhand and probably will not have time to peek into the code to find out. – Daniel Lichtblau – 2016-01-10T23:31:11.083

A related remark is that this runs the risk of being closed as requiring an internal developer response. All the same, an interesting question I think. – Daniel Lichtblau – 2016-01-10T23:32:15.667

Perhaps it simply uses extended (but not infinite) precision. – george2079 – 2016-01-11T01:43:17.170

Plus on machine doubles will not use extended precision unless there is an overflow. – Daniel Lichtblau – 2016-01-11T04:46:15.743

This is a related point about compensated summation, to an extent: http://mathematica.stackexchange.com/questions/79174/compilation-of-total-with-compensated-summation – dr.blochwave – 2016-01-11T07:46:58.133

@DanielLichtblau Seems slightly closer to n log(n) than n; best fits are proportional to n^1.14 or (n log(n))^1.06. – Michael E2 – 2016-01-11T11:50:41.363

@george2079 Perhaps you mean extended-precision registers on the CPU/FPU are used to accumulate the result? That may be possible if the summands are chunked and the intermediate sums rounded to machine precision from time to time. If the whole computation is done with as little as 6 extra bits, the result rounds to the "exact" 53-bit result. – Michael E2 – 2016-01-11T17:19:34.547

If it is a matter of extended-precision registers, this is likely to vary based on operating system, whether Compile was used somewhere, and probably a few other incidentals (like the phase of the moon). – Daniel Lichtblau – 2016-01-12T15:56:30.620

Since Plus has attribute Orderless, it isn't surprising that Plus @@ x0 and Fold[Plus, x0] produce different results. The order of arguments of Plus doesn't matter, and as expected Plus @@ x0, Plus @@ Sort@x0, and Plus @@ -Sort[-x0] produce identical results. – Alexey Popkov – 2017-05-06T15:53:26.277

@AlexeyPopkov I did check out the Orderless attribute: Plus can give different results (of at most one ulp) for different orderings of the arguments (via RandomSample). If the Orderless attribute affects evaluation, it is not by the arguments being sorted into canonical order before evaluation. It makes sense to me that orderless numerical functions like Plus would go directly to evaluation on numeric input. Simple example: Plus[1., -1., $MachineEpsilon/2] vs. Plus[1., $MachineEpsilon/2, -1.]. Orderless is more important for symbolic computations. – Michael E2 – 2017-05-06T20:07:45.263
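The two-orderings example in the comment above can be checked in any IEEE-double language; in Python, with the left-to-right grouping written out explicitly:

```python
eps = 2.0 ** -52  # $MachineEpsilon for IEEE doubles

# Plus[1., -1., $MachineEpsilon/2] vs. Plus[1., $MachineEpsilon/2, -1.],
# transcribed as explicit left-to-right additions:
a = (1.0 + -1.0) + eps / 2  # the half-ulp survives: 1 - 1 = 0 exactly
b = (1.0 + eps / 2) + -1.0  # 1.0 + eps/2 ties to even, giving 1.0, so it is lost
print(a)  # 1.1102230246251565e-16
print(b)  # 0.0
```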

Interesting question. Incidentally, you forgot to include Tr, my favorite method to sum a list. Including it in v10.1, I find it twice as fast as (plain) Total, with the same error. – Mr.Wizard – 2017-08-01T09:14:04.920

@Mr.Wizard Thanks. I forget why I omitted Tr. Either I overlooked it, or I skipped it because it gave the same error as Total and I preferred comparing Total with and without compensated summation. Interestingly, in V11.1.1 (Mac/Intel), RepeatedTiming gives the same speed, 0.25 sec, for both Total and Tr, but repeated group timing with AbsoluteTiming as above gives means of 0.63 and 0.42 respectively. (I included Fold because I know how the numbers are being added in that case.) – Michael E2 – 2017-08-01T12:22:12.417

Following up on your own comment about chunking, I was able to replicate the result of Plus using a simple Partition with a block size of 100,000:

myPlus[x__] := Total /@ Partition[{x}, 1*^5, 1*^5, 1, 0] // Total

meps[myPlus @@ x0, s0]
(*  -0.838972  *)
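In Python terms the same blocking idea reads as follows (helper name and structure are mine; this illustrates the chunking hypothesis, not Plus's actual internals):

```python
def chunked_sum(xs, block=100_000):
    """Sum fixed-size blocks separately, then sum the per-block partials,
    mimicking Total /@ Partition[...] // Total."""
    partials = [sum(xs[i:i + block]) for i in range(0, len(xs), block)]
    return sum(partials)

# With exactly representable data, blocked and plain summation agree;
# with generic doubles, the block size can change the last bit.
print(chunked_sum([1.0] * 1_000_000))  # 1000000.0
```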


Interesting, but I think it's the unpacking, not the chunking, that leads to the Plus result: any (evenly dividing) block size works, and so does plain unpacking. For instance, meps[Total[Partition[x0, 1*^6, 1*^6, 1, 0], 2], s0] or meps[Total[Developer`FromPackedArray@x0], s0]. That the algorithm might depend on the data representation didn't occur to me until now. Coincidentally, I made this possibly related observation last night in chat. – Michael E2 – 2017-08-01T12:36:31.213

@MichaelE2 I may have fooled myself, but I also think I lost the point of this code by using Total[..., 2] at the last minute. Please see the restored definition and try other block sizes, which now for me give different results. – Mr.Wizard – 2017-08-01T15:42:56.533

Okay, it's now giving me different results for different sizes. Testing Total[..., 2] more extensively shows it doesn't quite always give -0.83... but usually does. It's still interesting to me that Total on the unpacked x0 gives a different result than on the packed x0.