Why doesn't changing to floating-point values improve the speed?


I am testing the first tip of this article:

Use floating-point numbers if you can, and use them early.

I've compared a pair of code snippets and found that using floating-point numbers in the calculation isn't helpful. I don't understand why.

f[x_] := x^2;
Table[Sum[f[i], {i, 1, 100000}], {j, 1, 1000}]; // AbsoluteTiming

Result: 54.0099 sec.

f[x_] := x^2;
Table[Sum[f[i], {i, 1., 100000}], {j, 1, 1000}]; // AbsoluteTiming

Result: 54.6481 sec.

Appreciate your help!

cj9435042

Posted 2018-06-05T01:40:30.073

Reputation: 683

Why not Do[Total[Range[1., 100000.]^2], 1000] // AbsoluteTiming ? – ilian – 2018-06-05T03:01:33.270

Most of the time goes to summing either symbolically or perhaps element-wise, I am not sure which. Could use Compile if you want to reduce memory use, or else "vectorize" e.g. as sumSquares[n_] := Total[Range[n]^2]. – Daniel Lichtblau – 2018-06-05T03:09:42.250
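For reference, a minimal sketch of both suggestions from the comment above (Compile syntax per the documentation; sumSquares is the name from the comment, and sumSquaresC is a name I made up):

(* compiled loop: machine arithmetic, little memory *)
sumSquaresC = Compile[{{n, _Integer}}, Module[{s = 0}, Do[s += i^2, {i, n}]; s]];
sumSquaresC[100000]
(*  333338333350000  *)

(* vectorized: builds a packed array, then totals it *)
sumSquares[n_] := Total[Range[n]^2];
sumSquares[100000]
(*  333338333350000  *)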

I see a substantially similar suggestion was posted while I was running tests... – Daniel Lichtblau – 2018-06-05T03:10:17.167

Answers


First, machine integers and machine floats will have similar speeds; integers will be a little faster, as long as numeric computation is the task. The problem, or pitfall, with integers in Mathematica is that they are treated as exact expressions. In complicated calculations, the integers may grow beyond machine size, division results in exact rationals, and special functions, like Sin[2], remain unevaluated and are treated as symbolic expressions. When such non-machine values are introduced into a computation, Mathematica's software routines are invoked instead of native CPU arithmetic, and those routines are naturally slower.
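A quick way to see the exact-vs-machine distinction (a minimal illustration; outputs as Mathematica produces them):

Sin[2]      (* stays symbolic: Sin[2] *)
Sin[2.]     (* machine float: 0.909297 *)
1/3 + 1/4   (* exact Rational: 7/12 *)
1./3 + 1/4  (* machine float: 0.583333 *)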

As a symbolic software system, Mathematica can do some unexpected things, and it takes a while to learn them all. Most iterative commands, like Table, Sum, Map, etc., will compile their expressions if the number of iterations is high enough (see SystemOptions["CompileOptions"] and scan for options ending in "CompileLength"). What can be compiled would take a long explanation; in the present example, f[i] fails to be compiled, but its value i^2 will be compiled.
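These thresholds can be inspected, and lowered if desired (the value 100 below is only an illustration):

SystemOptions["CompileOptions"]
(* a list of rules including "SumCompileLength", "TableCompileLength", etc. *)

SetSystemOptions["CompileOptions" -> {"SumCompileLength" -> 100}];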

(* OP's form -- slow, uncompiled *)
Table[Sum[f[i], {i, 1., 100000}], {j, 1, 100}]; // AbsoluteTiming
(*  {6.84558, Null}  *)

With f[i] evaluated via Evaluate or with i^2 substituted directly, it is somewhat faster due to compilation of the summand:

Table[Sum[Evaluate@f[i], {i, 1., 100000}], {j, 1, 100}]; // AbsoluteTiming
(*  {1.04318, Null}  *)

Table[Sum[i^2, {i, 1., 100000}], {j, 1, 100}]; // AbsoluteTiming
(*  {1.04277, Null}  *)

With integers, it's even faster:

Table[Sum[Evaluate@f[i], {i, 1, 100000}], {j, 1, 100}]; // AbsoluteTiming
(*  {0.160928, Null}  *)

Table[Sum[i^2, {i, 1, 100000}], {j, 1, 100}]; // AbsoluteTiming
(*  {0.148675, Null}  *)

For real speed, use packed arrays and the vectorization of arithmetical operations and many functions. Integers are still faster than floats.

Table[Total[f[Range[1., 100000]]], {j, 1, 100}]; // AbsoluteTiming
(*  {0.071032, Null}  *)

Table[Total[f[Range[1, 100000]]], {j, 1, 100}]; // AbsoluteTiming
(*  {0.046477, Null}  *)

If the integers become bigger than machine integers (bigger than 2^63 - 1), then the integer computation will slow down.

Table[Total[f[2^Range[1., 1000]]], {j, 1, 1000}]; // AbsoluteTiming
(*  {0.336848, Null}  *)

Table[Total[f[2^Range[1, 1000]]], {j, 1, 1000}]; // AbsoluteTiming
(*  {1.00608, Null}  *)

To see a bigger difference, consider the more complicated summand f[i/(i + 1)], which won't be compilable in integers but will be compilable in floats:

Table[Sum[Evaluate@f[i/(i + 1)], {i, 1, 10000}], {j, 1, 100}]; // AbsoluteTiming
(*  {4.94459, Null}  *)

Table[Sum[Evaluate@f[i/(i + 1)], {i, 1., 10000}], {j, 1, 100}]; // AbsoluteTiming
(*  {0.31874, Null}  *)

Michael E2

Posted 2018-06-05T01:40:30.073

Reputation: 190 928


Fun fact: on modern x86 hardware, FP division has better throughput than integer division (giving quotient/remainder). So even if i/(i + 1) were compiled into native machine code, or if you were doing this in C, you'd expect FP to be faster. E.g. on Skylake, vdivpd ymm has one per 8 clock cycle throughput, and does a SIMD vector of four 64-bit double divisions in one asm instruction (http://agner.org/optimize/). But 32-bit idiv has one per 6 cycle throughput for one integer division, and 64-bit has one per 24-90 cycle throughput.

– Peter Cordes – 2018-06-05T09:27:50.263

@PeterCordes It's a good point. I was thinking of i/(i + 1) in terms of how Mathematica deals with it. For most integers i, the fraction i/(i + 1) will not be an integer, so M will represent it as a Rational. I'm not sure how you would test it in M, but comparing Quotient and Divide on num = Range[10^8] and num = N@Range[10^8] resp., and den = 1 + num, I get that the integer division Quotient[num, den] is about 15% faster on my Ivy Bridge Core i7 and 20% faster on my Kaby Lake Core i7 (both Macbook Pros). – Michael E2 – 2018-06-05T15:22:23.420
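For reference, a sketch of the comparison described in the comment above (num and den follow the comment's setup; numR and denR are my names for the real versions; 10^8-element arrays take a few GB, so reduce n if memory is tight):

n = 10^8;
num = Range[n]; den = 1 + num;
Quotient[num, den]; // AbsoluteTiming

numR = N[num]; denR = N[den];
Divide[numR, denR]; // AbsoluteTiming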


I ran your base case and got this:

f[x_] := x^2;
Table[Sum[f[i], {i, 1, 100000}], {j, 1, 1000}]; // AbsoluteTiming
{145.3798581, Null}

Case 2, with 1. for i, gave this:

{105.1217869, Null}

Case 3, with 1. and 100000. for i, gave this:

{103.4749818, Null}

Case 4, with real i and f changed to f[x_] := x*x, gave this:

{90.511359, Null}

Case 5, with real i and i*i used directly in the Sum (i.e., no f[x]):
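Presumably this was (reconstructed from the description):

Table[Sum[i*i, {i, 1., 100000}], {j, 1, 1000}]; // AbsoluteTiming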

{6.0216106, Null}

Not sure what it all means because I am new to MMA, but performance tuning is a black art.

NB: I am on version 8, if that matters.

user58558

Posted 2018-06-05T01:40:30.073

Reputation: 61