Performance of compiled functions



I am trying to find the fastest way to calculate two values, each of which is a sum of a different expression. My idea is to combine both calculations in a single Sum[] and compile to "C". Here are the test functions I have:

f = Compile[{{n, _Integer, 0}}, Module[{a},
    Sum[Module[{}, a = Sin[-0.001 i^2]; {i*a, a}], {i, 1, n}]], 
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];
g = Compile[{{n, _Integer, 0}}, Module[{a},
    Sum[{i*Sin[-0.001 i^2], Sin[-0.001 i^2]}, {i, 1, n}]], 
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];
h = Compile[{{n, _Integer, 0}}, Module[{a},
    Sum[(a = Sin[-0.001 i^2]; {i*a, a}), {i, 1, n}]], 
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];
q = Compile[{{n, _Integer, 0}}, Module[{},
    {Sum[Sin[-0.001 i^2]*i, {i, 1, n}], 
     Sum[Sin[-0.001 i^2], {i, 1, n}]}], CompilationTarget -> "C", 
   RuntimeOptions -> "Speed"];
q2 = Compile[{{n, _Integer, 0}}, Module[{},
    {Table[Sin[-0.001 i^2]*i, {i, 1, n}] // Total, 
     Table[Sin[-0.001 i^2], {i, 1, n}] // Total}], 
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];
nc[n_] := {Sum[Sin[-0.001 i^2]*i, {i, 1, n}], 
   Sum[Sin[-0.001 i^2], {i, 1, n}]};
Benchmark[f_, n_] := Timing[f[n]];
TableForm[
 Flatten /@ Table[Benchmark[fun, 10000], {fun, {f, g, h, q, q2, nc}}],
 TableHeadings -> {{"f", "g", "h", "q", "q2", "nc"}, {"Timing", 
    "Result"}}]

I expected h to be the fastest because it reuses the expensive Sin evaluation or, if the compiler is smart enough to do that reuse on its own, roughly the same speed from f, g and h. Instead, q and q2 are the fastest, and g is far faster than the other two single-Sum versions (f and h). The results:

    Timing                Result    
f   0.026824    104486. -34.6114
g   0.000782    104486. -34.6114
h   0.020543    104486. -34.6114
q   0.000597    104486. -34.6114
q2  0.000628    104486. -34.6114
nc  0.001784    104486. -34.6114

Why is this happening? My guess is that evaluation escapes from the compiled body, but why?


Big thanks to halirutan for a good answer! For completeness I added a non-compiled version of his function fHal:

fHalNoC[n_] := 
  With[{r = Range[n]}, Total /@ ({r*#, #} &[Sin[-0.001 r^2]])];

Then with a slightly modified benchmark function:

testRange = 10^#  & @{3, 4, 5, 6};
Benchmark[f_, n_] := 
  With[{results = Table[First@AbsoluteTiming[f[n]], {20}]}, 
   Mean[results]]; (* assumed: average of the 20 timings *)
TableForm[
 Table[Benchmark[fun, n]/
   n, {fun, {f, g, h, q, q2, nc, fHal, fHalNoC}}, {n, testRange}], 
 TableHeadings -> {{"f", "g", "h", "q", "q2", "nc", "fHal", 
    "fHalNoC"}, testRange}]

I got the following results (timings normalized by list length):

[image: benchmark results]

I guess the lesson learned is that even a non-compiled version that exploits Listable is faster than my timid attempts at tuning with compilation. Full code available here.


Posted 2013-06-24T23:43:53.150


Have you considered looking at the compiled code using Needs["CompiledFunctionTools`"]; CompilePrint[h]? If you want to know what's going on after compiling, there is no way around a careful inspection of the generated code. – halirutan – 2013-06-25T00:06:55.507

Will do. Thanks again. – BlacKow – 2013-06-25T00:18:07.870



Two pieces of advice to start with:

  • Use the fact that Sin is Listable: you can call Sin[{1,2,3,4,..}] to get a list of results.
  • Don't do the work twice. Calculate the sine part only once and do the multiplication by i in the first sum as a vectorized multiplication.
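
To see both points in isolation, here is a small uncompiled sketch. The names r and s and the list length 5 are only for illustration; the second expression recomputes the same two values with explicit sums so the results can be compared by eye.

r = Range[5];             (* {1, 2, 3, 4, 5} *)
s = Sin[-0.001 r^2];      (* Sin threads over the whole list in one call *)
{Total[r*s], Total[s]}    (* element-wise product, then the two sums *)

(* the same two values from explicit sums, for comparison *)
{Sum[i*Sin[-0.001 i^2], {i, 1, 5}], Sum[Sin[-0.001 i^2], {i, 1, 5}]}

The sine values are computed once and reused for both totals, which is the reuse the question was after.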

Taking this into account, a first attempt looks something like

fHal = Compile[{{n, _Integer, 0}},
  Module[{r = Range[n]},
    Total /@ ({r*#, #} &[Sin[-0.001 r^2]])
  ],
  CompilationTarget -> "C", RuntimeOptions -> "Speed"
];

With an adapted version of your benchmark that uses AbsoluteTiming and averages the timings over several runs, I get:

[image: benchmark results table]

where I used the following for the benchmarking:

Benchmark[f_, n_] := 
 With[{results = Table[AbsoluteTiming[f[n]], {20}]},
  {Mean[results[[All, 1]]], results[[1, 2]]}];
TableForm[
 Flatten /@ 
  Table[Benchmark[fun, 10000], {fun, {f, g, h, q, q2, nc, fHal}}], 
 TableHeadings -> {{"f", "g", "h", "q", "q2", "nc", 
    "fHal"}, {"Timing", "Result"}}]

Why isn't h the fastest?

When you look at the output of

Needs["CompiledFunctionTools`"]; CompilePrint[h]

you clearly see that the whole body of h is not compiled but is sent back to the kernel for evaluation through a MainEvaluate call. The cause seems to be the CompoundExpression you are using inside Sum. A way out would be

h = Compile[{{n, _Integer, 0}}, 
   Sum[With[{a = Sin[-0.001 i^2]}, {i*a, a}], {i, 1, n}], 
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];

which then runs about as fast as q.
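
Reading the whole CompilePrint listing gets tedious with several functions, so the check for kernel callbacks can be sketched as a small helper. This is only a sketch: usesMainEvaluate and hLoop are hypothetical names that do not appear above, and hLoop is an assumed alternative (an explicit loop with two accumulators) that I have not benchmarked. I would expect it to compile fully, but that is exactly what the helper is meant to confirm.

Needs["CompiledFunctionTools`"]

(* hypothetical helper: True if the compiled code calls back into the kernel *)
usesMainEvaluate[cf_CompiledFunction] := 
  Not[StringFreeQ[CompilePrint[cf], "MainEvaluate"]]

(* assumed alternative: Sin is evaluated once per i and reused for both sums *)
hLoop = Compile[{{n, _Integer, 0}},
   Module[{s1 = 0., s2 = 0., a = 0.},
    Do[a = Sin[-0.001 i^2]; s1 += i*a; s2 += a, {i, 1, n}];
    {s1, s2}],
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];

(* apply the check to any compiled functions of interest *)
usesMainEvaluate /@ {h, hLoop}

CompilePrint returns the listing as a string, so a plain string search is enough here. Any function for which the helper returns True is a candidate for the kind of slowdown seen with the original h.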



Thank you! That is a great idea to use Listable. Although I'm wondering why is the h function so slow? 20 times slower than q! Do you have any explanation for that? – BlacKow – 2013-06-25T00:16:13.797

@BlacKow Yes I have. See my update. – halirutan – 2013-06-25T00:18:02.433