Writing compiled functions as fast as Python's Numba


I want to write some code to simulate a damped oscillator that is just as fast as code written using Numba's @njit decorator. I've written the Mathematica code below, and it runs 20-40x slower than the Python code written by YouTuber Jack of Some.
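
(For reference, the loop in both code listings below is a simple Euler-type integration of $\dot v = F(v, v_t) - 100\,x$, $\dot x = v$, where the friction force is $F(v, v_t) = -3v$ when $v > v_t$ and $F(v, v_t) = -3\,v_t\,\operatorname{sign}(v)$ otherwise; $v$ is updated first and $x$ is then advanced with the updated $v$.)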

Here is the code from Jack of Some's video on speeding up Python code with Numba; I've changed it a bit to run in just one Jupyter cell:

import numpy as np
from numba import jit, njit, types, vectorize

@njit
def friction_fn(v, vt):
    if v > vt:
        return - v * 3
    else:
        return - vt * 3 * np.sign(v)

@njit
def simulate_spring_mass_funky_damper(x0, T=10, dt=0.0001, vt=1.0):
    times = np.arange(0, T, dt)
    positions = np.zeros_like(times)

    v = 0
    a = 0
    x = x0
    positions[0] = x0/x0

    for ii in range(len(times)):
        if ii == 0:
            continue
        t = times[ii]
        a = friction_fn(v, vt) - 100*x
        v = v + a*dt
        x = x + v*dt
        positions[ii] = x/x0
    return times, positions

_ = simulate_spring_mass_funky_damper(0.1)  # warm-up call so the @njit compilation happens before timing

%time _ = simulate_spring_mass_funky_damper(0.1)

The output is

CPU times: user 1.38 ms, sys: 337 µs, total: 1.72 ms
Wall time: 1.72 ms

vs my Mathematica code

ClearAll[friction, simulateSpring, jacksSpring];

friction = Compile[{{v, _Real}, {vt, _Real}},
   If[v > vt,
        -v*3.0,
        -vt*3.0*Sign[v]]
   ];

simulateSpring = 
  Compile[{{x0, _Real}, {t, _Real}, {dt, _Real}, {vt, _Real}},
   
   Module[{τ, times, positions, v = 0.0, a = 0.0, x = x0},
    
    τ = t;
    times = Range[0.0, t, dt];
    positions = ConstantArray[0.0, Length@times];
    positions[[1]] = x0/x0;
    
    Do[
        τ = times[[i]];
        a = friction[v, vt] - 100*x;
        v = v + a*dt;
        x = x + v*dt;
        positions[[i]] = x/x0;
     ,
     {i, 2, Length@times}];
    {times, positions}
    ]
   ];
jacksSpring[x_] := simulateSpring[x, 10.0, 0.0001, 1.0];

Print["CPU time: ", Timing[jacksSpring[0.1]][[1]]*1000, " ms"]

from which we have

CPU time: 27.703 ms

Diffycue

Posted 2020-07-04T19:33:57.647

Reputation: 1 021

I think it may be AbsoluteTiming that you want, not Timing. – C. E. – 2020-07-04T20:57:47.253
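
For anyone following along, a minimal illustration of the difference, using jacksSpring as defined in the question (Timing counts only CPU time used by the kernel, while AbsoluteTiming measures wall-clock time and RepeatedTiming averages it over several runs):

AbsoluteTiming[jacksSpring[0.1];]
RepeatedTiming[jacksSpring[0.1];]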

Right, or RepeatedTiming. And by adding SetOptions[Compile, CompilationTarget -> "C", RuntimeOptions -> "Speed"] I get AbsoluteTiming down to about 3 ms, on Windows. – Rolf Mertig – 2020-07-04T21:13:49.533
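
A minimal sketch of that suggestion (it assumes a working C compiler is available to Mathematica, and it must be evaluated before the Compile definitions in the question so the new defaults take effect):

(* make C compilation the default for Compile and trade runtime checks for speed *)
SetOptions[Compile, CompilationTarget -> "C", RuntimeOptions -> "Speed"];
(* then re-evaluate friction, simulateSpring and jacksSpring, and time with AbsoluteTiming or RepeatedTiming *)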

Thanks! Yes, this helps substantially and I now get 2.639 ms on my machine. I'm still interested in whether anyone has ideas to beat the Python attempt, especially if they can rewrite the code in a more idiomatic fashion. – Diffycue – 2020-07-04T21:19:53.253

Maybe someone could try the newer FunctionCompile, though it will probably be slower than Compile. – Rolf Mertig – 2020-07-04T21:25:21.997

Why do you need this? If Python runs fine, just use Python. – Rolf Mertig – 2020-07-04T21:27:07.970

^ Agree with Rolf; if it works fine in Python, you can also call Python from Mathematica. – flinty – 2020-07-04T21:31:44.783

@RolfMertig I am about to set out on writing a package for scientific computations and I'd like to better understand the numerical capabilities of Mathematica. I think the end product will be cleaner if it is feasible to keep it all in one language.

By the way, thank you for FeynCalc-- it really helped me out in my QFT courses. – Diffycue – 2020-07-04T21:40:26.137

I checked out FunctionCompile, and the results were so close as to not be worth it on my machine (~6.1 ms with Compile and the options above, ~6.2 ms with FunctionCompile). I had to wrap Range in a KernelFunction as the built-in Native`Range does not support Real arguments, but since that's only called once I would be surprised if that were the bottleneck. Is there some way to replace the Do with some matrix maths? – Carl Lange – 2020-07-04T22:38:40.023
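
If it helps anyone experiment, here is a rough sketch of what such a FunctionCompile version might look like. It is not Carl Lange's actual code: the name simulateSpringFC, the type annotations, and the KernelFunction wrapping of Range are my assumptions, the sign function is inlined rather than relying on Sign in compiled code, only the normalized positions are returned to keep the return type simple, and the details may need adjusting depending on the Wolfram Language version.

simulateSpringFC = FunctionCompile[
   Function[{Typed[x0, "Real64"], Typed[t, "Real64"],
     Typed[dt, "Real64"], Typed[vt, "Real64"]},
    Module[{times, positions, v = 0., a = 0., x = x0},
     (* hand Range back to the kernel, since the compiler's native Range
        reportedly rejects Real arguments (see the comment above) *)
     times = Typed[KernelFunction[Range],
        {"Real64", "Real64", "Real64"} -> "PackedArray"["Real64", 1]][0., t, dt];
     positions = Table[0., {Length[times]}];
     positions[[1]] = 1.;
     Do[
      a = If[v > vt, -3.*v,
         -3.*vt*If[v > 0., 1., If[v < 0., -1., 0.]]] - 100.*x;
      v = v + a*dt;
      x = x + v*dt;
      positions[[i]] = x/x0,
      {i, 2, Length[times]}];
     positions]]];

AbsoluteTiming[simulateSpringFC[0.1, 10., 0.0001, 1.];]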

One thing that’s not clear to me is whether the Numba timing includes the compilation, because when used with Jupyter it is much faster on subsequent evaluations, presumably because of caching. – C. E. – 2020-07-05T09:38:22.873

@C.E. I’m not a Python pro, but my understanding is that the code provided already runs the function once to trigger compilation and then times a second run. Without this, the timing would include the compilation time. – Diffycue – 2020-07-12T04:36:40.473

Answers


Based on the experience obtained here:

friction = Compile[{{v, _Real}, {vt, _Real}},
   If[v > vt, -v*3.0, -vt*3.0*Sign[v]]];

simulateSpring =
  Compile[{{x0, _Real}, {t, _Real}, {dt, _Real}, {vt, _Real}},
   Module[{τ, times, positions, v = 0.0, a = 0.0, x = x0},
    τ = t;
    times = Range[0.0, t, dt];
    positions = Table[0.0, {Length@times}];
    positions[[1]] = x0/x0;
    Do[
     τ = times[[i]];
     a = friction[v, vt] - 100*x;
     v = v + a*dt;
     x = x + v*dt;
     positions[[i]] = x/x0,
     {i, 2, Length@times}];
    {times, positions}],
   RuntimeOptions -> Speed, CompilationTarget -> C,
   CompilationOptions -> {InlineExternalDefinitions -> True,
     InlineCompiledFunctions -> True}];

lib = simulateSpring[[-1]] (* extract the underlying LibraryFunction so calls skip the CompiledFunction wrapper *)

Print["CPU time: ", RepeatedTiming[lib[0.1, 10.0, 0.0001, 1.0]][[1]]*1000, " ms"]
CPU time: 1.6 ms

Compiled with TDM GCC 5.1.0-2 64bit, "SystemCompileOptions"->"-Ofast".

RepeatedTiming of the original jacksSpring[0.1] is 20 ms here. I tried timing the Python code but failed. (It seems Numba is not configured correctly on my laptop; the timing is about 0.35 s.)

xzczd

Posted 2020-07-04T19:33:57.647

Reputation: 44 878

I'm surprised it even works to write RuntimeOptions -> Speed and CompilationTarget -> C; usually they're written "Speed" and "C". Nevertheless, I tried this and it gives me 1.26 ms both with and without quotation marks, whereas Numba gives me 1.23 ms. So it's very close. +1 – C. E. – 2020-07-05T08:59:28.153

@C.E. String options and option values of Compile don't need to be strings, and neither do those of NDSolve, AFAIK. (But properties of FittedModel and methods of InterpolatingFunction do need to be strings.) – xzczd – 2020-07-05T09:25:23.353
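
A tiny illustration of that point; both of the following forms are accepted by Compile (the "C" target additionally assumes a configured C compiler):

Compile[{{x, _Real}}, x^2, CompilationTarget -> "C", RuntimeOptions -> "Speed"];
Compile[{{x, _Real}}, x^2, CompilationTarget -> C, RuntimeOptions -> Speed];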

@xzczd Great! Can we apply this method to PDEs (FEM) in MMA? – ABCDEMMM – 2020-07-08T15:02:55.920

@ABCDEMMM If you mean the FEM built into NDSolve, then high-level functions like NDSolve already make use of Compile internally, AFAIK; if you mean your own implementation of FEM, just search this site: you'll see that most (if not all) of the answers implementing FEM at a low level make use of Compile; here is an example. – xzczd – 2020-07-08T15:10:20.223

@xzczd Does that mean that if we use the built-in NDSolve, every simulation will run directly as C code? – ABCDEMMM – 2020-07-08T22:11:16.443

@ABCDEMMM No, but NDSolve is already optimized with various techniques, including but not limited to Compile. BTW, if you're expecting to apply Compile to NDSolve itself, that's impossible, because Compile can handle only a limited subset of functions; for more info, check this: https://mathematica.stackexchange.com/q/1096/1871 – xzczd – 2020-07-09T01:44:07.047