Is it possible for a LibraryLink function to be Listable, as in Compile?

12

6

We know that Compile has an option RuntimeAttributes -> {Listable}, which (together with Parallelization -> True) lets the compiled function easily exploit thread parallelization.

So is it possible for a LibraryLink function to be Listable like Compile?

matheorem

Posted 2016-01-19T02:52:07.970

Reputation: 14 483

1

It is possible to parallelize the library function yourself by using frameworks like OpenMP. Other than that, I'm very pessimistic that you can easily persuade a library function to act like a listable compiled function.

– halirutan – 2016-01-19T06:29:50.793

@halirutan Hi, halirutan. I didn't mean it that way. I only write the inner-loop function and let Mathematica handle the loop part, which I think is more flexible. I just found that one specific LibraryLink function of mine, which returns a list, gets no speedup from ParallelTable (this is strange; I am still trying to figure out what is wrong). – matheorem – 2016-01-19T06:37:45.960

@halirutan But why is Listable not possible? After all, when we Compile to target C we also get a LibraryLink function. – matheorem – 2016-01-19T06:37:56.167

I know you didn't mean it that way, but that's the most straight-forward way to get similar behavior. AFAIK, functions that are compiled with parallelization and Listable are not different from others. The parallelization happens in a layer before the actual underlying compiled function. This all is hidden from the user and I'm not sure you can access it to at least see how they did it and when exactly it happens. – halirutan – 2016-01-19T06:50:51.503

@halirutan OK, I understand. Maybe they can provide an option in the future. – matheorem – 2016-01-19T06:55:20.107

@halirutan It would be interesting to at least understand how Mathematica implements parallelization though. I made a (long-term?) chatroom to talk about these things: http://chat.stackexchange.com/rooms/34526/librarylink-mathlink

– Szabolcs – 2016-01-19T09:19:51.767

Answers

14

Simple solution

There is an easier solution than the one I gave almost 2 years ago. In principle, you wrap your library function inside another CompiledFunction that is listable. Let the code speak:

fun = LibraryFunctionLoad["demo", "demo_I_I", {Integer}, Integer];
With[{fc = fun}, 
  funListable = Compile[{{i, _Integer, 0}}, fc[i],
    RuntimeAttributes -> {Listable},
    Parallelization -> True]
];

If you inspect the compiled function, you see there is no external call. Instead, you find a special instruction that calls the library function code:

[Figure: libCall]
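One way to do this inspection is CompilePrint from the standard CompiledFunctionTools` package. The snippet below is a minimal sketch: it assumes fun and funListable were defined as above and that the "demo" example library shipped with LibraryLink can be found. The exact wording of the printed instructions may differ between versions, so no output is shown.

```mathematica
Needs["CompiledFunctionTools`"]

(* Print the register-machine instructions of the listable wrapper. *)
CompilePrint[funListable]

(* For contrast (an assumption, not taken from the answer): without With,
   the symbol fun is not inlined at compile time, so this listing is
   expected to contain a MainEvaluate call back to the main evaluator. *)
funExternal = Compile[{{i, _Integer, 0}}, fun[i]];
CompilePrint[funExternal]
```

Comparing the two listings is a quick way to confirm that the With injection really removed the round trip through the main evaluator.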

The performance is exceptionally good even though we have this wrapping. It might not be the best idea to profile such a simple function, but let's do it anyway. For comparison, I'm creating the same function as directly compiled code:

inc = Compile[{{i, _Integer, 0}},
  i + 1,
  RuntimeAttributes -> {Listable},
  Parallelization -> True
  (* , CompilationTarget -> "C" *)
  ]

You should leave out the CompilationTarget option; in my tests, the C-compiled version was slower than the one that runs on the virtual machine:

r = Range[10^7];
funListable[r]; // AbsoluteTiming
inc[r]; // AbsoluteTiming
funListable[r] === inc[r]

For the parallelized library function, I get a runtime of 0.42 seconds, while the compiled version needs 0.53 seconds (about 0.7 seconds with compilation target "C"). If you want to see this approach outperform other solutions, you should read this answer of mine, where I used it to parallelize highly complex, non-trivial C code that came from a library.

Old Answer

It seems it is possible. At least I got a toy-example that works. Without having any specific information about this topic from WRI, I always suspected that LibraryLink was not mainly created to give users a way to attach shared library functions to the kernel. I believe that the underlying technology was first used in Compile to make CompilationTarget->"C" possible and afterwards, a part of the framework was exposed to the user to make it possible to use LibraryLink. If someone has more information about this, please feel free to add it here.

That being said, when you look at the InputForm of a simple compiled function, you will find that it contains a LibraryFunction in the exact same way you would get it when loading your own library functions with LibraryFunctionLoad:

fc = Compile[{{x, _Integer, 0}},
  x,
  Parallelization -> True,
  RuntimeAttributes -> {Listable},
  CompilationTarget -> "C"
  ];
InputForm[fc]

(*
CompiledFunction[{10, 10.3, 5852}, {_Integer}, 
  {{2, 0, 0}, {2, 0, 0}}, {}, {0, 1, 0, 0, 0}, {{1}}, 
  Function[{x}, x, Listable], Evaluate, 
  LibraryFunction["/home/some/path/compiledFunction0.so",     
    "compiledFunction0", {{Integer, 0, "Constant"}}, Integer
  ]
]
*)
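To pick out that part programmatically instead of reading the whole InputForm, a sketch like the following should do. It assumes fc was compiled as above with CompilationTarget -> "C" (which needs a working C compiler); libParts is a name introduced only for this illustration.

```mathematica
(* Collect every LibraryFunction expression inside the compiled function. *)
libParts = Cases[fc, _LibraryFunction, Infinity];

(* With a C target this should hold one element. If compilation to C
   failed and Compile fell back to the virtual machine, it is empty. *)
Length[libParts]

(* Show the declared argument and return types of the embedded function. *)
InputForm /@ libParts
```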

Some tests with Compile seemed to indicate that the generated code is the same with or without the Listable attribute. This makes me believe that the distribution of arguments for a parallel evaluation of fc happens before the actual LibraryFunction is called. Therefore, when fc is called with a tensor, there might be some wrapper C function that calls the underlying LibraryFunction on all elements of the tensor in parallel.

My idea was that it might be possible to replace the LibraryFunction inside this CompiledFunction when the type of the function is correct. In the above example fc gets a single integer and returns a single integer. Let us use a LibraryLink example of the same type:

libFun = LibraryFunctionLoad["demo", "demo_I_I", 
  {{Integer, 0, "Constant"}}, Integer]

Note that this function does something different from fc, because it increments its argument by one. Additionally, we can verify that it is not Listable:

[Figure: Library function call]
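This check can be reproduced roughly as follows. It is only a sketch that assumes libFun was loaded from the "demo" example library as above; the exact message text is not quoted because it may vary between versions.

```mathematica
(* Scalar call: demo_I_I returns its argument plus one. *)
libFun[10]

(* List call: the declared argument type is a scalar Integer, so this is
   expected to fail with a LibraryFunction message instead of threading
   over the list. *)
libFun[{1, 2, 3}]

(* The symbol holding the function carries no Listable attribute either. *)
Attributes[libFun]
```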

Looking at InputForm[libFun] reveals that it has the exact same type as the LibraryFunction inside fc, except that it does something completely different and was created by us, not by Compile. Let us inject our libFun into the existing CompiledFunction:

fcLibFunc = fc /. _LibraryFunction -> libFun;

fcLibFunc[10]
(* 11 *)

Now the big question is: does fcLibFunc work on lists, and does it do the work in parallel?

fcLibFunc[{1, 2, 3, 4, 5, 6, 7, 8, 9}]
(* {2, 3, 4, 5, 6, 7, 8, 9, 10} *)

That seems to work. Creating a bigger example shows that the function runs in parallel. Let us time this simple toy function against a compiled function that does the same:

fc2 = Compile[{{x, _Integer, 0}},
   x + 1,
   Parallelization -> True,
   RuntimeAttributes -> {Listable},
   CompilationTarget -> "C"
   ];

r = Range[10^6];


Do[fc2[r], {100}] // AbsoluteTiming
(* {3.97018, Null} *)

Do[fcLibFunc[r], {100}] // AbsoluteTiming
(* {3.13338, Null} *)

I have measured it several times, and on my machine fcLibFunc seems to need only about 80% of the runtime of fc2. I do not know why this is or whether it generalizes, but we have shown that it is possible to make a library function parallel-Listable.

Let me end by making clear the steps to do this yourself:

  • Create a fake compiled function like above that has the exact same type that your library function has. Please note that you cannot use library functions that change their input arguments. Therefore, you should always use "Constant" passing.

  • Load your library function and replace the last argument inside CompiledFunction with it. This ensures that your library function is called instead of the code that was created by Compile.

  • Keep in mind that when you call this new function with the wrong arguments, the high-level fake code is used instead! To give an example, try to evaluate fcLibFunc[I].
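The last point can be made concrete with a small sketch. It assumes fcLibFunc from above; safeFun is a hypothetical name introduced only for this illustration, and the guard is just one possible way to avoid reaching the fallback silently.

```mathematica
(* A complex argument does not match the compiled signature, so the fake
   high-level code Function[{x}, x, Listable] is evaluated instead of the
   library function. *)
fcLibFunc[I]

(* Hypothetical guard: forward only machine integers or full arrays of
   machine integers; everything else stays unevaluated. *)
safeFun[x_?Developer`MachineIntegerQ] := fcLibFunc[x]
safeFun[x_List /; ArrayQ[x, _, Developer`MachineIntegerQ]] := fcLibFunc[x]

safeFun[{1, 2, 3}]  (* forwarded to the injected library function *)
safeFun[I]          (* no matching definition, so it stays unevaluated *)
```

Leaving the call unevaluated makes a type mismatch visible at once, whereas the fallback would return a plausible-looking but wrong result.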

halirutan

Posted 2016-01-19T02:52:07.970

Reputation: 109 574

You are genius, halirutan ! +10 : ) – matheorem – 2016-01-20T02:24:00.157

I think this approach is general, since I have successfully turned one of my LibraryLink functions, which takes 9 real inputs and returns a list of reals, into a Listable one. It works perfectly, uses all available CPU threads, and the transformation is so easy! I would like to call this method a Mathematica-style "OpenMP". It is a simplified version of OpenMP with less user control, but it is really easy to implement, without having to consider things such as shared and private variables, data races, etc. – matheorem – 2016-01-20T02:29:37.943

There is only one issue that I find strange: the transformed listable version does not seem to work properly in functions such as DensityPlot or Plot3D. But anyway, we still have the original pure LibraryLink version, so this is not a pain. – matheorem – 2016-01-20T02:36:48.637

@matheorem If you speak about DensityPlot using the listability, then as far as I know it never does this. All plot routines will evaluate the functions serially and not in parallel. – halirutan – 2016-01-20T03:14:23.320

I know. But what I mean is that my listable lib function does not produce a correct DensityPlot. See here http://postimg.org/image/m7wm9jxt3/ and this is strange because I cannot reproduce this bug with a simple example

– matheorem – 2016-01-20T05:08:31.693

1

Just for fun, try to use the option Evaluated -> False in DensityPlot and if this doesn't work, try to wrap your compiled function wey01 like this: test[args__?NumericQ]:=wey01[args] and then plot test. – halirutan – 2016-01-20T05:14:29.870

Wow, Evaluated -> False works, thank you so much. But why does it work? Also, I tested some simple listable lib functions, and they don't need Evaluated -> False. – matheorem – 2016-01-20T05:17:01.387

That is because their high-level code (the part before the LibraryFunction that is used as fallback) is the same as the compiled library code. I suspect that when the function is too easy (like returning only x), Mathematica detects this and doesn't bother to call the library function. – halirutan – 2016-01-20T05:18:50.180

1

Evaluated->False is undocumented??!! – matheorem – 2016-01-20T05:23:22.750

2

Hi, halirutan. I just found that simply compiling the LibraryLink function again also works, e.g.: fclib2 = Compile[{{x, _Integer, 0}}, libFun[x], Parallelization -> True, RuntimeAttributes -> {Listable}, CompilationTarget -> "C", CompilationOptions -> {"InlineExternalDefinitions" -> True}] – matheorem – 2016-02-06T12:24:08.530

Hi, halirutan. Your simple solution is exactly what I posted in my last comment almost two years ago. It is quite a late update : ) – matheorem – 2017-09-18T00:11:22.363

@matheorem Oh, it seems I missed your comment back then. I only saw this answer again, because someone upvoted it today and I looked over it once more. I didn't want to leave it like it is although it explains some internals and I added the easy solution. Maybe I wasn't aware back then that the internal LibraryFunction call does no harm to the speed. I don't remember. – halirutan – 2017-09-18T00:20:34.127

Yeah, this method is really amazingly simple and effective : ) – matheorem – 2017-09-18T01:26:12.477