Efficient implementation of a linear complexity measure of binary sequences

To test the quality of random number generators, I implemented the NIST test suite in Mathematica, based on the nice workbook by Ilja Gerhardt. However, I took up the challenge posed by Ilja's complaint about the efficiency of Mathematica (see the provided link). I was able to speed up most of the tests considerably by using Mathematica's native functions. However, the most costly test, the Linear Complexity Test, gives me some trouble. The test itself is straightforward (see the NIST documentation), but it relies on a linear complexity function based on the Berlekamp-Massey algorithm. My current implementation of this algorithm looks like this:

LinearComplexity = Compile[
  {{u, _Integer, 1}},
  Module[{len, b, c, d, p, pinit, tmp, l, m},
   len = Length[u];
   l = 0; (* current linear complexity *)
   m = 0; (* index of the last length change *)
   c = b = Join[{1}, Table[0, {len - 1}]]; (* connection polynomials *)
   pinit = Table[0, {len}];
   Do[
    (* discrepancy between u[[n]] and the LFSR prediction *)
    d = Mod[
      u[[n]] +
       Take[c, {2, l + 1}].Reverse[Take[u, {n - l, n - 1}]], 2];
    If[d == 1,
     tmp = c;
     p = pinit;
     p[[1 + n - m ;; 1 + l + n - m]] = b[[1 ;; l + 1]];
     c = Mod[c + p, 2]; (* update the connection polynomial *)
     If[2*l <= n, l = n - l; m = n; b = tmp;];
     ]
    , {n, 1, len}
    ];
   l
   ]];

which is about 50% faster than the original code by Ilja (see the link to his notebook for reference).

As an example, the code returns the linear complexity of a binary sequence as follows:

SeedRandom[29328];
tst = RandomInteger[{0, 1}, 20000];
LinearComplexity[tst] // Timing

returns

{4.882831, 10001}

However, for very large binary sequences the code is still rather slow. I'm aware of several of its shortcomings and would appreciate advice on how to fix them. First, I would like to get rid of the Do loop, which is in any case not a good idea within Mathematica. Furthermore, I would like to switch over to SparseArrays, since for long random binary sequences this should reduce the size of the data structure by at least 50% (random binary sequences contain about 50% ones and 50% zeros).
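
As a quick sanity check of that memory estimate (a sketch; the actual savings depend on the sequence and on SparseArray's internal overhead), one can compare ByteCount of the dense and sparse representations:

SeedRandom[29328];
tst = RandomInteger[{0, 1}, 20000];
ByteCount[tst]              (* dense packed list *)
ByteCount[SparseArray[tst]] (* sparse representation *)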

One potential SparseArray implementation is the following code:

LinearComplexitySparse[seq_List] := Module[{len, u, b, c, d, p, pinit, tmp, l, m},
  len = Length[seq];
  u = SparseArray[seq];
  l = 0;
  m = 0;
  c = b = SparseArray[{1 -> 1}, len, 0];
  pinit = SparseArray[{}, len, 0];
  Do[
   d = If[l == 0, Mod[u[[n]], 2],
     Mod[u[[n]] +
       Take[c, {2, l + 1}].Reverse[Take[u, {n - l, n - 1}]], 2]];
   If[d == 1,
    tmp = c;
    p = pinit;
    p[[1 + n - m ;; 1 + l + n - m]] = b[[1 ;; l + 1]];
    c = Mod[c + p, 2];
    If[2*l <= n, l = n - l; m = n; b = tmp;];
    ]
   , {n, 1, len}
   ];
  Return[l];
  ];

This is more memory-efficient, but about 2x slower, since it is not compiled.

SeedRandom[29328];
tst = RandomInteger[{0, 1}, 20000];
LinearComplexitySparse[tst] // Timing

{8.392854, 10001}

I did not manage to get it to compile (see the following code)

LinearComplexitySparseCompiled := 
 Compile[{{seq, _Integer, 1}}, 
  Module[{len, u, b, c, d, p, pinit, tmp, l, m},
    len = Length[seq];
    u = SparseArray[seq]; 
    l = 0;
    m = 0;
    c = b = SparseArray[{1 -> 1}, len, 0];
    pinit = SparseArray[{}, len, 0];
    Do[
     d = If[l == 0, Mod[u[[n]], 2], 
       Mod[u[[n]] + 
         Take[c, {2, l + 1}].Reverse[Take[u, { n - l, n - 1  }]], 2]];
     If[d == 1,
      tmp = c;
      p = pinit;
      p[[1 + n - m ;; 1 + l + n - m]] = b[[1 ;; l + 1]];
      c  = Mod[c + p, 2];
      If[2*l <= n, l = n - l; m = n; b = tmp;];
        ]
     , {n, 1, len}
     ];
    Return[l];
    ];
  ]

since it complains about c not being a tensor, among other things:

SeedRandom[29328];
tst = RandomInteger[{0, 1}, 20000];
LinearComplexitySparseCompiled[tst] // Timing

Compile::cplist: c should be a tensor of type Integer, Real, or Complex; evaluation will use the uncompiled function. >>

Compile::part: Part specification b[[1;;l+1]] cannot be compiled since the argument is not a tensor of sufficient rank. Evaluation will use the uncompiled function. >>

Compile::cset: Variable c of type {_Integer,0} encountered in assignment of type {_Real,1}. >>

Compile::cret: The type of return values in Module[{len,u,b,c,d,p,pinit,tmp,l,m},len=Length[seq];u=SparseArray[seq];l=0;m=0;c=b=SparseArray[{Rule[<<2>>]},len,0];pinit=SparseArray[{},len,0];Do[d=If[Equal[<<2>>],Mod[<<2>>],Mod[<<2>>]];If[d==1,Set[<<2>>];Set[<<2>>];Set[<<2>>];Set[<<2>>];If[<<2>>];];,{n,1,len}];Return[l];]; are different. Evaluation will use the uncompiled function. >>

{8.689256, 10001}

Any ideas to improve the above code, to get it compiled, or pointers to an efficient implementation of the linear complexity measure are most welcome.

Rainer

I found code for this here, but your implementation is already several times faster. Also, Do loops are not a problem in compiled code the way they are in the top-level language. If you are using version 8 or later, have you tried compiling to C?

– Mr.Wizard – 2013-03-12T08:25:26.447

The implementation LinearComplexity compiles fine to C, and that speeds it up. However, the overall structure of the routine is still not optimal from a memory-consumption point of view; hence the LinearComplexitySparse implementation, which does not compile and returns the errors I mentioned in the description. – Rainer – 2013-03-13T17:31:46.173

Answers

I believe the following is equivalent. It gets rid of some of the tensor copying. On my machine I'm seeing a speed improvement of almost a factor of 3.

LinearComplexityC2 = Compile[{{u, _Integer, 1}},
   Module[{len, b, c, l = 0, m = 0, bsub, mmn, oldl, oldm},
    len = Length[u];
    c = b = Join[{1}, Table[0, {len - 1}]];
    Do[
     If[OddQ[u[[n]] + Sum[c[[j + 1]]*u[[n - j]], {j, 1, l}]],
      oldl = l;
      oldm = m;
      bsub = b[[1 ;; l + 1]];
      If[2*l <= n, l = n - l; m = n; b = c;];
      mmn = 1 + n - oldm;
      c[[mmn ;; mmn + oldl]] =
       Mod[c[[mmn ;; mmn + oldl]] + bsub, 2];
      ], {n, 1, len}];
    l], CompilationTarget -> "C"
   ];

--- edit ---

Adding

"RuntimeOptions" -> "Speed"

gives another factor of 3 or so.
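
For reference, the option is passed to Compile alongside CompilationTarget; it trades runtime safety checks (e.g. integer overflow detection) for speed. A minimal sketch with a toy body to show the placement:

toy = Compile[{{u, _Integer, 1}}, Total[u],
   CompilationTarget -> "C", "RuntimeOptions" -> "Speed"];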

--- end edit ---

Daniel Lichtblau

and now that you're in, this is going to be even more interesting. – Stefan – 2013-03-13T21:31:35.863

+1. Looks like this is what I tried to achieve with my code at the end of my post, but failed. – Leonid Shifrin – 2013-03-13T21:35:08.920

Dear Daniel, this is exactly what I was looking for; thanks for sharing the code, this will help a lot. @Leonid & Stefan and all the rest of the lot who added comments to this post: I really had a lot of fun reading all of your input on this question. You are all awesome! – Rainer – 2013-03-14T07:01:40.697

Keep in mind though that the same ideas could be applied to the C code from @Leonid and Stefan. That could give another factor of 10 improvement. – Daniel Lichtblau – 2013-03-14T13:51:11.320

I was curious and rewrote your code in C, which is not really hard. The code is here. The results are indeed interesting. Compiling your code with CompilationTarget -> "C", I get on my system:

SeedRandom[29328];
tst=RandomInteger[{0,1},20000];
LinearComplexity[tst]//AbsoluteTiming

(* {6.130859,10001}  *)

While compiling the hand-written C code:

code = Import["https://gist.github.com/lshifr/5142849/raw/lincomp.c","String"];
Needs["CCompilerDriver`"];
exe = CreateExecutable[code, "lincomp"];

I get

Import["!"<>exe<>" 20000","String"]

(* " Result:  10000, time in milliseconds:  557 "  *)

which is 10 times faster. Since the same C compiler is used in both cases, I have to conclude that the C code generated by Compile is rather sub-optimal in this case.

Digging a little deeper, here is the version of your code which closely follows the hand-written C code, and avoids some array copying inside the loop:

LinearComplexityAlt =
  Compile[{{u, _Integer, 1}},
    Module[{len, b, c, d, pinit, tmp, l, m, acc = 0, cplusp},
     len = Length[u];
     l = 0;
     m = 0;
     tmp = c = b = Join[{1}, Table[0, {len - 1}]];
     pinit = cplusp = Table[0, {len}];
     Do[
       acc = u[[n]];
       Do[acc += c[[1 + i]]*u[[n - i]], {i, 1, l}];
       d = Mod[acc, 2];
       If[d == 1,
         Do[tmp[[i]] = c[[i]], {i, 1, len}];
         Do[ cplusp[[i]] = pinit[[i]] + c[[i]], {i, 1, n - m}];
         Do[cplusp[[i + n - m]] = b[[i]] + c[[i]], {i, n - m + 1, l + 1}];
         Do[c[[i]] = Mod[cplusp[[i]], 2], {i, 1, len}];
         If[2*l <= n,
           l = n - l; m = n;
           Do[b[[i]] = tmp[[i]], {i, 1, len}];
         ];
       ], {n, 1, len}];
     l]];

However, compiling it to C gives even slower execution (about 9 seconds on my system, roughly twice as slow as your code). So, evidently, array copying is not the problem.

Your code, by the way, is pretty much as efficient with the MVM target as with the C target, since you vectorize the inner array-copying loops (whole-array statements such as c = Mod[c + p, 2] execute as single vectorized operations in either case).

So, in conclusion, the code generated by Compile is indeed quite a bit slower than hand-written C for this particular problem. Note, by the way, that this was done on Windows 7 64-bit with the MS Visual Studio compiler. I also tried the free Dev-C++ compiler for the hand-written C code, and the resulting code executed 5(!) times slower (perhaps I did not use the right switch combination; I just used the defaults). So, the choice of compiler can also make quite a difference.

One should be able to better understand the reasons behind this slowness of Compile-generated code by inspecting the generated C code, which I did not do.
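
For anyone who wants to dig in, here is a sketch of two standard ways to inspect what Compile produced (exact output varies by version, and the second requires the CCodeGenerator` package to be available):

Needs["CompiledFunctionTools`"];
CompilePrint[LinearComplexity]  (* the intermediate register/bytecode listing *)

Needs["CCodeGenerator`"];
CCodeStringGenerate[LinearComplexity, "lincomp"]  (* the generated C source *)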

Leonid Shifrin

I had identical timings with the C code from the gist. It is badly written, so I made some changes to it and boosted it to 270 ms.

One thing that caught my attention was, e.g., the modulo-2 operations, since n % m == n & (m - 1) if m is a power of 2 (at least in ANSI C). That is one optimization; another concerns the calls to arrayCopy, which is a poorly written memcpy, so I changed that as well.

On the Mma side I used: CreateExecutable[code, "lincomp", CompileOptions -> "-O3 -funroll-loops"] <-- clang/gcc options

Import["!" <> exe <> " 20000", "String"] ==> Result: 10000, time in milliseconds: 269 – Stefan – 2013-03-12T16:45:50.337

@Stefan Thanks for your input. I was not concerned with squeezing every last millisecond out of the code, or I would have employed similar optimizations. My point was to have code with minimal changes from Mathematica's Compile, so that it is easy to follow for folks not too familiar with C (I would probably have missed the unroll-loops option, but the other things you described did cross my mind). So, the code may be badly written by C standards, but that was done intentionally, to make it easier to understand. – Leonid Shifrin – 2013-03-12T16:51:15.843

@Stefan As I get some spare moment again, I will edit in your input (with attribution, of course). – Leonid Shifrin – 2013-03-12T16:54:54.483

I'm eager to learn about 'attribution' ;) (What the heck! attribution... :) )

I disagree with you, btw, that intentional simplification leads to better understanding. – Stefan – 2013-03-12T17:16:40.173

@Stefan By attribution I meant that I will mention that this input came from you :-). Re: understanding - not all folks here know C, and things like n & m-1 or memcpy can be cryptic enough for those who don't know it. My point basically was that just taking the code from Compile pretty much verbatim, and leaving all the rest to the compiler without doing further optimizations, can still lead to a rather dramatic speed-up in this case. – Leonid Shifrin – 2013-03-12T17:36:08.253

Understood. I'll slow down my pace regarding C/C++, although it is my interface to Mma. Strangely enough...

I really only wanted to point out that n % m == n & (m - 1), and x % (2^n) == x & (2^n - 1), which in C is: x & ((1<<n)-1)

Efficiency and performance always come up in these discussions; that's why I mentioned it. But I did get your point.

(Posted an answer to one of your older 'DayOfWeek' questions. ;) ). Promise. I'll stop... – Stefan – 2013-03-12T18:29:15.607

@Stefan "I'll stop" - no, why? I was just explaining my attitude for this particular answer. Actually, it may also be that your interpretation of it (getting the optimized version) is closer to what the OP wanted. I just wanted to illustrate that particular aspect of plain export of M code into C more or less verbatim, and the difference in performance, but it does not mean that my interpretation is the right one. For the one of "give me the fastest possible code", your suggestion is spot on and your version is better than mine. – Leonid Shifrin – 2013-03-12T18:36:43.850

Thank you. If you say that, it means something to me. I feel flattered; honestly. – Stefan – 2013-03-12T18:43:36.430

Not badly, but academically, written. I don't want to be rude, which sometimes happens due to the fact that I'm not a native speaker. I'm sorry if anyone feels bad about my remarks... – Stefan – 2013-03-12T21:29:08.447

@Stefan No worries :-). You are partly right too. I've been out of practice in C for a while, so that's why I did not pay much attention to optimizations, even though they occurred to me. Had I written this a few years ago, when I was coding in C all the time, chances are I would have included those without even thinking (that's probably the way you felt about it), and perhaps only then removed some of them for readability. – Leonid Shifrin – 2013-03-12T21:49:18.650

@Stefan Could you post your modifications to the code? I would like to try out your implementation. Since additional code is wrapped around the call to the linear complexity routine, even a few milliseconds may add up to a considerable speedup of the total implementation. – Rainer – 2013-03-13T17:38:48.403

@Leonid & Stefan thanks for all of the very good comments and analysis. I appreciate your input and will have a good look at it over the weekend. – Rainer – 2013-03-13T17:40:14.160

@Rainer sure. Do you mean just the modified C code, or a full-featured LibraryLink implementation? – Stefan – 2013-03-13T17:40:52.197

@Stefan whatever suits you. A LibraryLink implementation example would be great too, because in another of my projects I'm considering connecting the qhull code to Mathematica, since there is still no 3D convex hull implementation available in Mathematica. For that project a LibraryLink example would be nice... But as I said, I do not want to hold you up from really important things ;-) – Rainer – 2013-03-13T17:50:18.323

@Rainer done...but as I said don't tell Andreas this...I AM NOW A JAVA GUY! NATIVE IS EVIL! sigh ... – Stefan – 2013-03-13T18:16:28.067

@Rainer I just learned that notification does not work from a post. I included an SSE-intrinsics-based rand implementation for you as well... if you use it, you'll get an additional guaranteed boost in performance. – Stefan – 2013-03-13T19:02:18.087

@Stefan, hey, Java guy, see, I'd find out anyway :) How's your JVMTools trial license working out for you? :) – Andreas Lauschke – 2013-03-13T20:31:09.107

Works great, and even better concerning memory consumption on my Mac. Now I need input. So I started to read your examples, which are pretty decent. Thank you for having me in. I love the non-native world from now on! :)) – Stefan – 2013-03-13T20:59:02.943

@Rainer regarding qhull, have you seen this answer? If you're familiar with qhull's usage, it should not be too difficult to extend qh-math to cope with its results other than the Delaunay triangulation. The code is a bit old, but the linked answer provides some tips on how to compile it.

– Oleksandr R. – 2013-03-14T13:02:37.493

@Oleksandr, hey, cool post! I was not aware of this. I need to try it out. The main background of my qhull project is the visualization of Brillouin zones in reciprocal space for arbitrary space groups of crystals. – Rainer – 2013-03-14T13:38:30.217

Here is the modified C code version that Leonid and I discussed. @Rainer, I hope you find it useful for your task. (In case you didn't know: since I got JVMTools from @Andreas, I'm not allowed to use C and LibraryLink anymore. He forces me to do everything in Java/Scala from now on <- this is not the opera! Horrible, isn't it? ;) Please don't tell him... )

lincomplex = "
 #include <stdio.h>
 #include <stdlib.h>
 #include <time.h>

 #include <string.h>
 #include \"WolframLibrary.h\"

 DLLEXPORT mint WolframLibrary_getVersion(){
   return WolframLibraryVersion;
 }

 DLLEXPORT int WolframLibrary_initialize( WolframLibraryData libData) {
   return 0;
 }

 DLLEXPORT void WolframLibrary_uninitialize( WolframLibraryData libData) {
   return;
 }

 int random_in_range (unsigned int min, unsigned int max)
 {
    /* the & and >> forms below assume range is a small power of 2 (here range == 2) */
    int range       = max - min,
    remainder   = RAND_MAX & (range - 1),
    bucket      = RAND_MAX >> (range - 1);
    int base_random = rand(); /* in [0, RAND_MAX] */
    if (RAND_MAX == base_random) return random_in_range(min, max);
    /* now guaranteed to be in [0, RAND_MAX) */

    /* There are range buckets, plus one smaller interval
     within remainder of RAND_MAX */
    if (base_random < RAND_MAX - remainder) {
        return min + base_random/bucket;
    } else {
        return random_in_range (min, max);
    }
}


int LinearComplexity(int val) {
    int  l=0, m=0, i=0, len = 0, acc = 0,n=0, d = 0;
    int *u = NULL,*c = NULL, *b = NULL, *pinit = NULL, *tmp=NULL, *cplusp=NULL;

    len = val;
    if(len <= 0){
        puts(\" The length of the sequence must be a positive integer \");
        return EXIT_FAILURE;
    }

    u = (int*)malloc(len * sizeof(int));


    if(!u){
        puts(\" Memory allocation error \");
        return EXIT_FAILURE;
    }

    for(i=0;i<len;i++){
        u[i]= random_in_range(0,2);
    }

    /* calloc zero-initializes: pinit and the tails of b and c must start at 0 */
    c = (int*)calloc(len, sizeof(int));
    b = (int*)calloc(len, sizeof(int));
    pinit = (int*)calloc(len, sizeof(int));
    tmp = (int*)calloc(len, sizeof(int));
    cplusp = (int*)calloc(len, sizeof(int));

    if(!(c && b && pinit && tmp && cplusp)){
        puts(\" Memory allocation error \");
        return EXIT_FAILURE;
    }

    tmp[0]=c[0]=b[0]=1;

    for(n=0;n<len;n++){
        acc = u[n];
        for(i=0;i<l;i++){
            acc+=c[1+i]*u[n-i-1];
        }
        d = acc & 1;
        if(d == 1){
            memcpy(tmp, c, len * sizeof(int)); /* save c before it is updated */
            for(i=0;i<n-m;i++){
              cplusp[i] = pinit[i]+c[i];
            }
            for(i=n-m+1;i<l+1;i++){
               cplusp[i] = b[i]+c[i];
            }
            for(i=0;i<len;i++){
               c[i] = cplusp[i] & 1;
            }
            if(2 * l < n){
               l = n-l;
               m=n;
               memcpy(b, tmp, len * sizeof(int)); /* b takes the saved copy of c */
            }
        }
    }

    free(u);
    free(c);
    free(b);
    free(pinit);
    free(tmp);
    free(cplusp);

    return (l+1);
}

DLLEXPORT int linear_complexity(WolframLibraryData libData,
        mint Argc, MArgument *Args, MArgument Res) {
  mint I0;
  I0 = MArgument_getInteger(Args[0]);
  MArgument_setInteger(Res, LinearComplexity(I0));
  return LIBRARY_NO_ERROR;
}
";

Needs["CCompilerDriver`"];
lib = CreateLibrary[lincomplex, "linear_complexity", 
    CompileOptions -> "-O3 -funroll-loops"];

lincomp = LibraryFunctionLoad[lib, "linear_complexity", {Integer}, Integer];

lincomp[20000] // AbsoluteTiming

Out[30]= {0.264698, 10001}

EDIT

@Andreas. Once upon a time I did some research on rand(). Don't ask me why. Maybe that is why all the other guys get the girls... anyway, I came up with a faster rand() implementation through heavy use of SSE intrinsics. Maybe you will find it useful for your research.

#include <emmintrin.h>

/*
 Here is a description of the linear congruential generator
 (http://en.wikipedia.org/wiki/Linear_congruential_generator)

 The algorithm of the rand() function in the C library is, however, still a slight
 modification of the above. The equation is the same, but the value returned to the
 calling function is shifted right by 16 bits to reduce the correlation of the low-order
 bits, and masked with 7FFFh to eliminate the sign bit. The rand_sse() function shown
 below implements the same LCG, vectorized over four streams with SSE. It uses unsigned
 32-bit integers, but to be compatible with rand() the range is reduced by shifting and
 masking out the upper bit.
*/

#define COMPATABILITY 

//define this if you wish to return values similar to the standard rand(); 



void srand_sse( unsigned int seed ); 

__declspec( align(16) ) static __m128i cur_seed; 

void srand_sse( unsigned int seed ) 
{ 
    cur_seed = _mm_set_epi32( seed, seed+1, seed, seed+1 ); 
} 

// uncomment this if you are using the Intel compiler
// for MS CL the vectorizer is on by default and jumps in if you
// compile with /O2 ...
//#pragma intel optimization_parameter target_arch=avx
//__declspec(cpu_dispatch(core_2nd_gen_avx, core_i7_sse4_2, core_2_duo_ssse3, generic )
__declspec(noinline) void rand_sse( unsigned int* result ) 
{ 
    __declspec( align(16) ) __m128i cur_seed_split; 
    __declspec( align(16) ) __m128i multiplier; 
    __declspec( align(16) ) __m128i adder; 
    __declspec( align(16) ) __m128i mod_mask; 
    __declspec( align(16) ) __m128i sra_mask; 
    __declspec( align(16) ) __m128i sseresult; 

    __declspec( align(16) ) static const unsigned int mult[4] = { 214013, 17405, 214013, 69069 }; 
    __declspec( align(16) ) static const unsigned int gadd[4] = { 2531011, 10395331, 13737667, 1 }; 
    __declspec( align(16) ) static const unsigned int mask[4] = { 0xFFFFFFFF, 0, 0xFFFFFFFF, 0 }; 
    __declspec( align(16) ) static const unsigned int masklo[4] = { 0x00007FFF, 0x00007FFF, 0x00007FFF, 0x00007FFF }; 

    adder = _mm_load_si128( (__m128i*) gadd); 
    multiplier = _mm_load_si128( (__m128i*) mult); 
    mod_mask = _mm_load_si128( (__m128i*) mask); 
    sra_mask = _mm_load_si128( (__m128i*) masklo); 
    cur_seed_split = _mm_shuffle_epi32( cur_seed, _MM_SHUFFLE( 2, 3, 0, 1 ) ); 

    cur_seed = _mm_mul_epu32( cur_seed, multiplier ); 
    multiplier = _mm_shuffle_epi32( multiplier, _MM_SHUFFLE( 2, 3, 0, 1 ) ); 
    cur_seed_split = _mm_mul_epu32( cur_seed_split, multiplier ); 

    cur_seed = _mm_and_si128( cur_seed, mod_mask); 
    cur_seed_split = _mm_and_si128( cur_seed_split, mod_mask );     
    cur_seed_split = _mm_shuffle_epi32( cur_seed_split, _MM_SHUFFLE( 2, 3, 0, 1 ) );     
    cur_seed = _mm_or_si128( cur_seed, cur_seed_split );     
    cur_seed = _mm_add_epi32( cur_seed, adder); 

#ifdef COMPATABILITY 

     // Add the lines below if you wish to reduce your results to 16-bit vals... 
    sseresult = _mm_srai_epi32( cur_seed, 16); 
    sseresult = _mm_and_si128( sseresult, sra_mask ); 
    _mm_storeu_si128( (__m128i*) result, sseresult ); 

    return; 

#endif 

    _mm_storeu_si128( (__m128i*) result, cur_seed); 
    return; 
} 

If you're using this and the other implementation in a C++ environment, you can wrap this implementation in a nice function object.

struct scalar_LCG // use big names so that others think you're superbright!
{
    scalar_LCG(int seed) { srand_sse(seed); }
    unsigned int operator() ()
    {
        rand_sse(result); // rand_sse stores four 32-bit values at once
        return result[0];
    }
    unsigned int result[4]; // room for the full 128-bit SSE store
};

Have fun and keep hacking!

EDIT 2

If anyone has a problem getting the above rand_sse implementation to compile, just let me know. I'm part of the queer folk who love compilers and even spend money to have them. So this is possibly too much Intel compiler intrinsics... but it should not be a big thing to port this to gcc's and MS-CL's intrinsics.

Since I'm 'unswamp'd' I can help...

Stefan

+1. This is nice. About native vs. not, I am not sure I am in your and Andreas's camp. I tend to like languages which are good both to write in and to generate code for, and as far as I can tell now, C, javascript and Lisp (and also Mathematica) are some representatives, while Java is likely not. But that's just my current opinion; I know that some people actually generate Java code. – Leonid Shifrin – 2013-03-13T18:26:10.110

@Leonid are you doing LISP? Really? Oh, that's marvellous! I love functional calculus; well, not so much Lisp, because it breaks some rules, but I adore Scheme and Haskell as well. Since our discussion with Andreas yesterday, I have been thinking about what it would take to create a HaskellLink, and whether that makes sense at all. – Stefan – 2013-03-13T18:44:04.683

Lisp is still on my wish list to learn, but from what I have learned already I know that it will be among my favorite languages. In javascript, OTOH, I do some real stuff from time to time. So, becoming better at js is a more immediate need for me, but Lisp stands right after it on my todo list. But recently I haven't had much time for anything except my direct work, which is Mathematica (sometimes plus C or Java). – Leonid Shifrin – 2013-03-13T18:48:54.590

Hi Stefan, just wanted to let you know that you cannot ping people from answers (or questions), so having a conversation in the post is futile since they will never receive the message (unless they chance upon it). Feel free to use a chatroom though :) – rm -rf – 2013-03-13T18:49:09.007

@rm what did you edit? I am just curious. It takes me longer to paste code and reformat it than to write it. I am probably doing something wrong. Is there a better way to post code? – Stefan – 2013-03-13T18:49:14.707

@Stefan I didn't edit the code contents, I just changed the highlighting from Mathematica (default) to C and C++. This is done by inserting <!-- language: lang-c --> (unindented) followed by a blank line, immediately followed by the code block (indented). (and <!-- language: lang-cpp --> for C++) – rm -rf – 2013-03-13T18:49:48.510

@rm ok. thank you. :) – Stefan – 2013-03-13T18:50:20.133

@Leonid Probably you know it already, but my favourite book about Lisp (well, honestly, it is more Scheme) is 'Structure and Interpretation of Computer Programs'; it is really fascinating. I even have the full footage of the course the authors (Abelson, Sussman) gave at HP for the engineering folks. It is really a tour through computer science. You'll never find such a great introduction to the Y combinator and related topics. – Stefan – 2013-03-13T18:53:46.150

To see what rm-rf changed, click on the time right next to the word "edited" above his name. (If you didn't follow that, here's the link.)

– rcollyer – 2013-03-13T18:54:28.827

@rcollyer thanks for the hint. And rm -rf (love that :) ), thank you for even improving my poor grammar. – Stefan – 2013-03-13T18:56:47.643

I do know about the book, and it is on my list of things to go through. I have not yet decided whether to start with Scheme or Common Lisp - I actually started with CL but have not gotten much done yet. Currently I am swamped with work, and for this sort of study one needs free time on a systematic basis, or a lot of it in one go (a couple of free weeks at a minimum) - which so far remains a pipe dream :-) – Leonid Shifrin – 2013-03-13T18:56:50.990

@Leonid Scheme is a cleaned-up Lisp and supports tail recursion, so no pain... there is a series of books called 'The Little Schemer'; these are fascinating as well... and there is... SO MUCH TO READ!!! and no time... :( – Stefan – 2013-03-13T19:00:07.557

I have all those books (little, seasoned and reasoned schemer), and 3 or 4 books on CL as well. But, as you said, no time ... – Leonid Shifrin – 2013-03-13T19:21:36.437

@Leonid thinking about the Haskell link: this would probably lead to a from-the-ground-up reengineered implementation of MathLink, because it only works well for imperative languages. One always hears that M is functional, but try to link a functional language... that sounds to me like a daunting task, and it leaves open the question: is M really functional, or just a very well engineered rule-based system with some functional scent? – Stefan – 2013-03-13T19:48:53.393

MathLink Monad... – Stefan – 2013-03-13T19:49:33.033

@Stefan Actually, when I was using JLink (which is built on MathLink) for building RLink, I found the whole setup very convenient. R has a simple object model, so the core types were easy to map onto Java objects. I don't know Haskell, but I can guess that this would be no easy task. Re-engineering MathLink sounds like a pretty daunting task though. Not sure if that is really needed - it is rather low-level. But having various links (to Haskell, ML/OCaml, Lisp, Python etc.) would be very useful indeed. – Leonid Shifrin – 2013-03-13T20:28:37.350

@Stefan: when you sent me that post higher up, I didn't get notified, because I wasn't part of this conversation until just now. Your proposed Haskell link might be worthwhile, but I won't write it, because I keep things with the JVM. The JVM is an incredible piece of technology, and other languages can end up being a bit esoteric. The JVM is the most widely distributed VM there is, so that's where I focus my interest. Nothing against Haskell, but I want to cater to the JVM. – Andreas Lauschke – 2013-03-13T20:41:25.067

@Leonid please believe me. I read through this discussion and realized for the first time that you were the one who wrote that C code. Seriously. I thought it was from the guy who complained about M's poor performance. I'm sorry. Really. I totally missed that. Jesus... what's wrong with me. I'm sorry about that, and since you're such a modest guy, I was just expecting that you were protecting this virtual author from my criticism. Conclusion: I'm dumb, and you again proved to be a decent gentleman. If someone had come to me and said this, I'd explode... – Stefan – 2013-03-13T20:50:37.710

@Stefan No problem at all, really. Also, why should your opinion about the code depend on who wrote it? Besides, I would not be offended if some bad code of mine were called bad. In this case, I was just explaining that I had set a different goal than maximal efficiency. – Leonid Shifrin – 2013-03-13T21:38:16.810