23

12

Performance on Random Number generation is intolerable. Whenever you need many deviates from the same distribution but with different parameters it takes forever.

Here is a thread on Poisson deviates being slow. The comments and answers are all aimed at Poisson Deviates and do not address the fundamental shortcoming Mathematica has generating random numbers.

Here I asked why MultiNomial Deviates were so slow, and didn't get an answer, although one person suggested there was no solution.

Elsewhere I asked how to generate BinomialDeviates and got a helpful Binomial-specific answer.

The problem is I don't want to spend hours trying to work around Mathematica's intolerable performance each time I need some deviates from a standard distribution.

This is essentially a suggestion that WRI needs to add function calls for people who need large numbers of random deviates in which all the parameters are different. Below I show the syntax for such a call for binomial variates in R, Octave, and scipy.

R:

```
num<-10^5
n<-sample(1:10,num,TRUE)
p<-runif(num)
ptm <- proc.time()
out=rbinom(num,n,p)
proc.time()-ptm
```

Here is the same thing in Octave:

```
num=10^7;
n=randi([1,10],num,1);
p=rand(num,1);
t=cputime;
out=binornd(n,p);
cputime-t
```

and scipy:

```
num=10**7
n=stats.randint.rvs(1,10,size=num)
p=stats.uniform.rvs(0,1,num)
out=stats.binom.rvs(n,p)
```

NOTE: The point is not to generate the distribution of deviates that are obtained for
uniformly distributed `p`

and `n`

. The point is to generate a bunch of deviates when each deviate is drawn for a different parameter value. The manner in which the parameters are obtained should be considered as unknown. I used uniform for `p`

and `n`

because they had to be set to *something*, not because I am interested in the resulting distribution.

As you can see the calls in R, Octave, aand Python (scipy) where the actual work desired is performed is a single line of self-evident code in each case. The performance of all these languages blows Mathematica away. Mathematica has `Solve`

and `NSolve`

, `DSolve`

& `NDSolve`

, `Integrate`

& `NIntegrate`

, `Sum`

and `NSum`

, `Product`

& `NProduct`

... Something similar (to the numerical versions) should be implemented for random number generation.

Below there are 3 answers. blochwave's solution has the best performance and makes calls to C++ routines which the user must write. The routine poissonvariate.cpp has 48 lines. In addition there are two lines of Mathematica which must be written as well. The 48 lines of cpp could obviously be copied from the working poissonvariate and then morphed into the code needed for the desired generator but one must still perform substantially more work to implement this solution for new generators than one must in R, Octave, and python. My solution was to use Rlink and generate the numbers in R. THis gave a performance improvement of about 100 over native Mathematica which for my purposes was adequate. The amount of code required to implement the R solution is 2 + n where n is the number of distinct random variate types you need. (BY "type" I mean, exponential, binomial, poisson, multinomial, beta, chisquare, etc. etc. etc.) . I believe that most users value both code simplicity and performance. The 3rd answer did not provide a solution to the problem that was posed.

Below I compare computing 10,000 random deviates in Mathematica against 10,000,000 in C. I draw from 5 distributions in Mathematica and C: Binomial, Exponential, Chi-Squared, Poisson and Gamma. In C I generated about 50 million deviates in about 10 seconds. In Mathematica I generated about 50,000 deviates in about 8 seconds. If I generate 10,000 deviates of each distribution in C the code runs in about 0.03 seconds which is roughly 250 times faster than Mathematica. Is there really no way to improve this?

Here is the comparison between Mathematica and C. The upshot is that there is no comparison. C is about 100-1000 times faster.

```
num = 10000
ints = RandomInteger[{1, 10}, num];
reals = RandomReal[{0, 1}, num];
mus = RandomReal[{0, 1}, num];
mf = Thread[BinomialDistribution[ints, reals]];
gf = Thread[GammaDistribution[mus, reals]];
RandomVariate /@ mf; // Timing
RandomVariate /@ gf; // Timing
RandomVariate[PoissonDistribution[#]] & /@ mus; // Timing
RandomVariate[ChiSquareDistribution[#]] & /@ ints; // Timing
RandomVariate[ExponentialDistribution[#]] & /@ mus; // Timing
```

The 10,000 Binomial deviates take 6 seconds (on a MacBook Air) to generate. The Poisson deviates take over 1 second.

The C code below uses the library `ranlib`

, which is documented and virtually bullet-proof as far as I know. You can get `ranlib.c`

here - it's the 7th library down.

To run the code below save it to a file called `ran_test.c`

. Get the source code from the library and put `linpack.c`

, `ranlib.c`

, `com.c`

and `ranlib.h`

into the same directory. On a Unix machine (e.g. Linux or Mac) that has GCC installed type the following to compile.

```
gcc -O2 ran_test.c linpack.c ranlib.c com.c -o tst
```

To time and execute the code enter: `time ./tst`

on the command line and hit Return. On my MacBook Air the C generates 10 million deviates from each of the following distributions: Gamma, Binomial, Chi-Squared, Exponential and Poisson. The parameters for the distributions are different on each call. The code runs in about 10 seconds. Thus it is roughly 1,000 times faster than Mathematica.

```
#include <stdio.h>
#include <stdlib.h>
#include "ranlib.h"
#define MILLION 1000000
#define TEN_MILLION 10000000
main(){
long int i,j,k,dev,count, ix,n;
long iseed1=100,iseed2=1000;
long int ncat, rint;
float p, x,pp;
float mu;
setall(iseed1,iseed2);
rint = ignuin(1,TEN_MILLION);
printf("random int=%ld\n",rint);
for(k=0;k<TEN_MILLION;k++){
p=genunf(0,1);
ncat=ignuin(1,10);
n=ignbin(ncat, p);
mu=genunf(0,10000);
x=gengam(genunf(0,100),genunf(0,10000));
/* print a few of the deviates */
if(k%rint==0)printf("gamma deviate=%g\n",x);
if(k%rint==0)printf("binomial deviate=%ld\n",n);
n=ignpoi(mu);
if(k%rint==0)printf("poisson variate n=%ld\n",n);
p=genexp(mu);
if(k%rint==0)printf("exp variate =%g\n",p);
p=genchi(mu);
if(k%rint==0)printf("chi square =%g\n",p);
if(k%rint==0)printf("\n");
}
}
```

11What's your point? Of course compiled C is faster than

Mathematica. So call your C from it if that kind of speed is critical for this simplistic stuff. On the other hand, I'll wait patiently while you use your C code to evaluate complex probabilistic equations symbolically... – ciao – 2015-02-18T06:58:17.4171I don't get the point either. If you want to do a comparison, get it down to 2 lines of code. Do you really need to define

`num, ints, reals, mus, f, mf, rvs, gf`

... just to do a timing test? Talking of efficient, who has time to look through all that? – wolfies – 2015-02-18T10:57:49.5871I was not aware that symbolic calculations have anything to do with random number generation? ?? – JEP – 2015-02-18T12:53:08.307

f should not have been there. Other than that ints, reals, and mus were all used as arguments to the distributions.

I am detecting hostility here which I do not understand. I don't understand the point of comments that don't address the question.

My point is that if you can generate random variates all at once the performance is comparable to C so it is false that "of course compiled C is faster than Mathematica." There are many simulation problems in which random numbers with different parameters are needed as in the examples I provided. – JEP – 2015-02-18T13:04:32.223

8Yes - but your question is messy. A good question simplifies the problem down to 1 or 2 lines (if possible), and it is certainly possible to do so here. By contrast, your question appears lazy, because it just dumps lines and lines of code from whatever you were doing, without honing it down to the heart of the matter. 2 lines: no more. Further, your title asks if it can be FIXED, which infers that it is broken. But your question is not about quality, nor do you compare the quality of pseudorandom drawings generated, nor does your post suggest anything is broken. – wolfies – 2015-02-18T15:13:26.533

3If you feel hostility I think that might have to do with the fact that you chose to use a quite aggressive title, which also isn't in line with your actual question. You might realize that many readers would be seriously concerned about broken random number generators while they are neither surprised nor worried to hear that Mathematica is slow for this kind of task. They might (as I did) feel being tricked into reading your (lengthy) question in the first place... – Albert Retey – 2015-02-20T15:50:56.817

1Sorry about that. I wasn't trying to trick anyone into reading anything. The first line in my post is: "Performance on Random Number generation is intolerable." I'm not sure why you kept reading if you don't care about performance. WRI can fix it or not. – JEP – 2015-02-20T17:10:50.620

Seeing as intelligent use of

Mathematicaprovided faster generation than the C and R examples below, I don't think it's WRI that needs to do any fixing. Perhaps the Experimental`AngerManagement package needs a spin... – ciao – 2015-02-20T19:27:39.293@AlbertRetey well said. – ciao – 2015-02-20T19:28:59.517

@rasher out of interest, what was your code for the parametric generation of 10^6 binomial variates? 200ms is close to the performance of the C++ code (150ms) – dr.blochwave – 2015-02-20T21:47:55.223

2@blochwave: That 200ms was on a netbook, 6x faster than the R example on same - but it's basic MMA thinking: Use ParametricMixture to get dist, PDF (or CDF depending on dist type), use PDF to drive RandomChoice (or CDF for inversion). Will add binomial example to my answer when time permits. – ciao – 2015-02-20T22:00:11.203

@rasher look forward to seeing it, especially if it beats the C++! – dr.blochwave – 2015-02-20T22:01:42.453

@blochwave: Added... – ciao – 2015-02-20T22:12:47.340

If you are assuming a distribution for the input parameters on the parametric generation then you are not generating the random numbers that I need. – JEP – 2015-02-20T22:23:05.647

@JEP why not? The example provided by rasher below gives the same histogram as MMA or C++, and is very quick to generate and draw from (especially for very large draws of $>10^{7}$) – dr.blochwave – 2015-02-20T22:31:20.310

My input parameters come from a simulation. They don't have a known parametric distribution. – JEP – 2015-02-20T22:56:23.970

Ok, I understand now, thanks for clarifiying – dr.blochwave – 2015-02-20T23:07:07.727

1@jep Then you use the same technique where the parameter RVs are defined by your inputs. Pretty basic stuff. – ciao – 2015-02-20T23:45:02.397