[Numpy-discussion] SFMT (faster mersenne twister)

Mon Sep 8 13:05:03 EDT 2014

 > I think we could add new generators to NumPy though,
 > perhaps with a keyword to control the algorithm (defaulting to the 
current
 > Mersenne Twister).

Why not do something like the C++11 <random>? In <random>, a "generator" 
is the engine producing randomness, and a "distribution" decides what is 
the type of outputs that you want. Here is the example on 
http://www.cplusplus.com/reference/random/ .

     std::default_random_engine generator;
     std::uniform_int_distribution<int> distribution(1,6);
     int dice_roll = distribution(generator);  // generates number in 
the range 1..6

For convenience, you can bind the generator with the distribution (still 
from the web page above).

     auto dice = std::bind(distribution, generator);
     int wisdom = dice()+dice()+dice();

Here is how I propose to adapt this scheme to numpy. First, there would 
be a global generator defaulting to the current implementation of 
Mersene Twister. Calls to numpy's "RandomState", "seed", "get_state" and 
"set_state" would affect this global generator.

All numpy functions associated to random number generation (i.e., 
everything listed on 
http://docs.scipy.org/doc/numpy/reference/routines.random.html except 
for "RandomState", "seed", "get_state" and "set_state") would accept the 
kwarg "generator", which defaults to the global generator (by default 
the current Mersene Twister).

Now there could be other generator objects: the new Mersene Twister, 
some lightweight-but-worse generator, or some cryptographically-safe 
random generator. Each such generator would have "RandomState", "seed", 
"get_state" and "set_state" methods (except perhaps the 
criptographically-safe ones). When calling a numpy function with 
generator=my_generator, that function uses this generator instead the 
global one. Moreover, there would be be a function, say 
select_default_random_engine(generator), which changes the global 
generator to a user-specified one.

 > This is also why parallel random
 > number generators and parallel stochastic algorithms are so hard to
 > program, because the operating systems' scheduler can easily break the
 > reproducibility.

What I propose would also simplify this: each thread can use its own 
independently-seeded generator. Timing is no longer a problem: as long 
as which-thread-does-what is not affected by the scheduler, the 
execution remains deterministic.

On 09/07/2014 12:51 PM, Sturla Molden wrote:
> "James A. Bednar" <jbednar at inf.ed.ac.uk> wrote:
>
>> Please don't ever, ever break the sequence of numpy's random numbers!
>> Please!  We have put a lot of effort into being able to reproduce our
>> published work exactly,
> Jup, it cannot be understated how important this is for reproducibility of
> published research. Thus from a scientific standpoint it is important that
> random numbers are not random. Some might think that it's just important
> that they are as "random as possible", but reproducibility is just as
> essential to stochastic simulations. This is also why parallel random
> number generators and parallel stochastic algorithms are so hard to
> program, because the operating systems' scheduler can easily break the
> reproducibility. I think we could add new generators to NumPy though,
> perhaps with a keyword to control the algorithm (defaulting to the current
> Mersenne Twister). A particular candidate I think we should consider is the
> DCMT, which is exceptionally good for parallel algorithms (the DCMT code is
> now BSD licensed, it used to be LGPL). Because of the way randomkit it
> written, it is very easy to plug-in different generators.
>
> Sturla
>
>