[Numpy-discussion] SFMT (faster mersenne twister)
Pierre-Andre Noel
noel.pierre.andre at gmail.com
Mon Sep 8 13:05:03 EDT 2014
> I think we could add new generators to NumPy though,
> perhaps with a keyword to control the algorithm (defaulting to the
current
> Mersenne Twister).
Why not do something like the C++11 <random>? In <random>, a "generator"
is the engine producing randomness, and a "distribution" decides what is
the type of outputs that you want. Here is the example on
http://www.cplusplus.com/reference/random/ .
std::default_random_engine generator;
std::uniform_int_distribution<int> distribution(1,6);
int dice_roll = distribution(generator); // generates number in
the range 1..6
For convenience, you can bind the generator with the distribution (still
from the web page above).
auto dice = std::bind(distribution, generator);
int wisdom = dice()+dice()+dice();
Here is how I propose to adapt this scheme to numpy. First, there would
be a global generator defaulting to the current implementation of
Mersene Twister. Calls to numpy's "RandomState", "seed", "get_state" and
"set_state" would affect this global generator.
All numpy functions associated to random number generation (i.e.,
everything listed on
http://docs.scipy.org/doc/numpy/reference/routines.random.html except
for "RandomState", "seed", "get_state" and "set_state") would accept the
kwarg "generator", which defaults to the global generator (by default
the current Mersene Twister).
Now there could be other generator objects: the new Mersene Twister,
some lightweight-but-worse generator, or some cryptographically-safe
random generator. Each such generator would have "RandomState", "seed",
"get_state" and "set_state" methods (except perhaps the
criptographically-safe ones). When calling a numpy function with
generator=my_generator, that function uses this generator instead the
global one. Moreover, there would be be a function, say
select_default_random_engine(generator), which changes the global
generator to a user-specified one.
> This is also why parallel random
> number generators and parallel stochastic algorithms are so hard to
> program, because the operating systems' scheduler can easily break the
> reproducibility.
What I propose would also simplify this: each thread can use its own
independently-seeded generator. Timing is no longer a problem: as long
as which-thread-does-what is not affected by the scheduler, the
execution remains deterministic.
On 09/07/2014 12:51 PM, Sturla Molden wrote:
> "James A. Bednar" <jbednar at inf.ed.ac.uk> wrote:
>
>> Please don't ever, ever break the sequence of numpy's random numbers!
>> Please! We have put a lot of effort into being able to reproduce our
>> published work exactly,
> Jup, it cannot be understated how important this is for reproducibility of
> published research. Thus from a scientific standpoint it is important that
> random numbers are not random. Some might think that it's just important
> that they are as "random as possible", but reproducibility is just as
> essential to stochastic simulations. This is also why parallel random
> number generators and parallel stochastic algorithms are so hard to
> program, because the operating systems' scheduler can easily break the
> reproducibility. I think we could add new generators to NumPy though,
> perhaps with a keyword to control the algorithm (defaulting to the current
> Mersenne Twister). A particular candidate I think we should consider is the
> DCMT, which is exceptionally good for parallel algorithms (the DCMT code is
> now BSD licensed, it used to be LGPL). Because of the way randomkit it
> written, it is very easy to plug-in different generators.
>
> Sturla
>
>
More information about the NumPy-Discussion
mailing list