On 9/4/06, A. M. Archibald <peridot.faceted@gmail.com> wrote:
On 04/09/06, Robert Kern <robert.kern@gmail.com> wrote:
> Charles R Harris wrote:
> > What sort of api should this be? It occurs to me that there are already
> > 4 sources of random bytes:
> >
> > Initialization:
> >
> > /dev/random (pseudo random, I think)

/dev/random is (supposed to be) true randomness derived from various
sources. It's also fed through a bunch of hashing and CRC functions
chosen for convenience, but is probably pretty unbiased.

> > /dev/urandom
> > crypto system on windows
> >
> > Pseudo random generators:
> >
> > mtrand
> >
> > I suppose we could add some cryptologically secure source as well. That
> > indicates to me that one set of random number generators would just be
> > streams of random bytes, possibly in 4 byte chunks. If I were doing this
> > for linux these would all look like file systems, FUSE comes to mind.
> > Another set of functions would transform these into the different
> > distributions. So, how much should stay in numpy? What sort of API are
> > folks asking for?

"Cryptographically secure random number generator" means something
different to everybody, and implementing one that everyone is happy
with is going to be a colossal pain. Look at Yarrow and its design
documents for some idea of the complexities involved.

However, a random number generator based on a stream cipher is
probably going to be pretty good, if slow, so implementingone if those
can't hurt. But I think leaving the possibility open that one could
use custom sources of random bytes is a very good idea. There's no
particular need that custom ones should be fast, as they're for use
when you really want *good* random numbers.

I was thinking it might be nice to have one, just to generate seeds if nothing else. This could actually be written in python and use something like the python cryptography toolkit, no need to reinvent the wheel. It might be worth looking at that code just to see how someone else thought through the interface. It's under the Python license, so I need to know if that license is compatible with the BSD license used for numpy.

> It would be good to also support quasi-random generators like Sobol and Halton
> sequences. They tend to only generate floating point numbers in [0, 1] (or (0,
> 1) or [0, 1) or (0, 1]; I think it varies). There are probably other PRNGs that
> only output floating point numbers, but I doubt we want to implement any of
> them. You can look at the 1.6 version of RandomKit for an implementation of
> Sobol sequences and ISAAC, a cryptographic PRNG.
>    http://www.jeannot.org/~js/code/index.en.html

Sobol' sequences may require special care in constructing non-uniform
distributions in order to preserve their uniformity properties. If we
can simply import ISAAC, it won't hurt, but I'd rather trust (say) AES
in counter mode than some custom cryptosystem that hasn't been
analyzed as thoroughly.

> I'm thinking that the core struct that we're going to pass around will look
> something like this:
> struct random_state {
>    void* state;
>    unsigned int (*gen_uint32)(struct random_state*);
>    double (*gen_double)(struct random_state*);
> }
> state -- A pointer to the generator-specific struct.
> gen_uint32 -- A function pointer that yields a 32-bit unsigned integer. Possibly
> NULL if the generator can not implement such a generator.
> gen_double -- A function pointer that yields a double-precision number in [0, 1]
> (possibly omitting one or both of the endpoints depending on the implementation).

Are 32-bit numbers really the right least common denominator? There
are plenty of 64-bit platforms out there...

MWC8222 runs faster (2x) on some 64 bit platforms because it multiplies two 32 bit numbers and uses the 64 bit product. The mtrand generator really uses a polynomial with coefficients in z_2 and has 624 words worth, an even number, so could probably be adapted to run directly with 64 bit registers although the gain might not be much. Most other generators I know of produce 32 bit numbers. Not that I am an expert on the subject.

Given this API, implementing a subclassable class that exports it
should satisfy most people's needs for interchangeable generators.


A. M. Archibald