On 04/09/06, Robert Kern <robert.kern@gmail.com> wrote:
Charles R Harris wrote:
What sort of api should this be? It occurs to me that there are already 4 sources of random bytes:
Initialization:
/dev/random (pseudo random, I think)
/dev/random is (supposed to be) true randomness derived from various sources. It's also fed through a bunch of hashing and CRC functions chosen for convenience, but is probably pretty unbiased.
/dev/urandom crypto system on windows
Pseudo random generators:
mtrand
I suppose we could add some cryptologically secure source as well. That indicates to me that one set of random number generators would just be streams of random bytes, possibly in 4 byte chunks. If I were doing this for linux these would all look like file systems, FUSE comes to mind. Another set of functions would transform these into the different distributions. So, how much should stay in numpy? What sort of API are folks asking for?
"Cryptographically secure random number generator" means something different to everybody, and implementing one that everyone is happy with is going to be a colossal pain. Look at Yarrow and its design documents for some idea of the complexities involved. However, a random number generator based on a stream cipher is probably going to be pretty good, if slow, so implementingone if those can't hurt. But I think leaving the possibility open that one could use custom sources of random bytes is a very good idea. There's no particular need that custom ones should be fast, as they're for use when you really want *good* random numbers.
It would be good to also support quasi-random generators like Sobol and Halton sequences. They tend to only generate floating point numbers in [0, 1] (or (0, 1) or [0, 1) or (0, 1]; I think it varies). There are probably other PRNGs that only output floating point numbers, but I doubt we want to implement any of them. You can look at the 1.6 version of RandomKit for an implementation of Sobol sequences and ISAAC, a cryptographic PRNG.
Sobol' sequences may require special care in constructing non-uniform distributions in order to preserve their uniformity properties. If we can simply import ISAAC, it won't hurt, but I'd rather trust (say) AES in counter mode than some custom cryptosystem that hasn't been analyzed as thoroughly.
I'm thinking that the core struct that we're going to pass around will look something like this:
struct random_state { void* state; unsigned int (*gen_uint32)(struct random_state*); double (*gen_double)(struct random_state*); }
state -- A pointer to the generator-specific struct.
gen_uint32 -- A function pointer that yields a 32-bit unsigned integer. Possibly NULL if the generator can not implement such a generator.
gen_double -- A function pointer that yields a double-precision number in [0, 1] (possibly omitting one or both of the endpoints depending on the implementation).
Are 32-bit numbers really the right least common denominator? There are plenty of 64-bit platforms out there... Given this API, implementing a subclassable class that exports it should satisfy most people's needs for interchangeable generators. A. M. Archibald