Hi Robert, I am about to get started on some stuff for the random number generators but thought I would run it by you first. I envisage the following: uniform short_doubles -- doubles generated from a single 32 bit random number (advantage: speed) uniform double, short_doubles on the interval (0,1) -- don't touch singularities in functions like log (this is my preferred default) fast_normal -- ziggurat method using single 32 bit random numbers (advantage: speed) fast_exponential -- ziggurat method using single 32 bit random numbers (advantage: speed) MWC8222 random number generator (advantage: speed on some machines, different from mtrand) Except for the last, none conflict with current routines and can be added without a branch. I expect adding MWC8222 might need more extensive work and I will branch for that. So the questions are of utility and naming. I see some utility for myself, otherwise I wouldn't be considering doing the work. OTOH, I already have (C++) routines that I use for these things, so a larger question might be if anyone else sees a use for these. I like speed, but it is not always that important in everyday apps. I see that Pyrex is used for the interface, so I suppose that is one more tool to become familiar with ;) Chuck
On 04/09/06, Charles R Harris <charlesr.harris@gmail.com> wrote:
Except for the last, none conflict with current routines and can be added without a branch. I expect adding MWC8222 might need more extensive work and I will branch for that. So the questions are of utility and naming. I see some utility for myself, otherwise I wouldn't be considering doing the work. OTOH, I already have (C++) routines that I use for these things, so a larger question might be if anyone else sees a use for these. I like speed, but it is not always that important in everyday apps.
How painful would it be to allow users to drop in their own sources of raw random numbers? A couple of years ago I was trying to analyze some X-ray timing data and we saw some peaks at the end of a long chain of periodicity-detecting software (Fourier transforms, parameter searching, et cetera). To rule out the possibility that they were coming from the random numbers we were using at one stage, I wanted to supply raw random numbers from /dev/random. Unfortunately, that forced me to write my own program to convert uniform random numbers to exponential. Allowing the function that generates raw random bytes to be overwritten would be extremely handy, even if it were painfully slow. In the same project I also noticed it would be nice to be able to (say) do "exponential(2+sin(arange(10)))" to get an array of exponentials with varying parameters. I realize all this is outside the scope of what you were asking. I would have found another pseudorandom number generator that I could just switch to a benefit; the rest of them are less exciting. When you say "32-bit integers" do you really mean 32-bit, or do you mean "machine width"? 32-bit integers may not be faster on 64-bit platforms... A. M. Archibald
A. M. Archibald wrote:
In the same project I also noticed it would be nice to be able to (say) do "exponential(2+sin(arange(10)))" to get an array of exponentials with varying parameters.
Travis O. recently added this capability. The distribution parameters (loc, scale, alpha, whatever) are broadcast against each other using the same rules as ufunc parameters. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On 04/09/06, Robert Kern <robert.kern@gmail.com> wrote:
A. M. Archibald wrote:
In the same project I also noticed it would be nice to be able to (say) do "exponential(2+sin(arange(10)))" to get an array of exponentials with varying parameters.
Travis O. recently added this capability. The distribution parameters (loc, scale, alpha, whatever) are broadcast against each other using the same rules as ufunc parameters.
Wonderful! Thank you, Travis O. A. M. Archibald
Charles R Harris wrote:
Hi Robert,
I am about to get started on some stuff for the random number generators but thought I would run it by you first. I envisage the following:
uniform short_doubles -- doubles generated from a single 32 bit random number (advantage: speed) uniform double, short_doubles on the interval (0,1) -- don't touch singularities in functions like log (this is my preferred default) fast_normal -- ziggurat method using single 32 bit random numbers (advantage: speed) fast_exponential -- ziggurat method using single 32 bit random numbers (advantage: speed) MWC8222 random number generator (advantage: speed on some machines, different from mtrand)
Except for the last, none conflict with current routines and can be added without a branch. I expect adding MWC8222 might need more extensive work and I will branch for that. So the questions are of utility and naming. I see some utility for myself, otherwise I wouldn't be considering doing the work. OTOH, I already have (C++) routines that I use for these things, so a larger question might be if anyone else sees a use for these. I like speed, but it is not always that important in everyday apps.
I would prefer not to expand the API of numpy.random. If it weren't necessary for numpy to provide all of the capabilities that came with Numeric's RandomArray, I wouldn't want numpy.random in there at all. Now, a very productive course of action would be to refactor numpy.random such that the distributions (the first four items on your list fall into this category) and the underlying PRNG (the fifth) are separated from one another such that they can be mixed and matched at runtime. A byproduct of this would expose the C API of both of these in order to be usable by other C extension modules, something that's been asked for about a dozen times now. The five items on your list could be implemented in an extension module distributed in scipy.
I see that Pyrex is used for the interface, so I suppose that is one more tool to become familiar with ;)
Possibly not. Pyrex was getting in the way of exposing a C API the last time I took a stab at it. A possibility that just occurred to me is to make an extension module that *only* exposes the C API and mtrand could be rewritten to use that API. Hmmm. I like that. I can give some guidance about how to proceed and help you navigate the current code, but I'm afraid I don't have much time to actually code. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On 9/3/06, Robert Kern <robert.kern@gmail.com> wrote:
Charles R Harris wrote:
Hi Robert,
I am about to get started on some stuff for the random number generators but thought I would run it by you first. I envisage the following:
uniform short_doubles -- doubles generated from a single 32 bit random number (advantage: speed) uniform double, short_doubles on the interval (0,1) -- don't touch singularities in functions like log (this is my preferred default) fast_normal -- ziggurat method using single 32 bit random numbers (advantage: speed) fast_exponential -- ziggurat method using single 32 bit random numbers (advantage: speed) MWC8222 random number generator (advantage: speed on some machines, different from mtrand)
Except for the last, none conflict with current routines and can be added without a branch. I expect adding MWC8222 might need more extensive work and I will branch for that. So the questions are of utility and naming. I see some utility for myself, otherwise I wouldn't be considering doing the work. OTOH, I already have (C++) routines that I use for these things, so a larger question might be if anyone else sees a use for these. I like speed, but it is not always that important in everyday apps.
I would prefer not to expand the API of numpy.random. If it weren't necessary for numpy to provide all of the capabilities that came with Numeric's RandomArray, I wouldn't want numpy.random in there at all.
Yes, good point. Now, a very productive course of action would be to refactor numpy.randomsuch
that the distributions (the first four items on your list fall into this category) and the underlying PRNG (the fifth) are separated from one another such that they can be mixed and matched at runtime. A byproduct of this would expose the C API of both of these in order to be usable by other C extension modules, something that's been asked for about a dozen times now. The five items on your list could be implemented in an extension module distributed in scipy.
What sort of api should this be? It occurs to me that there are already 4 sources of random bytes: Initialization: /dev/random (pseudo random, I think) /dev/urandom crypto system on windows Pseudo random generators: mtrand I suppose we could add some cryptologically secure source as well. That indicates to me that one set of random number generators would just be streams of random bytes, possibly in 4 byte chunks. If I were doing this for linux these would all look like file systems, FUSE comes to mind. Another set of functions would transform these into the different distributions. So, how much should stay in numpy? What sort of API are folks asking for?
I see that Pyrex is used for the interface, so I suppose that is one
more tool to become familiar with ;)
Possibly not. Pyrex was getting in the way of exposing a C API the last time I took a stab at it. A possibility that just occurred to me is to make an extension module that *only* exposes the C API and mtrand could be rewritten to use that API. Hmmm. I like that.
Good, I can do without pyrex. I can give some guidance about how to proceed and help you navigate the
current code, but I'm afraid I don't have much time to actually code.
Thanks, that is all I ask. --
Robert Kern
Chuck
Charles R Harris wrote:
What sort of api should this be? It occurs to me that there are already 4 sources of random bytes:
Initialization:
/dev/random (pseudo random, I think) /dev/urandom crypto system on windows
Pseudo random generators:
mtrand
I suppose we could add some cryptologically secure source as well. That indicates to me that one set of random number generators would just be streams of random bytes, possibly in 4 byte chunks. If I were doing this for linux these would all look like file systems, FUSE comes to mind. Another set of functions would transform these into the different distributions. So, how much should stay in numpy? What sort of API are folks asking for?
It would be good to also support quasi-random generators like Sobol and Halton sequences. They tend to only generate floating point numbers in [0, 1] (or (0, 1) or [0, 1) or (0, 1]; I think it varies). There are probably other PRNGs that only output floating point numbers, but I doubt we want to implement any of them. You can look at the 1.6 version of RandomKit for an implementation of Sobol sequences and ISAAC, a cryptographic PRNG. http://www.jeannot.org/~js/code/index.en.html I'm thinking that the core struct that we're going to pass around will look something like this: struct random_state { void* state; unsigned int (*gen_uint32)(struct random_state*); double (*gen_double)(struct random_state*); } state -- A pointer to the generator-specific struct. gen_uint32 -- A function pointer that yields a 32-bit unsigned integer. Possibly NULL if the generator can not implement such a generator. gen_double -- A function pointer that yields a double-precision number in [0, 1] (possibly omitting one or both of the endpoints depending on the implementation). Possibly this struct needs to be expanded to include function pointers for destroying the state struct and for a copying the state to a new object. The seeding function(s) probably don't need to travel in the struct. We should determine if C++ will work for us, here. Unfortunately, that will force all extension modules that want to use this API to be C++ modules. One other feature of the current implementation of state struct is that a few of the distributions cache information between calls. Notably, the Box-Müller Gaussian distribution algorithm generates two normal deviates at a time; one is returned, the other is cached until the next call. The binomial distribution computes some intermediate values that are reused if the next call has the same parameters. We should probably store such things separately from the core state struct this time. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On 04/09/06, Robert Kern <robert.kern@gmail.com> wrote:
Charles R Harris wrote:
What sort of api should this be? It occurs to me that there are already 4 sources of random bytes:
Initialization:
/dev/random (pseudo random, I think)
/dev/random is (supposed to be) true randomness derived from various sources. It's also fed through a bunch of hashing and CRC functions chosen for convenience, but is probably pretty unbiased.
/dev/urandom crypto system on windows
Pseudo random generators:
mtrand
I suppose we could add some cryptologically secure source as well. That indicates to me that one set of random number generators would just be streams of random bytes, possibly in 4 byte chunks. If I were doing this for linux these would all look like file systems, FUSE comes to mind. Another set of functions would transform these into the different distributions. So, how much should stay in numpy? What sort of API are folks asking for?
"Cryptographically secure random number generator" means something different to everybody, and implementing one that everyone is happy with is going to be a colossal pain. Look at Yarrow and its design documents for some idea of the complexities involved. However, a random number generator based on a stream cipher is probably going to be pretty good, if slow, so implementingone if those can't hurt. But I think leaving the possibility open that one could use custom sources of random bytes is a very good idea. There's no particular need that custom ones should be fast, as they're for use when you really want *good* random numbers.
It would be good to also support quasi-random generators like Sobol and Halton sequences. They tend to only generate floating point numbers in [0, 1] (or (0, 1) or [0, 1) or (0, 1]; I think it varies). There are probably other PRNGs that only output floating point numbers, but I doubt we want to implement any of them. You can look at the 1.6 version of RandomKit for an implementation of Sobol sequences and ISAAC, a cryptographic PRNG.
Sobol' sequences may require special care in constructing non-uniform distributions in order to preserve their uniformity properties. If we can simply import ISAAC, it won't hurt, but I'd rather trust (say) AES in counter mode than some custom cryptosystem that hasn't been analyzed as thoroughly.
I'm thinking that the core struct that we're going to pass around will look something like this:
struct random_state { void* state; unsigned int (*gen_uint32)(struct random_state*); double (*gen_double)(struct random_state*); }
state -- A pointer to the generator-specific struct.
gen_uint32 -- A function pointer that yields a 32-bit unsigned integer. Possibly NULL if the generator can not implement such a generator.
gen_double -- A function pointer that yields a double-precision number in [0, 1] (possibly omitting one or both of the endpoints depending on the implementation).
Are 32-bit numbers really the right least common denominator? There are plenty of 64-bit platforms out there... Given this API, implementing a subclassable class that exports it should satisfy most people's needs for interchangeable generators. A. M. Archibald
On 9/4/06, A. M. Archibald <peridot.faceted@gmail.com> wrote:
On 04/09/06, Robert Kern <robert.kern@gmail.com> wrote:
Charles R Harris wrote:
What sort of api should this be? It occurs to me that there are already 4 sources of random bytes:
Initialization:
/dev/random (pseudo random, I think)
/dev/random is (supposed to be) true randomness derived from various sources. It's also fed through a bunch of hashing and CRC functions chosen for convenience, but is probably pretty unbiased.
/dev/urandom crypto system on windows
Pseudo random generators:
mtrand
I suppose we could add some cryptologically secure source as well. That indicates to me that one set of random number generators would just be streams of random bytes, possibly in 4 byte chunks. If I were doing this for linux these would all look like file systems, FUSE comes to mind. Another set of functions would transform these into the different distributions. So, how much should stay in numpy? What sort of API are folks asking for?
"Cryptographically secure random number generator" means something different to everybody, and implementing one that everyone is happy with is going to be a colossal pain. Look at Yarrow and its design documents for some idea of the complexities involved.
However, a random number generator based on a stream cipher is probably going to be pretty good, if slow, so implementingone if those can't hurt. But I think leaving the possibility open that one could use custom sources of random bytes is a very good idea. There's no particular need that custom ones should be fast, as they're for use when you really want *good* random numbers.
I was thinking it might be nice to have one, just to generate seeds if nothing else. This could actually be written in python and use something like the python cryptography toolkit<http://www.amk.ca/python/writing/pycrypt/pycrypt.html>, no need to reinvent the wheel. It might be worth looking at that code just to see how someone else thought through the interface. It's under the Python license, so I need to know if that license is compatible with the BSD license used for numpy.
It would be good to also support quasi-random generators like Sobol and Halton
sequences. They tend to only generate floating point numbers in [0, 1] (or (0, 1) or [0, 1) or (0, 1]; I think it varies). There are probably other PRNGs that only output floating point numbers, but I doubt we want to implement any of them. You can look at the 1.6 version of RandomKit for an implementation of Sobol sequences and ISAAC, a cryptographic PRNG.
Sobol' sequences may require special care in constructing non-uniform distributions in order to preserve their uniformity properties. If we can simply import ISAAC, it won't hurt, but I'd rather trust (say) AES in counter mode than some custom cryptosystem that hasn't been analyzed as thoroughly.
I'm thinking that the core struct that we're going to pass around will look something like this:
struct random_state { void* state; unsigned int (*gen_uint32)(struct random_state*); double (*gen_double)(struct random_state*); }
state -- A pointer to the generator-specific struct.
gen_uint32 -- A function pointer that yields a 32-bit unsigned integer. Possibly NULL if the generator can not implement such a generator.
gen_double -- A function pointer that yields a double-precision number in [0, 1] (possibly omitting one or both of the endpoints depending on the implementation).
Are 32-bit numbers really the right least common denominator? There are plenty of 64-bit platforms out there...
MWC8222 runs faster (2x) on some 64 bit platforms because it multiplies two 32 bit numbers and uses the 64 bit product. The mtrand generator really uses a polynomial with coefficients in z_2 and has 624 words worth, an even number, so could probably be adapted to run directly with 64 bit registers although the gain might not be much. Most other generators I know of produce 32 bit numbers. Not that I am an expert on the subject. Given this API, implementing a subclassable class that exports it
should satisfy most people's needs for interchangeable generators.
A. M. Archibald Chuck
On 04/09/06, Charles R Harris <charlesr.harris@gmail.com> wrote:
On 9/4/06, A. M. Archibald <peridot.faceted@gmail.com> wrote:
On 04/09/06, Robert Kern <robert.kern@gmail.com> wrote:
Charles R Harris wrote:
However, a random number generator based on a stream cipher is probably going to be pretty good, if slow, so implementingone if those can't hurt. But I think leaving the possibility open that one could use custom sources of random bytes is a very good idea. There's no particular need that custom ones should be fast, as they're for use when you really want *good* random numbers.
I was thinking it might be nice to have one, just to generate seeds if nothing else. This could actually be written in python and use something like the python cryptography toolkit, no need to reinvent the wheel. It might be worth looking at that code just to see how someone else thought through the interface. It's under the Python license, so I need to know if that license is compatible with the BSD license used for numpy.
Of the generators they have, only their random pool is of any use for seeding, and even that needs some clever feeding of random data. However, on practically any platofrm this will run on, random bytes based on real entropy can be extracted from the system (/dev/random, windows crypto API, something on the Mac); a uniform API for those would be a handy thing to have. But I can't recommend spending much effort connecting the python crypto stuff to numpy's random number generators; it's the sort of thing application writers can easily cook up by subclassing RandomGenerator (or whatever). A. M. Archibald
participants (3)
-
A. M. Archibald
-
Charles R Harris
-
Robert Kern