On 9/3/06, Robert Kern <robert.kern@gmail.com> wrote:
Charles R Harris wrote:
> Hi Robert,
> I am about to get started on some stuff for the random number generators
> but thought I would run it by you first. I envisage the following:
> - uniform short_doubles -- doubles generated from a single 32 bit random
>   number (advantage: speed)
> - uniform double, short_doubles on the interval (0,1) -- don't touch
>   singularities in functions like log (this is my preferred default)
> - fast_normal -- ziggurat method using single 32 bit random numbers
>   (advantage: speed)
> - fast_exponential -- ziggurat method using single 32 bit random numbers
>   (advantage: speed)
> - MWC8222 random number generator (advantage: speed on some machines,
>   different from mtrand)
> Except for the last, none conflict with current routines and can be
> added without a branch. I expect adding MWC8222 might need more
> extensive work and I will branch for that. So the questions are of
> utility and naming. I see some utility for myself, otherwise I wouldn't
> be considering doing the work. OTOH, I already have (C++) routines that
> I use for these things, so a larger question might be if anyone else
> sees a use for these. I like speed, but it is not always that important
> in everyday apps.
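
To make the "short double on the open interval (0,1)" idea concrete, here is a minimal sketch in Python. The name short_double_open and the half-step offset are assumptions made for illustration; the proposal does not say how the open interval would actually be obtained.

```python
import math
import random


def short_double_open(u32: int) -> float:
    """Map one 32-bit integer to a double strictly inside (0, 1).

    Hypothetical helper: the 0.5 offset is one possible way to avoid
    both endpoints, not necessarily what the proposal had in mind.
    """
    # u32 ranges over 0 .. 2**32 - 1, so the result ranges over
    # 0.5 / 2**32 .. 1 - 0.5 / 2**32 and never hits 0.0 or 1.0.
    return (u32 + 0.5) / 4294967296.0  # 2**32


# Draw a few values using Python's own generator as the 32-bit source.
rng = random.Random(0)
draws = [short_double_open(rng.getrandbits(32)) for _ in range(5)]

# log() is finite for every possible input, including the extremes.
lowest = math.log(short_double_open(0))
highest = math.log(short_double_open(2**32 - 1))
```

Only 32 of the 53 mantissa bits of a double carry randomness here, which is the speed-for-resolution trade the list item describes.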

I would prefer not to expand the API of numpy.random. If it weren't necessary
for numpy to provide all of the capabilities that came with Numeric's
RandomArray, I wouldn't want numpy.random in there at all.

Yes, good point.

Now, a very productive course of action would be to refactor numpy.random such
that the distributions (the first four items on your list fall into this
category) and the underlying PRNG (the fifth) are separated from one another
and can be mixed and matched at runtime. A byproduct would be to expose the
C API of both so that other C extension modules can use them, something
that's been asked for about a dozen times now. The five items on your list
could then be implemented in an extension module distributed in scipy.
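
As a rough illustration of that separation, the sketch below shows distributions written against a tiny generator interface so that the PRNG can be swapped at runtime. It is written in Python rather than C for brevity, and every name in it (MTSource, LCGSource, next_uint32, uniform_open, exponential) is hypothetical; none of this is existing numpy.random API. The LCG is only a toy stand-in for "some other generator", not MWC8222.

```python
import math
import random


class MTSource:
    """Wraps Python's Mersenne Twister as a source of 32-bit integers."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def next_uint32(self):
        return self._rng.getrandbits(32)


class LCGSource:
    """Toy linear congruential generator, present only to show swapping."""

    def __init__(self, seed):
        self._state = seed & 0xFFFFFFFF

    def next_uint32(self):
        self._state = (1664525 * self._state + 1013904223) & 0xFFFFFFFF
        return self._state


def uniform_open(source):
    """Uniform double in (0, 1) from any object with next_uint32()."""
    return (source.next_uint32() + 0.5) / 4294967296.0


def exponential(source, scale=1.0):
    """Exponential variate by inversion; knows nothing about the PRNG."""
    return -scale * math.log(uniform_open(source))


# The same distribution code runs on top of either generator.
samples = {
    type(src).__name__: [exponential(src) for _ in range(3)]
    for src in (MTSource(42), LCGSource(42))
}
```

The distributions depend only on next_uint32(), which is the kind of narrow boundary a shared C API would need to pin down.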

What sort of API should this be? It occurs to me that there are already four sources of random bytes:


/dev/random (pseudo random, I think)
crypto system on windows

Pseudo random generators:


I suppose we could add some cryptographically secure source as well. That suggests to me that one set of random number generators would just be streams of random bytes, possibly in 4-byte chunks. If I were doing this for Linux, these would all look like file systems; FUSE comes to mind. Another set of functions would transform these into the different distributions. So, how much should stay in numpy? What sort of API are folks asking for?
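
The "stream of random bytes in 4-byte chunks" view can be sketched as follows. The helper name uint32_stream and the little-endian byte order are assumptions made for this example; os.urandom stands in for an operating-system byte source, and an in-memory buffer shows that any file-like read function fits the same slot.

```python
import io
import os
import struct


def uint32_stream(read_bytes, count):
    """Yield `count` unsigned 32-bit integers from a byte source.

    `read_bytes(n)` must return exactly n bytes, like os.urandom
    or the read method of a binary file object.
    """
    for _ in range(count):
        chunk = read_bytes(4)
        # "<I" decodes 4 bytes as a little-endian unsigned 32-bit int.
        (value,) = struct.unpack("<I", chunk)
        yield value


# Operating-system source: values differ on every run.
os_values = list(uint32_stream(os.urandom, 4))

# A fixed in-memory "file" behaves the same way and is reproducible.
fixed = io.BytesIO(b"\x01\x00\x00\x00\xff\xff\xff\xff")
fixed_values = list(uint32_stream(fixed.read, 2))
```

A distribution function would then consume these integers without caring where the bytes came from.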

> I see that Pyrex is used for the interface, so I suppose that is one
> more tool to become familiar with ;)

Possibly not. Pyrex was getting in the way of exposing a C API the last time I
took a stab at it. A possibility that just occurred to me is to make an
extension module that *only* exposes the C API and mtrand could be rewritten to
use that API. Hmmm. I like that.

Good, I can do without pyrex.

I can give some guidance about how to proceed and help you navigate the current
code, but I'm afraid I don't have much time to actually code.

Thanks, that is all I ask.

Robert Kern