On 9/3/06, Robert Kern <robert.kern@gmail.com> wrote:
Charles R Harris wrote:
Hi Robert,
I am about to get started on some stuff for the random number generators but thought I would run it by you first. I envisage the following:
  1. uniform short_doubles -- doubles generated from a single 32 bit random number (advantage: speed)
  2. uniform double, short_doubles on the interval (0,1) -- don't touch singularities in functions like log (this is my preferred default)
  3. fast_normal -- ziggurat method using single 32 bit random numbers (advantage: speed)
  4. fast_exponential -- ziggurat method using single 32 bit random numbers (advantage: speed)
  5. MWC8222 random number generator (advantage: speed on some machines, different from mtrand)
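To make the "short double on (0,1)" idea concrete, here is a minimal sketch in Python. The function name `short_double` and the half-step offset are assumptions for illustration, not anything taken from numpy.random or from the C++ routines mentioned in the message. It shows one common way to turn a single 32 bit integer into a double that can never be exactly 0 or 1.

```python
import math


def short_double(u32: int) -> float:
    """Map one 32-bit unsigned integer to a double strictly inside (0, 1).

    Hypothetical helper: the name and the offset scheme are illustrative.
    """
    if not 0 <= u32 < 2**32:
        raise ValueError("expected a 32-bit unsigned integer")
    # Offsetting by one half keeps both endpoints out of reach:
    # the smallest result is 2**-33 and the largest is 1 - 2**-33.
    return (u32 + 0.5) / 4294967296.0


lowest = short_double(0)
highest = short_double(2**32 - 1)
# log() stays finite at both extremes because neither 0.0 nor 1.0 can occur.
print(math.log(lowest), math.log1p(-highest))
```

The price of the speed is resolution: only 32 of the 53 mantissa bits carry randomness, which is the trade-off the "advantage: speed" notes point at.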
Except for the last, none conflict with current routines and can be added without a branch. I expect adding MWC8222 might need more extensive work and I will branch for that. So the questions are of utility and naming. I see some utility for myself, otherwise I wouldn't be considering doing the work. OTOH, I already have (C++) routines that I use for these things, so a larger question might be if anyone else sees a use for these. I like speed, but it is not always that important in everyday apps.
I would prefer not to expand the API of numpy.random. If it weren't necessary for numpy to provide all of the capabilities that came with Numeric's RandomArray, I wouldn't want numpy.random in there at all.
Yes, good point. Now, a very productive course of action would be to refactor numpy.random such
that the distributions (the first four items on your list fall into this category) and the underlying PRNG (the fifth) are separated from one another such that they can be mixed and matched at runtime. A byproduct of this would expose the C API of both of these in order to be usable by other C extension modules, something that's been asked for about a dozen times now. The five items on your list could be implemented in an extension module distributed in scipy.
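As a rough picture of what "mixed and matched at runtime" could look like, here is a small Python sketch. Everything in it is hypothetical: the class names, the `next_uint32` method, and the toy multiply-with-carry generator (which is a stand-in, not MWC8222) are assumptions for illustration and say nothing about how numpy.random is or would be organized. The exponential uses a plain inverse-CDF transform rather than the ziggurat method.

```python
import math
import random


class MersenneSource:
    """32-bit source backed by Python's random.Random (a Mersenne Twister)."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def next_uint32(self):
        return self._rng.getrandbits(32)


class ToyMWCSource:
    """Small multiply-with-carry source; a stand-in, NOT MWC8222."""

    def __init__(self, seed):
        self._x = seed & 0xFFFFFFFF
        self._c = 362436  # arbitrary starting carry (assumption)

    def next_uint32(self):
        # Multiplier chosen for illustration only.
        t = 4294957665 * self._x + self._c
        self._c = t >> 32
        self._x = t & 0xFFFFFFFF
        return self._x


def uniform_open(source):
    """Uniform double on the open interval (0, 1) from one 32-bit draw."""
    return (source.next_uint32() + 0.5) / 4294967296.0


def exponential(source, scale=1.0):
    """Exponential variate via inverse CDF (not the ziggurat method)."""
    return -scale * math.log(uniform_open(source))


# The same distribution code runs unchanged on either generator.
for source in (MersenneSource(42), ToyMWCSource(42)):
    samples = [exponential(source) for _ in range(5)]
    print(type(source).__name__, samples)
```

The distribution functions only rely on the source having a `next_uint32` method, so adding a new generator does not touch them. A C API could follow the same shape with a struct of function pointers.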
What sort of API should this be? It occurs to me that there are already 4 sources of random bytes:

Initialization:
  /dev/random (pseudo random, I think)
  /dev/urandom
  crypto system on Windows

Pseudo random generators:
  mtrand

I suppose we could add some cryptographically secure source as well. That indicates to me that one set of random number generators would just be streams of random bytes, possibly in 4 byte chunks. If I were doing this for Linux these would all look like file systems; FUSE comes to mind. Another set of functions would transform these into the different distributions. So, how much should stay in numpy? What sort of API are folks asking for?
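The "streams of random bytes in 4 byte chunks" idea can be sketched in a few lines of Python. The function name `uint32_stream` and the little-endian byte order are assumptions made for the example. `os.urandom` is the standard library call that reads from the operating system's random source, and the `io.BytesIO` buffer merely stands in for any file-like generator.

```python
import io
import os


def uint32_stream(read_bytes):
    """Yield 32-bit unsigned integers from a callable that returns n bytes.

    Hypothetical helper; little-endian decoding is an arbitrary choice.
    """
    while True:
        chunk = read_bytes(4)
        if len(chunk) < 4:
            return  # source exhausted
        yield int.from_bytes(chunk, "little")


# The operating system's source, consumed four bytes at a time.
system_words = uint32_stream(os.urandom)
print(next(system_words))

# A fixed buffer behaves like a file and gives reproducible words.
fixed_words = uint32_stream(io.BytesIO(bytes(range(8))).read)
print(list(fixed_words))
```

Anything that can hand back n bytes on request fits this interface, which is close in spirit to treating every source as a file.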
I see that Pyrex is used for the interface, so I suppose that is one
more tool to become familiar with ;)
Possibly not. Pyrex was getting in the way of exposing a C API the last time I took a stab at it. A possibility that just occurred to me is to make an extension module that *only* exposes the C API and mtrand could be rewritten to use that API. Hmmm. I like that.
Good, I can do without Pyrex. I can give some guidance about how to proceed and help you navigate the
current code, but I'm afraid I don't have much time to actually code.
Thanks, that is all I ask. --
Robert Kern
Chuck