[Numpy-discussion] Design feedback solicitation

Thu Jul 14 21:53:19 EDT 2016

Hi Robert,

Thank you for the pointers.

I think numpy.random should have a mechanism to choose between methods for generating the underlying randomness dynamically, at run time, as well as an extensible framework to which developers could add more methods. The default would be MT19937 for backwards compatibility. Being able to do this at run time is important because it would allow one to use different algorithms in different threads (for example, different members of the parallel Mersenne Twister family of generators; see MT2203).
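
To make the idea concrete, here is a minimal sketch in plain Python of choosing a basic generator by name at run time. It is not an existing numpy.random API: register_basic_generator, make_basic_generator and the next_uint64 method are hypothetical names, and the xorshift generator is only a stand-in for a second algorithm such as MT2203.

```python
import random

MASK64 = (1 << 64) - 1

# Hypothetical registry: generator name -> factory taking a seed.
_BASIC_GENERATORS = {}


def register_basic_generator(name, factory):
    """Make a basic generator available under `name`."""
    _BASIC_GENERATORS[name] = factory


def make_basic_generator(name="MT19937", seed=None):
    """Create a basic generator chosen by name at run time."""
    if name not in _BASIC_GENERATORS:
        raise ValueError("unknown basic generator: %r" % (name,))
    return _BASIC_GENERATORS[name](seed)


class MT19937:
    """Thin wrapper around the standard library's Mersenne Twister."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def next_uint64(self):
        return self._rng.getrandbits(64)


class XorShift64:
    """Marsaglia's 64-bit xorshift, used here only as a second algorithm."""

    def __init__(self, seed=None):
        # The state must be nonzero, so fall back to a fixed constant.
        state = (seed or 0) & MASK64
        self._state = state or 0x9E3779B97F4A7C15

    def next_uint64(self):
        x = self._state
        x ^= (x << 13) & MASK64
        x ^= x >> 7
        x ^= (x << 17) & MASK64
        self._state = x
        return x


register_basic_generator("MT19937", MT19937)
register_basic_generator("XorShift64", XorShift64)
```

With something like this, each thread could call make_basic_generator with its own name and seed, and code that never passes a name would keep getting MT19937.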

The framework should allow randomness to be defined as a bit stream, a stream of fixed-size integers, or a stream of uniform reals (32 or 64 bits). This is a lot like MKL’s abstract method for basic pseudo-random number generation.

https://software.intel.com/en-us/node/590373

Each method should provide routines to sample from uniform distributions over reals (in floats and doubles), as well as over integers.
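
As an illustration of how the uniform routines could sit on top of a stream of fixed-size integers, here is a small sketch. The function names are hypothetical, and the bit-shift conversions are a common technique rather than a description of what numpy.random or MKL does internally.

```python
def uint64_to_double(x):
    """Map a 64-bit integer to a double in [0, 1) using its top 53 bits."""
    return (x >> 11) * (1.0 / (1 << 53))


def uint64_to_single(x):
    """Map a 64-bit integer to [0, 1) using its top 24 bits.

    The result is a Python float, but it is exactly representable
    as a 32-bit float.
    """
    return (x >> 40) * (1.0 / (1 << 24))


def bounded_integer(next_uint64, n):
    """Uniform integer in [0, n), using rejection to avoid modulo bias.

    `next_uint64` is any callable returning uniform 64-bit integers.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Largest multiple of n that fits in 64 bits; draws at or above it
    # are rejected so that every residue is equally likely.
    limit = (1 << 64) - ((1 << 64) % n)
    while True:
        x = next_uint64()
        if x < limit:
            return x % n
```

For example, bounded_integer(lambda: rng.getrandbits(64), 6) with rng = random.Random(0) draws a fair die roll in the range 0 to 5.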

All remaining non-uniform distributions build on top of these uniform streams.

I think it is pretty important to refactor numpy.random to allow the underlying generators to produce a given number of independent variates at a time. There could be convenience wrapper functions that return a single variate for backwards compatibility, but this change in design would allow for better efficiency, since sampling a vector of random variates at once is often faster than repeatedly sampling one at a time, due to set-up cost, vectorization, etc.
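
The shape of such an interface could look like the sketch below, where the vector routine is the core and the scalar call is a thin wrapper around it. The class and method names are hypothetical, and a pure-Python list only illustrates the interface; the actual speed-up would come from vectorized native code.

```python
import random


class VectorUniform:
    """Uniform generator whose core routine fills a whole vector."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def fill(self, n):
        """Core routine: return a list of n uniform variates in [0, 1)."""
        if n < 0:
            raise ValueError("n must be non-negative")
        draw = self._rng.random  # any set-up happens once per batch
        return [draw() for _ in range(n)]

    def next(self):
        """Convenience wrapper: one variate, for backwards compatibility."""
        return self.fill(1)[0]
```

With the same seed, one call to fill(5) and five calls to next() yield the same stream, so existing scalar code would keep its results.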

Finally, methods that sample a particular distribution should uniformly support a method keyword argument. Because method names vary from distribution to distribution, it should ideally be programmatically discoverable which methods are supported for a given distribution. For instance, the standard normal distribution could support method='Inversion', method='Box-Muller', method='Ziggurat', method='Box-Muller-Marsaglia' (the one used in numpy.random right now), as well as a bunch of unnamed methods based on the transformed rejection method (see http://statistik.wu-wien.ac.at/anuran/ ).
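
A rough sketch of what a discoverable method keyword might look like for the standard normal distribution follows. The function names available_methods and standard_normal, and their signatures, are assumptions made for illustration, and only three of the methods named above are implemented (no Ziggurat).

```python
import math
import random
from statistics import NormalDist


def _open_uniform(rng):
    """Uniform variate in the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def _normal_inversion(rng, n):
    inv_cdf = NormalDist().inv_cdf
    return [inv_cdf(_open_uniform(rng)) for _ in range(n)]


def _normal_box_muller(rng, n):
    out = []
    while len(out) < n:
        r = math.sqrt(-2.0 * math.log(_open_uniform(rng)))
        theta = 2.0 * math.pi * rng.random()
        out.append(r * math.cos(theta))
        out.append(r * math.sin(theta))
    return out[:n]


def _normal_polar(rng, n):
    # Marsaglia's polar variant of Box-Muller (rejection on the unit disk).
    out = []
    while len(out) < n:
        v1 = 2.0 * rng.random() - 1.0
        v2 = 2.0 * rng.random() - 1.0
        s = v1 * v1 + v2 * v2
        if s >= 1.0 or s == 0.0:
            continue
        f = math.sqrt(-2.0 * math.log(s) / s)
        out.append(v1 * f)
        out.append(v2 * f)
    return out[:n]


_METHODS = {
    "standard_normal": {
        "Inversion": _normal_inversion,
        "Box-Muller": _normal_box_muller,
        "Box-Muller-Marsaglia": _normal_polar,
    },
}


def available_methods(distribution):
    """List the method names supported for a distribution."""
    return sorted(_METHODS[distribution])


def standard_normal(n, method="Box-Muller-Marsaglia", rng=None):
    """Draw n standard normal variates with the requested method."""
    if method not in _METHODS["standard_normal"]:
        raise ValueError("unsupported method: %r" % (method,))
    if rng is None:
        rng = random.Random()
    return _METHODS["standard_normal"][method](rng, n)
```

A caller could then loop over available_methods("standard_normal") to compare methods without hard-coding their names.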

It would also be good if one could dynamically register a new method to sample from a non-uniform distribution. This would make it possible, for instance, to automatically add methods that sample certain non-uniform distributions by calling directly into MKL (or another library) when available, instead of building them from uniforms (which may remain as a fall-through method).
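
Dynamic registration with a fall-through method could be sketched as follows. Everything here is hypothetical (register_method, sample, and the prefer flag are illustrative names, not an existing API), and the exponential distribution is used only because its inversion method is one line.

```python
import math
import random

_SAMPLERS = {}   # (distribution, method) -> callable(rng, n, **params)
_PREFERRED = {}  # distribution -> method names, most preferred first


def register_method(distribution, method, sampler, prefer=False):
    """Register a sampler; prefer=True makes it the default method."""
    _SAMPLERS[(distribution, method)] = sampler
    order = _PREFERRED.setdefault(distribution, [])
    if method in order:
        order.remove(method)
    if prefer:
        order.insert(0, method)
    else:
        order.append(method)


def sample(distribution, n, method=None, rng=None, **params):
    """Draw n variates, using the preferred method unless one is named."""
    if method is None:
        order = _PREFERRED.get(distribution, [])
        if not order:
            raise ValueError("no methods for %r" % (distribution,))
        method = order[0]
    if (distribution, method) not in _SAMPLERS:
        raise ValueError("unsupported method: %r" % (method,))
    if rng is None:
        rng = random.Random()
    return _SAMPLERS[(distribution, method)](rng, n, **params)


def _exponential_inversion(rng, n, scale=1.0):
    # 1 - U lies in (0, 1], so the logarithm is always defined.
    return [-scale * math.log(1.0 - rng.random()) for _ in range(n)]


# The method built from uniforms is always present as the fall-through.
register_method("exponential", "Inversion", _exponential_inversion)
```

An optional accelerated backend could call register_method(..., prefer=True) when it is imported successfully; if it is absent, sample keeps using the inversion method.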

The linked project is a good start, but, as far as I understood, the choice of the underlying algorithm needs to be made at run time, and the only provided interface to query random variates is one at a time, just as is currently the case in numpy.random.

Oleksandr

From: NumPy-Discussion [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Robert Kern
Sent: Friday, June 17, 2016 10:23 AM
To: Discussion of Numerical Python <numpy-discussion at scipy.org>
Subject: Re: [Numpy-discussion] Design feedback solicitation

On Fri, Jun 17, 2016 at 4:08 PM, Pavlyk, Oleksandr <oleksandr.pavlyk at intel.com> wrote:
>
> Hi,
>
> I am new to this list, so I will start with an introduction. My name is Oleksandr Pavlyk. I now work at Intel Corp. on the Intel Distribution for Python, and previously worked at Wolfram Research for 12 years. My latest project was to write a mirror to numpy.random, named numpy.random_intel. The module uses MKL to sample from different distributions for efficiency. It provides support for different underlying algorithms for basic pseudo-random number generation, i.e. in addition to MT19937, it also provides SFMT19937, MT2203, etc.
>
> I recently published a blog post about it:
>
>        https://software.intel.com/en-us/blogs/2016/06/15/faster-random-number-generation-in-intel-distribution-for-python
>
> I originally attempted to simply replace numpy.random in the Intel Distribution for Python with the new module, but due to fixed seed backwards incompatibility this results in numerous test failures in numpy, scipy, pandas and other modules.
>
> Unlike numpy.random, the new module generates a vector of random numbers at a time, which can be done faster than repeatedly generating the same number of variates one at a time.
>
> The source code for the new module is not upstreamed yet, and this email is meant to solicit early community feedback to allow for faster acceptance of the proposed changes.

Cool! You can find pertinent discussion here:

  https://github.com/numpy/numpy/issues/6967

And the current effort for adding new core PRNGs here:

  https://github.com/bashtage/ng-numpy-randomstate

--
Robert Kern