[Numpy-discussion] Intel random number package
oleksandr.pavlyk at intel.com
Wed Oct 26 17:25:40 EDT 2016
Please see responses inline.
From: NumPy-Discussion [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Todd
Sent: Wednesday, October 26, 2016 4:04 PM
To: Discussion of Numerical Python <numpy-discussion at scipy.org>
Subject: Re: [Numpy-discussion] Intel random number package
On Wed, Oct 26, 2016 at 4:30 PM, Pavlyk, Oleksandr <oleksandr.pavlyk at intel.com> wrote:
The module under review, similarly to the randomstate package, provides alternative basic pseudo-random number generators (BRNGs), like MT2203, MCG31, MRG32K3A, and Wichmann-Hill. The scope of support differs, with randomstate implementing some generators absent in MKL and vice versa.
Is there a reason that randomstate shouldn't implement those generators?
No, randomstate certainly can implement all the BRNGs implemented in MKL. It is at the developer's discretion.
Thinking about the possibility of providing the functionality of this module within the framework of randomstate, I find that randomstate implements samplers from statistical distributions as functions that take the state of the underlying BRNG, and produce a single variate, e.g.:
This design stands in the way of efficient use of MKL, which generates a whole vector of variates at a time. Generating a vector can be done faster than sampling one variate at a time, because it can use vectorized instructions. So I wrote mkl_distributions.cpp to provide functions that return a vector of a given size, filled with variates sampled from each supported distribution.
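To make the contrast concrete, here is a minimal sketch in Python. It is not the code of randomstate or of mkl_distributions.cpp; the two function names are hypothetical, and the exponential distribution is chosen only because its inverse-CDF transform is short. Both functions consume the same stream of uniforms, but the second makes one call for the whole vector.

```python
import numpy as np

def sample_exponential_scalar(rng, n):
    """Draw n exponential variates one at a time (one BRNG call per variate)."""
    out = np.empty(n)
    for i in range(n):
        # inverse-CDF transform applied to a single uniform draw
        out[i] = -np.log1p(-rng.random_sample())
    return out

def sample_exponential_vector(rng, n):
    """Draw n exponential variates with a single vectorized call."""
    u = rng.random_sample(n)   # whole vector of uniforms at once
    return -np.log1p(-u)       # transform applied to the whole vector

# Same seed for both, so both consume the same uniform stream.
a = sample_exponential_scalar(np.random.RandomState(12345), 1000)
b = sample_exponential_vector(np.random.RandomState(12345), 1000)
```

The per-variate version pays call overhead on every element, while the vector version leaves the inner loop to compiled code, which is where vectorized instructions can be applied.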
I don't know a huge amount about pseudo-random number generators, but this seems superficially to be something that would benefit random number generation as a whole independently of whether MKL is used. Might it be possible to modify the numpy implementation to support this sort of vectorized approach?
I also think that adopting a vectorized mindset would benefit np.random. For example, Gaussians are currently generated using the Box-Muller algorithm, which produces two variates at a time, so one of them currently needs to be saved in the random state struct itself, along with an indicator that it should be used on the next call. With a vectorized approach one could populate the vector two elements at a time with better memory locality, resulting in better performance.
The vectorized approach has merits with or without the use of MKL.
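A rough sketch of the two designs, using the textbook Box-Muller transform (not necessarily the exact variant numpy uses internally). The class and function names are hypothetical and exist only for illustration.

```python
import math
import numpy as np

class ScalarGauss:
    """One variate per call; the second Box-Muller output is cached in the state."""
    def __init__(self, seed):
        self.rng = np.random.RandomState(seed)
        self.has_cached = False   # indicator that a saved variate is available
        self.cached = 0.0

    def next(self):
        if self.has_cached:
            self.has_cached = False
            return self.cached
        u1 = 1.0 - self.rng.random_sample()   # in (0, 1], so log(u1) is finite
        u2 = self.rng.random_sample()
        r = math.sqrt(-2.0 * math.log(u1))
        self.cached = r * math.sin(2.0 * math.pi * u2)
        self.has_cached = True
        return r * math.cos(2.0 * math.pi * u2)

def gauss_vector(rng, n):
    """Fill a length-n vector two elements at a time; no cached state is needed."""
    m = (n + 1) // 2                      # number of Box-Muller pairs
    u = rng.random_sample((m, 2))         # row i holds (u1, u2) for pair i
    r = np.sqrt(-2.0 * np.log(1.0 - u[:, 0]))
    theta = 2.0 * np.pi * u[:, 1]
    out = np.empty(2 * m)
    out[0::2] = r * np.cos(theta)         # first variate of each pair
    out[1::2] = r * np.sin(theta)         # second variate of each pair
    return out[:n]                        # drop the spare element when n is odd
```

In the vector version the pair is written straight into adjacent slots of the output, so the saved variate and its indicator disappear from the generator state.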
Another point, already raised by Nathaniel, is that numpy's randomness should ideally provide a way to override the default algorithm for sampling from a particular distribution. For example, a RandomState object that implements PCG may rely on the default acceptance-rejection algorithm for sampling from Gamma, while a RandomState object that provides an interface to MKL might want to call into MKL directly.
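One way such an override could look, as a sketch only: the classes below are hypothetical and are not numpy's or randomstate's API. The exponential distribution is used instead of Gamma to keep the example short, and a numpy call stands in for a vendor routine such as one from MKL.

```python
import numpy as np

class BaseSampler:
    """Hypothetical base class: a BRNG supplies uniforms, distributions have defaults."""
    def __init__(self, seed):
        self._rng = np.random.RandomState(seed)   # stand-in for any BRNG

    def uniform(self, n):
        return self._rng.random_sample(n)

    def exponential(self, n):
        # Default algorithm, built only on uniform(); works for any BRNG.
        return -np.log1p(-self.uniform(n))

class VendorSampler(BaseSampler):
    """Hypothetical backend that overrides one distribution with its own routine."""
    def exponential(self, n):
        # Stand-in for a direct call into a vendor library's vectorized sampler.
        return self._rng.standard_exponential(n)

default_draws = BaseSampler(1).exponential(5)
vendor_draws = VendorSampler(1).exponential(5)
```

A backend that overrides nothing still gets every distribution through the defaults, and a backend with a faster native routine replaces only the methods it cares about.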
The approach that pyfftw uses at least for scipy, which may also work here, is that you can monkey-patch the scipy.fftpack module at runtime, replacing it with pyfftw's drop-in replacement. scipy then proceeds to use pyfftw instead of its built-in fftpack implementation. Might such an approach work here? Users can either use this alternative randomstate replacement directly, or they can replace numpy's with it at runtime and numpy will then proceed to use the alternative.
I think the monkey-patching approach will work.
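A minimal sketch of what runtime patching could look like. The replacement function and the context manager are hypothetical names; the replacement simply delegates to its own RandomState as a stand-in for an alternative implementation.

```python
import contextlib
import numpy as np

_alt_state = np.random.RandomState(2016)   # state owned by the replacement

def alt_standard_normal(size=None):
    """Hypothetical drop-in replacement for numpy.random.standard_normal."""
    return _alt_state.standard_normal(size)

@contextlib.contextmanager
def patched_numpy_random():
    """Temporarily replace numpy.random.standard_normal, restoring it on exit."""
    original = np.random.standard_normal
    np.random.standard_normal = alt_standard_normal
    try:
        yield
    finally:
        np.random.standard_normal = original

with patched_numpy_random():
    x = np.random.standard_normal(4)   # served by the replacement
```

One caveat with this approach: code that ran `from numpy.random import standard_normal` before the patch keeps its reference to the original function, so only lookups through the module attribute see the replacement.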
RandomState was written with a view to replacing numpy.random at some point in the future. It is standalone at the moment, from what I understand, only because it is still being worked on and extended.
One particularly important development is the ability to sample continuous distributions in floats, or to populate a given preallocated buffer with random samples. These features are missing from numpy.random_intel, and we have thought about providing them.
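For illustration, this is what the two features look like in the Generator interface that later numpy releases (1.17 and newer) provide. It is shown only as an example of the idea and is not the randomstate or numpy.random_intel API.

```python
import numpy as np

rng = np.random.default_rng(2016)

# Single-precision samples written into a caller-owned buffer.
buf = np.empty(8, dtype=np.float32)
result = rng.random(out=buf, dtype=np.float32)      # uniforms on [0, 1)

# The same pattern for a continuous distribution.
normals = np.empty(8, dtype=np.float32)
rng.standard_normal(out=normals, dtype=np.float32)
```

The buffer's dtype has to match the requested dtype, so sampling in floats and filling a preallocated array go together naturally.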
As I have said earlier, another missing feature is the C-API for randomness in numpy.