[Numpy-discussion] Moving NumPy's PRNG Forward

Robert Kern robert.kern at gmail.com
Fri Jan 26 19:28:54 EST 2018

On Sat, Jan 27, 2018 at 1:14 AM, Kevin Sheppard <kevin.k.sheppard at gmail.com>
> I am a firm believer that the current situation is not sustainable.
There are a lot of improvements that can practically be incorporated.
While many of these are performance related, there are also improvements in
accuracy over some ranges of parameters that cannot be incorporated. I also
think that perfect stream reproducibility is a bit of a myth across
versions since this really would require identical OS, compiler and
possibly CPU for some of the generators that produce floats.
> I believe there is a case for separating the random generator from core
NumPy.  Some points that favor becoming a subproject:
> 1. It is a pure consumer of NumPy API.  Other parts of the API do no
depend on random.
> 2. A stand alone package could be installed along side many different
version of core NumPy which would reduce the pressure on freezing the

Removing numpy.random (or freezing it as deprecated legacy while all PRNG
development moves elsewhere) is probably a non-starter. It's too used for
us not to provide something. That said, we can (and ought to) make it much
easier for external packages to provide PRNG capabilities (core PRNGs and
distributions) that interoperate with the core functionality that numpy
provides. I'm also happy to place a high barrier on adding more
distributions to numpy.random once that is in place.

Specifically, core uniform PRNGs should have a small common C API that
distribution functions can use. This might just be a struct with an opaque
`void*` state pointer and then 2 function pointers for drawing a uint64
(whole range) and a double in [0,1) from the state. It's important to
expose our core uniform PRNGs as a C API because there has been a desire to
interoperate at that level, using the same PRNG state inside C or Fortran
or GPU code. If that's in place, then people can write new efficient
distribution functions in C that use this small C API agnostic to the core
PRNG algorithm. It also makes it easy to implement new core PRNGs that the
distribution functions provided by numpy.random can use.

> In terms of what is needed, I think that the underlying PRNG should be
swappable.  The will provide a simple mechanism to allow certain types of
advancement while easily providing backward compat.  In the current design
this is very hard and requires compiling many nearly identical copies of
RandomState. In pseudocode something like
> standard_normal(prng)
> where prng is a basic class that retains the PRNG state and has a small
set of core random number generators that belong to the underlying PRNG --
probably something like int32, int64, double, and possibly int53. I am not
advocating explicitly passing the PRNG as an argument, but having
generators which can take any suitable PRNG would add a lot of flexibility
in terms of taking advantage of improvements in the underlying PRNGs (see,
e.g., xoroshiro128/xorshift1024).  The "small" core PRNG would have
responsibility over state and streams.  The remainder of the module would
transform the underlying PRNG into the required distributions.

(edit: after writing the following verbiage, I realize it can be summed up
with more respect to your suggestion: yes, we should do this design, but we
don't need to and shouldn't give up on a class with distribution methods.)

Once the core PRNG C API is in place, I don't think we necessarily need to
move away from a class structure per se, though it becomes an option. We
just separate the core PRNG object from the distribution-providing class.
We don't need to make copies of the distribution-providing class just to
use a new core PRNG. I'm coming around to Nathaniel's suggestion for the
constructor API (though not the distribution-versioning, for reasons I can
get into later).  We have a couple of core uniform PRNG classes like
`MT19937` and `PCG128`. Those have a tiny API, and probably don't have a
lot of unnecessary code clones between them. Their constructors can be
different depending on the different ways they can be instantiated,
depending on the PRNG's features. I'm not sure that they'll have any common
methods besides `__getstate__/__setstate__` and probably a `copy()`. They
will expose their C API as a Python-opaque attribute. They can have
whatever algorithm-dependent methods they need (e.g. to support jumpahead).
I might not even expose to Python the uint64 and U(0,1) double sampling
methods, but maybe so.

Then we have a single `Distributions` class that provides all of the
distributions that we want to support in numpy.random (i.e. what we
currently have on `RandomState` and whatever passes our higher bar in the
future). It takes one of the core PRNG instances as an argument to the
constructor (nominally, at least; we can design factory functions to make
this more convenient).

  prng = Distributions(PCG128(seed))
  x = prng.normal(mean, std)

If someone wants to write a WELL512 core PRNG, they can just implement that
object and pass it to `Distributions()`. The `Distributions` code doesn't
need to be copied, nor do we need to much around with `__new__` tricks in

Why do this instead of distribution functions? Well, I have a damn lot of
code that is expecting an object with at least the broad outlines of the
`RandomState` interface. I'm not going to update that code to use functions
instead. And if I'm not, no one is. There isn't a good transition path
since the PRNG object needs to thread through all of the code, whether it's
my code that I'm writing greenfield or library code that I don't control.
That said, people writing new distributions outside of numpy.random would
be writing functions, not trying to add to the `Distributions` class, but
that's fine.

It also allows the possibility for the `Distributions` class to be stateful
if we want to do things like caching the next Box-Muller variate and not
force that onto the core PRNG state like I currently do. Though I'd rather
just drop Box-Muller, and that's not a common pattern outside of
Box-Muller. But it's a possibility.

Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20180127/c76673e3/attachment.html>

More information about the NumPy-Discussion mailing list