[Numpy-discussion] Moving NumPy's PRNG Forward

Kevin Sheppard kevin.k.sheppard at gmail.com
Mon Jan 29 16:39:32 EST 2018


I agree with pretty much everything you wrote Robert.  I didn't have quote
the right frame but the generic class that takes a low-level core PRNG
sounds like the right design, and this should make user-generated
distributions easier to develop.  I was thinking along these lines inspired
by the SpiPy changes that use a LowLevelCallable, e.g.,

https://ilovesymposia.com/2017/03/12/scipys-new-lowlevelcallable-is-a-game-changer/


This might also allow users to extend the core PRNGs using something like
Numba JIT classes as an alternative.

Another area that needs though is how to correctly spawn in Multiprocess
application. This might be most easily addressed by providing a guide on a
good way rather than the arbitrary way used now.

Kevin





On Sat, Jan 27, 2018 at 5:03 PM <numpy-discussion-request at python.org> wrote:

> Send NumPy-Discussion mailing list submissions to
>         numpy-discussion at python.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://mail.python.org/mailman/listinfo/numpy-discussion
> or, via email, send a message with subject or body 'help' to
>         numpy-discussion-request at python.org
>
> You can reach the person managing the list at
>         numpy-discussion-owner at python.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of NumPy-Discussion digest..."
>
>
> Today's Topics:
>
>    1. Re: Moving NumPy's PRNG Forward (Robert Kern)
>    2. Re: Using np.frombuffer and cffi.buffer on array of C structs
>       (problem with struct member padding) (Joe)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 27 Jan 2018 09:28:54 +0900
> From: Robert Kern <robert.kern at gmail.com>
> To: Discussion of Numerical Python <numpy-discussion at python.org>
> Subject: Re: [Numpy-discussion] Moving NumPy's PRNG Forward
> Message-ID:
>         <CAF6FJitFLv3U7gHkYBCFW69A5BbVe=HzBAm5oxYVcXGbBdMU=
> w at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Sat, Jan 27, 2018 at 1:14 AM, Kevin Sheppard <
> kevin.k.sheppard at gmail.com>
> wrote:
> >
> > I am a firm believer that the current situation is not sustainable.
> There are a lot of improvements that can practically be incorporated.
> While many of these are performance related, there are also improvements in
> accuracy over some ranges of parameters that cannot be incorporated. I also
> think that perfect stream reproducibility is a bit of a myth across
> versions since this really would require identical OS, compiler and
> possibly CPU for some of the generators that produce floats.
> >
> > I believe there is a case for separating the random generator from core
> NumPy.  Some points that favor becoming a subproject:
> >
> > 1. It is a pure consumer of NumPy API.  Other parts of the API do no
> depend on random.
> > 2. A stand alone package could be installed along side many different
> version of core NumPy which would reduce the pressure on freezing the
> stream.
>
> Removing numpy.random (or freezing it as deprecated legacy while all PRNG
> development moves elsewhere) is probably a non-starter. It's too used for
> us not to provide something. That said, we can (and ought to) make it much
> easier for external packages to provide PRNG capabilities (core PRNGs and
> distributions) that interoperate with the core functionality that numpy
> provides. I'm also happy to place a high barrier on adding more
> distributions to numpy.random once that is in place.
>
> Specifically, core uniform PRNGs should have a small common C API that
> distribution functions can use. This might just be a struct with an opaque
> `void*` state pointer and then 2 function pointers for drawing a uint64
> (whole range) and a double in [0,1) from the state. It's important to
> expose our core uniform PRNGs as a C API because there has been a desire to
> interoperate at that level, using the same PRNG state inside C or Fortran
> or GPU code. If that's in place, then people can write new efficient
> distribution functions in C that use this small C API agnostic to the core
> PRNG algorithm. It also makes it easy to implement new core PRNGs that the
> distribution functions provided by numpy.random can use.
>
> > In terms of what is needed, I think that the underlying PRNG should be
> swappable.  The will provide a simple mechanism to allow certain types of
> advancement while easily providing backward compat.  In the current design
> this is very hard and requires compiling many nearly identical copies of
> RandomState. In pseudocode something like
> >
> > standard_normal(prng)
> >
> > where prng is a basic class that retains the PRNG state and has a small
> set of core random number generators that belong to the underlying PRNG --
> probably something like int32, int64, double, and possibly int53. I am not
> advocating explicitly passing the PRNG as an argument, but having
> generators which can take any suitable PRNG would add a lot of flexibility
> in terms of taking advantage of improvements in the underlying PRNGs (see,
> e.g., xoroshiro128/xorshift1024).  The "small" core PRNG would have
> responsibility over state and streams.  The remainder of the module would
> transform the underlying PRNG into the required distributions.
>
> (edit: after writing the following verbiage, I realize it can be summed up
> with more respect to your suggestion: yes, we should do this design, but we
> don't need to and shouldn't give up on a class with distribution methods.)
>
> Once the core PRNG C API is in place, I don't think we necessarily need to
> move away from a class structure per se, though it becomes an option. We
> just separate the core PRNG object from the distribution-providing class.
> We don't need to make copies of the distribution-providing class just to
> use a new core PRNG. I'm coming around to Nathaniel's suggestion for the
> constructor API (though not the distribution-versioning, for reasons I can
> get into later).  We have a couple of core uniform PRNG classes like
> `MT19937` and `PCG128`. Those have a tiny API, and probably don't have a
> lot of unnecessary code clones between them. Their constructors can be
> different depending on the different ways they can be instantiated,
> depending on the PRNG's features. I'm not sure that they'll have any common
> methods besides `__getstate__/__setstate__` and probably a `copy()`. They
> will expose their C API as a Python-opaque attribute. They can have
> whatever algorithm-dependent methods they need (e.g. to support jumpahead).
> I might not even expose to Python the uint64 and U(0,1) double sampling
> methods, but maybe so.
>
> Then we have a single `Distributions` class that provides all of the
> distributions that we want to support in numpy.random (i.e. what we
> currently have on `RandomState` and whatever passes our higher bar in the
> future). It takes one of the core PRNG instances as an argument to the
> constructor (nominally, at least; we can design factory functions to make
> this more convenient).
>
>   prng = Distributions(PCG128(seed))
>   x = prng.normal(mean, std)
>
> If someone wants to write a WELL512 core PRNG, they can just implement that
> object and pass it to `Distributions()`. The `Distributions` code doesn't
> need to be copied, nor do we need to much around with `__new__` tricks in
> Cython.
>
> Why do this instead of distribution functions? Well, I have a damn lot of
> code that is expecting an object with at least the broad outlines of the
> `RandomState` interface. I'm not going to update that code to use functions
> instead. And if I'm not, no one is. There isn't a good transition path
> since the PRNG object needs to thread through all of the code, whether it's
> my code that I'm writing greenfield or library code that I don't control.
> That said, people writing new distributions outside of numpy.random would
> be writing functions, not trying to add to the `Distributions` class, but
> that's fine.
>
> It also allows the possibility for the `Distributions` class to be stateful
> if we want to do things like caching the next Box-Muller variate and not
> force that onto the core PRNG state like I currently do. Though I'd rather
> just drop Box-Muller, and that's not a common pattern outside of
> Box-Muller. But it's a possibility.
>
> --
> Robert Kern
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> http://mail.python.org/pipermail/numpy-discussion/attachments/20180127/c76673e3/attachment-0001.html
> >
>
> ------------------------------
>
> Message: 2
> Date: Sat, 27 Jan 2018 10:30:47 +0100
> From: Joe <solarjoe at posteo.org>
> To: numpy-discussion at python.org
> Subject: Re: [Numpy-discussion] Using np.frombuffer and cffi.buffer on
>         array of C structs (problem with struct member padding)
> Message-ID: <fed22682-60fd-0a5d-aaa5-83e67d9d1069 at posteo.org>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Thanks for your help on this! This solved my issue.
>
>
> Am 25.01.2018 um 19:01 schrieb Allan Haldane:
> > There is a new section discussing alignment in the numpy 1.14 structured
> > array docs, which has some hints about interfacing with C structs.
> >
> > These new 1.14 docs are not online yet on scipy.org, but in the meantime
> >   you can view them here:
> >
> https://ahaldane.github.io/user/basics.rec.html#automatic-byte-offsets-and-alignment
> >
> > (That links specifically to the discussion of alignments and padding).
> >
> > Allan
> >
> > On 01/25/2018 11:33 AM, Chris Barker - NOAA Federal wrote:
> >>
> >>>
> >>> The numpy dtype constructor takes an ?align? keyword that will pad it
> >>> for you.
> >>
> >>
> https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html
> >>
> >> -CHB
> >>
> >>
> >>
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion at python.org
> >> https://mail.python.org/mailman/listinfo/numpy-discussion
> >>
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
> ------------------------------
>
> End of NumPy-Discussion Digest, Vol 136, Issue 37
> *************************************************
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20180129/5fd8be78/attachment-0001.html>


More information about the NumPy-Discussion mailing list