[Numpy-discussion] Moving NumPy's PRNG Forward

Robert Kern robert.kern at gmail.com
Mon Jan 29 16:16:06 EST 2018

On Tue, Jan 30, 2018 at 5:39 AM, Pierre de Buyl <
pierre.debuyl at chem.kuleuven.be> wrote:
> Hello,
> On Sat, Jan 27, 2018 at 09:28:54AM +0900, Robert Kern wrote:
> > On Sat, Jan 27, 2018 at 1:14 AM, Kevin Sheppard
> > <kevin.k.sheppard at gmail.com> wrote:
> > >
> > > In terms of what is needed, I think that the underlying PRNG should
> > > be swappable.  This will provide a simple mechanism to allow certain
> > > types of advancement while easily providing backward compat.  In the
> > > current design this is very hard and requires compiling many nearly
> > > identical copies of RandomState. In pseudocode something like
> > >
> > > standard_normal(prng)
> > >
> > > where prng is a basic class that retains the PRNG state and has a
> > > small set of core random number generators that belong to the
> > > underlying PRNG -- probably something like int32, int64, double, and
> > > possibly int53. I am not advocating explicitly passing the PRNG as an
> > > argument, but having generators which can take any suitable PRNG would
> > > add a lot of flexibility in terms of taking advantage of improvements
> > > in the underlying PRNGs (see, e.g., xoroshiro128/xorshift1024).  The
> > > "small" core PRNG would have responsibility over state and streams.
> > > The remainder of the module would transform the underlying PRNG into
> > > the required distributions.
> > (edit: after writing the following verbiage, I realize it can be summed
> > up with more respect to your suggestion: yes, we should do this design,
> > but we don't need to and shouldn't give up on a class with distribution
> > methods.)
> > Once the core PRNG C API is in place, I don't think we necessarily need
> > to move away from a class structure per se, though it becomes an
> > option.
> (Sorry for cutting so much, I have a short question)
> My typical use case for the C API of NumPy's random features is that I
> start coding in pure Python and then switch to Cython. I have at least
> twice in the past resorted to including "randomkit.h" and using that
> directly. My last work
> actually implements a Python/Cython interface for rngs, see
> http://threefry.readthedocs.io/using_from_cython.html
> The goal is to use exactly the same backend in Python and Cython, with a
> and a few cdefs the only changes needed for a first port to Cython.
> Is this type of feature in discussion or in project for the future of
> numpy.random?

Sort of, but not really. For sure, once we've made the decisions that let
us move forward to a new design, we'll have Cython implementations that can
be used natively from Cython as well as Python without code changes. *But*
it's not going to be an automatic speedup like your threefry library
allows. You designed that API such that each of the methods returns a
single scalar, so all you need to do is declare your functions `cpdef` and
provide a `.pxd`. Our methods return either a scalar or an array depending
on the arguments, so the methods will be declared to return `object`, and
you will pay the overhead costs for checking the arguments and such. We're
not going to change that Python API; we're only considering dropping
stream-compatibility, not source-compatibility.
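
As a rough illustration of the return-type point, here is a minimal
pure-Python sketch, not NumPy's actual implementation. `ToyDistributions` is
a hypothetical class, the standard-library `random.Random` stands in for the
core PRNG, and a plain list stands in for an array. Because the result is a
float in one case and a container in the other, a Cython declaration of such
a method could only promise `object`, and the argument checks run on every
call.

```python
import random
from numbers import Real


class ToyDistributions:
    """Hypothetical stand-in for a class with distribution methods."""

    def __init__(self, seed=None):
        # random.Random plays the role of the core PRNG here (an assumption
        # made for illustration only).
        self._rng = random.Random(seed)

    def normal(self, loc=0.0, scale=1.0, size=None):
        # Argument checking happens on every call, even for one variate.
        if not isinstance(loc, Real) or not isinstance(scale, Real):
            raise TypeError("this toy version only accepts scalar loc/scale")
        if size is None:
            # Scalar path: return a single float.
            return self._rng.gauss(loc, scale)
        # "Array" path: return a container of variates.
        return [self._rng.gauss(loc, scale) for _ in range(size)]


prng = ToyDistributions(seed=12345)
single = prng.normal(0.0, 1.0)         # a float
many = prng.normal(0.0, 1.0, size=3)   # a list of three floats
print(type(single).__name__, type(many).__name__, len(many))
```

A method that always returned a single `double` could skip both the checks
and the boxing, which is what a scalar-only API makes possible.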

I would like to make sure that we do expose a C/Cython API to the
distribution functions (i.e. functions that draw only a single variate and
return a scalar), but it's not likely to look exactly like the Python API.
There might be clever tricks that we can do to minimize the number of
changes that one needs to make, though, if you are only drawing single
variates at a time (e.g. an agent-based simulation) and you want to make it
go faster by moving to Cython. For example, maybe we collect all of the
single-variate C-implemented methods into a single object sitting as an
attribute on the `Distributions` object.

# Sketch only: .pxd-style declarations, method bodies omitted.
cdef class DistributionsCAPI:
    cdef double normal(self, double loc, double scale)
    cdef double uniform(self, double low, double high)

cdef class Distributions:
    cdef DistributionsCAPI capi
    cpdef object normal(self, loc, scale, size=None):
        # Scalar fast path: delegate to the C-level method.
        if size is None and np.isscalar(loc) and np.isscalar(scale):
            return self.capi.normal(loc, scale)
        # ... Make an array

prng = Distributions(...)
# From Python:
x = prng.normal(mean, std)
# From Cython (prng must be typed as Distributions to reach .capi):
cdef double x = prng.capi.normal(mean, std)

But we need to make some higher-level decisions first before we can get
down to this level of design. Please do jump in and remind us of this use
case once we do get down to actual work on the new API design. Thanks!

Robert Kern