I agree with pretty much everything you wrote Robert.  I didn't have quote the right frame but the generic class that takes a low-level core PRNG sounds like the right design, and this should make user-generated distributions easier to develop.  I was thinking along these lines inspired by the SpiPy changes that use a LowLevelCallable, e.g., 

https://ilovesymposia.com/2017/03/12/scipys-new-lowlevelcallable-is-a-game-changer/ 

This might also allow users to extend the core PRNGs using something like Numba JIT classes as an alternative.

Another area that needs though is how to correctly spawn in Multiprocess application. This might be most easily addressed by providing a guide on a good way rather than the arbitrary way used now.

Kevin





On Sat, Jan 27, 2018 at 5:03 PM <numpy-discussion-request@python.org> wrote:
Send NumPy-Discussion mailing list submissions to
        numpy-discussion@python.org

To subscribe or unsubscribe via the World Wide Web, visit
        https://mail.python.org/mailman/listinfo/numpy-discussion
or, via email, send a message with subject or body 'help' to
        numpy-discussion-request@python.org

You can reach the person managing the list at
        numpy-discussion-owner@python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of NumPy-Discussion digest..."


Today's Topics:

   1. Re: Moving NumPy's PRNG Forward (Robert Kern)
   2. Re: Using np.frombuffer and cffi.buffer on array of C structs
      (problem with struct member padding) (Joe)


----------------------------------------------------------------------

Message: 1
Date: Sat, 27 Jan 2018 09:28:54 +0900
From: Robert Kern <robert.kern@gmail.com>
To: Discussion of Numerical Python <numpy-discussion@python.org>
Subject: Re: [Numpy-discussion] Moving NumPy's PRNG Forward
Message-ID:
        <CAF6FJitFLv3U7gHkYBCFW69A5BbVe=HzBAm5oxYVcXGbBdMU=w@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Sat, Jan 27, 2018 at 1:14 AM, Kevin Sheppard <kevin.k.sheppard@gmail.com>
wrote:
>
> I am a firm believer that the current situation is not sustainable.
There are a lot of improvements that can practically be incorporated.
While many of these are performance related, there are also improvements in
accuracy over some ranges of parameters that cannot be incorporated. I also
think that perfect stream reproducibility is a bit of a myth across
versions since this really would require identical OS, compiler and
possibly CPU for some of the generators that produce floats.
>
> I believe there is a case for separating the random generator from core
NumPy.  Some points that favor becoming a subproject:
>
> 1. It is a pure consumer of NumPy API.  Other parts of the API do no
depend on random.
> 2. A stand alone package could be installed along side many different
version of core NumPy which would reduce the pressure on freezing the
stream.

Removing numpy.random (or freezing it as deprecated legacy while all PRNG
development moves elsewhere) is probably a non-starter. It's too used for
us not to provide something. That said, we can (and ought to) make it much
easier for external packages to provide PRNG capabilities (core PRNGs and
distributions) that interoperate with the core functionality that numpy
provides. I'm also happy to place a high barrier on adding more
distributions to numpy.random once that is in place.

Specifically, core uniform PRNGs should have a small common C API that
distribution functions can use. This might just be a struct with an opaque
`void*` state pointer and then 2 function pointers for drawing a uint64
(whole range) and a double in [0,1) from the state. It's important to
expose our core uniform PRNGs as a C API because there has been a desire to
interoperate at that level, using the same PRNG state inside C or Fortran
or GPU code. If that's in place, then people can write new efficient
distribution functions in C that use this small C API agnostic to the core
PRNG algorithm. It also makes it easy to implement new core PRNGs that the
distribution functions provided by numpy.random can use.

> In terms of what is needed, I think that the underlying PRNG should be
swappable.  The will provide a simple mechanism to allow certain types of
advancement while easily providing backward compat.  In the current design
this is very hard and requires compiling many nearly identical copies of
RandomState. In pseudocode something like
>
> standard_normal(prng)
>
> where prng is a basic class that retains the PRNG state and has a small
set of core random number generators that belong to the underlying PRNG --
probably something like int32, int64, double, and possibly int53. I am not
advocating explicitly passing the PRNG as an argument, but having
generators which can take any suitable PRNG would add a lot of flexibility
in terms of taking advantage of improvements in the underlying PRNGs (see,
e.g., xoroshiro128/xorshift1024).  The "small" core PRNG would have
responsibility over state and streams.  The remainder of the module would
transform the underlying PRNG into the required distributions.

(edit: after writing the following verbiage, I realize it can be summed up
with more respect to your suggestion: yes, we should do this design, but we
don't need to and shouldn't give up on a class with distribution methods.)

Once the core PRNG C API is in place, I don't think we necessarily need to
move away from a class structure per se, though it becomes an option. We
just separate the core PRNG object from the distribution-providing class.
We don't need to make copies of the distribution-providing class just to
use a new core PRNG. I'm coming around to Nathaniel's suggestion for the
constructor API (though not the distribution-versioning, for reasons I can
get into later).  We have a couple of core uniform PRNG classes like
`MT19937` and `PCG128`. Those have a tiny API, and probably don't have a
lot of unnecessary code clones between them. Their constructors can be
different depending on the different ways they can be instantiated,
depending on the PRNG's features. I'm not sure that they'll have any common
methods besides `__getstate__/__setstate__` and probably a `copy()`. They
will expose their C API as a Python-opaque attribute. They can have
whatever algorithm-dependent methods they need (e.g. to support jumpahead).
I might not even expose to Python the uint64 and U(0,1) double sampling
methods, but maybe so.

Then we have a single `Distributions` class that provides all of the
distributions that we want to support in numpy.random (i.e. what we
currently have on `RandomState` and whatever passes our higher bar in the
future). It takes one of the core PRNG instances as an argument to the
constructor (nominally, at least; we can design factory functions to make
this more convenient).

  prng = Distributions(PCG128(seed))
  x = prng.normal(mean, std)

If someone wants to write a WELL512 core PRNG, they can just implement that
object and pass it to `Distributions()`. The `Distributions` code doesn't
need to be copied, nor do we need to much around with `__new__` tricks in
Cython.

Why do this instead of distribution functions? Well, I have a damn lot of
code that is expecting an object with at least the broad outlines of the
`RandomState` interface. I'm not going to update that code to use functions
instead. And if I'm not, no one is. There isn't a good transition path
since the PRNG object needs to thread through all of the code, whether it's
my code that I'm writing greenfield or library code that I don't control.
That said, people writing new distributions outside of numpy.random would
be writing functions, not trying to add to the `Distributions` class, but
that's fine.

It also allows the possibility for the `Distributions` class to be stateful
if we want to do things like caching the next Box-Muller variate and not
force that onto the core PRNG state like I currently do. Though I'd rather
just drop Box-Muller, and that's not a common pattern outside of
Box-Muller. But it's a possibility.

--
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20180127/c76673e3/attachment-0001.html>

------------------------------

Message: 2
Date: Sat, 27 Jan 2018 10:30:47 +0100
From: Joe <solarjoe@posteo.org>
To: numpy-discussion@python.org
Subject: Re: [Numpy-discussion] Using np.frombuffer and cffi.buffer on
        array of C structs (problem with struct member padding)
Message-ID: <fed22682-60fd-0a5d-aaa5-83e67d9d1069@posteo.org>
Content-Type: text/plain; charset=utf-8; format=flowed

Thanks for your help on this! This solved my issue.


Am 25.01.2018 um 19:01 schrieb Allan Haldane:
> There is a new section discussing alignment in the numpy 1.14 structured
> array docs, which has some hints about interfacing with C structs.
>
> These new 1.14 docs are not online yet on scipy.org, but in the meantime
>   you can view them here:
> https://ahaldane.github.io/user/basics.rec.html#automatic-byte-offsets-and-alignment
>
> (That links specifically to the discussion of alignments and padding).
>
> Allan
>
> On 01/25/2018 11:33 AM, Chris Barker - NOAA Federal wrote:
>>
>>>
>>> The numpy dtype constructor takes an ?align? keyword that will pad it
>>> for you.
>>
>> https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html
>>
>> -CHB
>>
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>


------------------------------

Subject: Digest Footer

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


------------------------------

End of NumPy-Discussion Digest, Vol 136, Issue 37
*************************************************