[Numpy-discussion] Low-level API for Random

Thu Sep 19 11:10:36 EDT 2019

Just to chime in: Numba would definitely appreciate C functions to access
the random distribution implementations, and we have a side project
(numba-scipy) that makes the Cython-wrapped functions in SciPy visible
to Numba.

On Thu, Sep 19, 2019 at 5:41 AM Kevin Sheppard <kevin.k.sheppard at gmail.com>
wrote:

>
>
> On Thu, Sep 19, 2019 at 10:23 AM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>
>>
>>
>> On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard <
>> kevin.k.sheppard at gmail.com> wrote:
>>
>>> There are some users of the NumPy C code in randomkit.  This was never
>>> officially supported.  There has been a long open issue to provide this
>>> officially.
>>>
>>> When I wrote randomgen I supplied .pxd files that make it simpler to
>>> write Cython code that uses the components.  The lower-level API has not
>>> had much scrutiny and is in need of a clean-up.  I thought this would also
>>> encourage users to extend the random machinery themselves as part of their
>>> project or code so as to minimize the requests for new (exotic)
>>> distributions to be included in Generator.
>>>
>>> Most of the generator functions follow a pattern random_DISTRIBUTION.
>>> Some have a bit more name mangling which can easily be cleaned up (like
>>> random_gauss_zig, which should become PREFIX_standard_normal).
>>>
>>> Ralf Gommers suggested unprefixed names.
>>>
>>
>> I suggested that the names should match the Python API, which I think
>> isn't quite the same. The Python API doesn't contain things like "gamma",
>> "t" or "f".
>>
>
> By gamma and f (I misspoke about t) I mean the names that appear as
> Generator methods:
>
>
> https://docs.scipy.org/doc/numpy/reference/random/generator.html#numpy.random.Generator
>
>
> If I understand your point (and with reference to the page linked below),
> then there would be something like numpy.random.cython_random.gamma (which
> is currently called numpy.random.distributions.random_gamma). Maybe I'm not
> understanding your point about the Python API though.
>
>
>>
>>> I tried this in a local branch and it was a bit ugly since some of the
>>> distributions have common math names (e.g., gamma) and others are very
>>> short (e.g., t or f).  I think a prefix is needed, and after looking
>>> through the C API docs npy_random_ seemed like a reasonable choice (since
>>> these live in numpy.random).
>>>
>>> Any thoughts on the following questions are welcome (others too):
>>>
>>> 1. Should there be a prefix on the C functions?
>>> 2. If so, what should the prefix be?
>>>
>>
>> Before worrying about naming details, can we start with "what should be
>> in the C/Cython API"? If I look through the current pxd files, there's a
>> lot there that looks like it should be private, and what we expose as
>> Python API is not all present as far as I can tell (which may be fine, if
>> the only goal is to let people write new generators rather than use the
>> existing ones from Cython without the Python overhead).
>>
>
> From the ground up, for someone who wants to write a new distribution:
> 1. The bit generators.  These currently have no pxd files. These are
> always going to be Python objects and so it isn't absolutely essential to
> expose them with a low-level API.  All that is really needed is the
> capsule, which holds the bitgen struct.
> 2. bitgen_t, which is in common.pxd.  This is essential since it enables
> access to the callables that produce basic pseudo-random values.
> 3. The distributions, which are in distributions.pxd. The integer
> generators are in bounded_integers.pxd.in, which would need to be
> processed and then included (same for
> bounded_integers.pxd.in).
>     a. The legacy distributions in legacy_distributions.pxd.  If these are
> included, then aug_bitgen_t also needs to be included; it is also in
> legacy_distributions.pxd.
> 4. The "helpers", which are defined in common.pxd.  These simplify
> implementing complete distributions which support automatic broadcasting
> when needed. They are only provided to match the signatures for the
> functions in distributions.pxd. The highest level ones are cont() and
> disc(). Some of the lower-level ones could easily be marked as private.
>
> 1, 2 and 3 are pretty important.  4 could be in or out. It could help if
> someone wanted to write a fully featured distribution w/ broadcasting, but
> I think this use case is less likely than someone, say, wanting to implement
> a custom rejection sampler.
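
To illustrate the custom rejection sampler use case: all such a sampler
needs from the bit generator is a callable that returns uniform doubles,
which is the role of bitgen_t's next_double. A minimal pure-Python sketch
follows. The BitGen class is a hypothetical stand-in for bitgen_t, and the
sampler name and target density are illustrative assumptions, not NumPy API.

```python
import random


class BitGen:
    """Hypothetical stand-in for bitgen_t: opaque state plus a next_double callable."""

    def __init__(self, seed):
        self._rng = random.Random(seed)  # plays the role of the opaque state

    def next_double(self):
        # Uniform double in [0, 1), analogous to bitgen_t's next_double
        return self._rng.random()


def sample_semicircle(bitgen):
    """Rejection-sample x in [-1, 1) with density proportional to sqrt(1 - x**2)."""
    while True:
        x = 2.0 * bitgen.next_double() - 1.0  # proposal: uniform on [-1, 1)
        u = bitgen.next_double()              # uniform height in [0, 1)
        if u * u <= 1.0 - x * x:              # accept when u <= sqrt(1 - x**2)
            return x


bitgen = BitGen(12345)
draws = [sample_semicircle(bitgen) for _ in range(10000)]
```

The sampler never touches the generator's internals, so the same loop would
work against any bit generator exposing an equivalent callable.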
>
>
> For someone who wants to write a new BitGenerator:
>
> 1. BitGenerator and SeedSequence in bit_generator.pxd are required, as is
> bitgen_t, which is in common.pxd. bitgen_t should probably move to
> bit_generators.
> 2. aligned_malloc:  This has been requested on multiple occasions and is
> practically important when interfacing with SSE or AVX code. It is
> potentially more general than the random module. This lives in common.pxd.
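
For readers unfamiliar with what an aligned_malloc helper does, the usual
technique is to over-allocate and round the address up to the next multiple
of the alignment. The sketch below shows that idea in Python with ctypes. It
is not the code in common.pxd, and the function names align_up and
aligned_buffer are hypothetical.

```python
import ctypes


def align_up(address, alignment):
    """Round address up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (address + alignment - 1) & ~(alignment - 1)


def aligned_buffer(size, alignment):
    """Over-allocate by alignment - 1 bytes and return (owner, aligned_address)."""
    raw = ctypes.create_string_buffer(size + alignment - 1)
    base = ctypes.addressof(raw)
    # The caller must keep `raw` alive for as long as the address is in use.
    return raw, align_up(base, alignment)


# 32-byte alignment is what aligned 256-bit AVX loads require.
owner, addr = aligned_buffer(1024, 32)
```

The extra alignment - 1 bytes guarantee that a full `size`-byte region still
fits after rounding the start address up.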
>
>
>
>>
>> In the end we want to get to a doc section similar to
>> http://scipy.github.io/devdocs/special.cython_special.html I'd think.
>>
>>> 3. Should the legacy C functions be part of the API -- these are mostly
>>> the ones that produce or depend on polar transform normals (Box-Muller). I
>>> have a feeling no, but there may be reasons to prefer BM since they do not
>>> depend on rejection sampling.
>>>
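
For context on question 3: the textbook Box-Muller transform consumes exactly
two uniforms per pair of normals, with no retry loop. A minimal sketch of the
trigonometric form follows. It is an illustration only and is not taken from
the legacy NumPy code; the function name box_muller is hypothetical.

```python
import math


def box_muller(u1, u2):
    """Map uniforms u1 in (0, 1] and u2 in [0, 1) to two independent standard normals."""
    r = math.sqrt(-2.0 * math.log(u1))  # radius from the first uniform
    theta = 2.0 * math.pi * u2          # angle from the second uniform
    return r * math.cos(theta), r * math.sin(theta)


z0, z1 = box_muller(math.exp(-0.5), 0.0)
```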
>>
>> Even if there would be a couple of users interested, it would be odd
>> starting to do this after deeming the code "legacy". So I agree with your
>> "no".
>>
>>
>>> 4. Should the low-level API be consumable like any other numpy C API by
>>> including the usual header locations and library locations?  Right now, the
>>> pxd simplifies writing Cython but users have to specify the location of the
>>> headers and source manually.  An alternative would be to provide a function
>>> like np.get_include() -> np.random.get_include() that would specialize in
>>> random.
>>>
>>
>> Good question. I'm not sure this is "like any other NumPy C API". We
>> don't provide a C API for fft, linalg or other functionality further from
>> core either. It's possible of course, but does it really help library
>> authors or end users?
>>
>
> SciPy provides a very useful Cython API to low-level linalg. But there is
> little reason to provide C APIs to fft or linalg since they are all
> directly available. The code in random is, AFAICT, one of the more complete
> C implementations of functions needed to produce variates from many
> distributions (mostly due to its ancestor randomkit, which AFAICT isn't
> maintained).
>
> An ideal API would allow projects like
> https://github.com/deepmind/torch-randomkit/tree/master/randomkit or
> numba to consume the code in NumPy without vendoring it.
>
> Best wishes,
> Kevin
>
>
>> Cheers,
>> Ralf
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>