[Numpy-discussion] Low-level API for Random

Kevin Sheppard kevin.k.sheppard at gmail.com
Thu Sep 19 06:40:38 EDT 2019


On Thu, Sep 19, 2019 at 10:23 AM Ralf Gommers <ralf.gommers at gmail.com>
wrote:

>
>
> On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard <
> kevin.k.sheppard at gmail.com> wrote:
>
>> There are some users of the NumPy C code in randomkit.  This was never
>> officially supported.  There has been a long-standing open issue requesting
>> official support.
>>
>> When I wrote randomgen I supplied .pxd files that make it simpler to
>> write Cython code that uses the components.  The lower-level API has not
>> had much scrutiny and is in need of a clean-up.  I thought this would also
>> encourage users to extend the random machinery themselves as part of their
>> project or code, so as to minimize the requests for new (exotic)
>> distributions to be included in Generator.
>>
>> Most of the generator functions follow the pattern random_DISTRIBUTION.
>> Some have a bit more name mangling, which can easily be cleaned up (like
>> random_gauss_zig, which should become PREFIX_standard_normal).
>>
>> Ralf Gommers suggested unprefixed names.
>>
>
> I suggested that the names should match the Python API, which I think
> isn't quite the same. The Python API doesn't contain things like "gamma",
> "t" or "f".
>

By gamma and f (I misspoke about t), I mean the names that appear as
Generator methods:

https://docs.scipy.org/doc/numpy/reference/random/generator.html#numpy.random.Generator


If I understand your point (and with reference to the page linked below),
then there would be something like numpy.random.cython_random.gamma (which
is currently called numpy.random.distributions.random_gamma). Maybe I'm not
understanding your point about the Python API, though.


>
> I tried this in a local branch and it was a bit ugly since some of the
>> distributions have common math names (e.g., gamma) and others are very
>> short (e.g., t or f).  I think a prefix is needed, and after looking
>> through the C API docs npy_random_ seemed like a reasonable choice (since
>> these live in numpy.random).
>>
>> Any thoughts on the following questions are welcome (others too):
>>
>> 1. Should there be a prefix on the C functions?
>> 2. If so, what should the prefix be?
>>
>
> Before worrying about naming details, can we start with "what should be in
> the C/Cython API"? If I look through the current pxd files, there's a lot
> there that looks like it should be private, and what we expose as Python
> API is not all present as far as I can tell (which may be fine, if the only
> goal is to let people write new generators rather than use the existing
> ones from Cython without the Python overhead).
>

From the ground up, for someone who wants to write a new distribution:
1. The bit generators.  These currently have no pxd files. These are always
going to be Python objects, so it isn't absolutely essential to expose
them with a low-level API.  All that is really needed is the capsule that
holds the bitgen_t struct.
2. bitgen_t, which is in common.pxd.  This is essential since it provides
access to the callables that produce basic pseudorandom values.
3. The distributions, which are in distributions.pxd. The integer
generators are in bounded_integers.pxd.in, a template that would need to
be processed, with the generated file then included.
    a. The legacy distributions are in legacy_distributions.pxd.  If they are
included, then aug_bitgen_t, which is also defined in
legacy_distributions.pxd, needs to be included as well.
4. The "helpers", which are defined in common.pxd.  These simplify
implementing complete distributions that support automatic broadcasting
when needed. They are only provided for functions with the signatures
used in distributions.pxd. The highest-level ones are cont() and
disc(). Some of the lower-level ones could easily be marked as private.
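To make item 4 concrete, here is a minimal C sketch of the core idea behind a cont()-style helper for a one-parameter distribution. It is not NumPy's implementation: bitgen_t is redeclared locally to mirror the layout of NumPy's struct as I understand it, a toy SplitMix64 generator stands in for a real BitGenerator, and fill_cont1, uniform_0_a and splitmix_bitgen are hypothetical names. Broadcasting is reduced to its simplest case, a zero stride that reuses one scalar parameter for every draw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring the layout of NumPy's bitgen_t (assumption). */
typedef struct bitgen {
    void *state;
    uint64_t (*next_uint64)(void *st);
    uint32_t (*next_uint32)(void *st);
    double (*next_double)(void *st);
    uint64_t (*next_raw)(void *st);
} bitgen_t;

/* Toy SplitMix64 bit generator; the state is a single uint64_t. */
static uint64_t splitmix64_next(void *st) {
    uint64_t z = (*(uint64_t *)st += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}
static uint32_t splitmix64_next32(void *st) {
    return (uint32_t)(splitmix64_next(st) >> 32);
}
/* Top 53 bits scaled to a double in [0, 1). */
static double splitmix64_next_double(void *st) {
    return (double)(splitmix64_next(st) >> 11) * (1.0 / 9007199254740992.0);
}
/* Hypothetical helper: wrap a SplitMix64 state in a bitgen_t. */
static bitgen_t splitmix_bitgen(uint64_t *state) {
    bitgen_t bg = {state, splitmix64_next, splitmix64_next32,
                   splitmix64_next_double, splitmix64_next};
    return bg;
}

/* A one-parameter continuous distribution: uniform on [0, a). */
typedef double (*cont1_fn)(bitgen_t *bg, double a);
static double uniform_0_a(bitgen_t *bg, double a) {
    return a * bg->next_double(bg->state);
}

/* Fill out[0..n) with fn(bg, a[i * a_stride]).  a_stride == 0 broadcasts a
   single scalar parameter; a_stride == 1 uses one parameter per draw. */
static void fill_cont1(bitgen_t *bg, cont1_fn fn, const double *a,
                       size_t a_stride, size_t n, double *out) {
    assert(a_stride == 0 || a_stride == 1);
    for (size_t i = 0; i < n; i++)
        out[i] = fn(bg, a[i * a_stride]);
}
```

A full helper also has to handle multi-dimensional shapes and parameter validation; the zero stride is only the kernel of the broadcasting logic.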

1, 2 and 3 are pretty important.  4 could be in or out. It could help if
someone wanted to write a fully featured distribution with broadcasting, but
I think this use case is less likely than, say, someone wanting to implement
a custom rejection sampler.
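For the custom rejection sampler case, items 1 and 2 are all that is needed. The sketch below is a hypothetical example, not code from NumPy: it draws from the Beta(2, 2) density f(x) = 6x(1 - x) on [0, 1] using a Uniform(0, 1) proposal and the envelope M = 1.5 (the maximum of f, at x = 0.5). A candidate x is accepted when u < f(x)/M = 4x(1 - x), so on average 1/M = 2/3 of candidates are accepted. As before, bitgen_t is a local stand-in assumed to mirror NumPy's struct, and SplitMix64 stands in for a real BitGenerator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring the layout of NumPy's bitgen_t (assumption). */
typedef struct bitgen {
    void *state;
    uint64_t (*next_uint64)(void *st);
    uint32_t (*next_uint32)(void *st);
    double (*next_double)(void *st);
    uint64_t (*next_raw)(void *st);
} bitgen_t;

/* Toy SplitMix64 bit generator; the state is a single uint64_t. */
static uint64_t splitmix64_next(void *st) {
    uint64_t z = (*(uint64_t *)st += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}
static uint32_t splitmix64_next32(void *st) {
    return (uint32_t)(splitmix64_next(st) >> 32);
}
/* Top 53 bits scaled to a double in [0, 1). */
static double splitmix64_next_double(void *st) {
    return (double)(splitmix64_next(st) >> 11) * (1.0 / 9007199254740992.0);
}
static bitgen_t splitmix_bitgen(uint64_t *state) {
    bitgen_t bg = {state, splitmix64_next, splitmix64_next32,
                   splitmix64_next_double, splitmix64_next};
    return bg;
}

/* Hypothetical rejection sampler for Beta(2, 2): propose x ~ U(0, 1) and
   accept with probability f(x) / (M * g(x)) = 6x(1 - x) / 1.5 = 4x(1 - x).
   If attempts is non-NULL, the number of proposals is added to it. */
static double beta22_rejection(bitgen_t *bg, size_t *attempts) {
    assert(bg != NULL);
    for (;;) {
        double x = bg->next_double(bg->state);
        double u = bg->next_double(bg->state);
        if (attempts != NULL)
            (*attempts)++;
        if (u < 4.0 * x * (1.0 - x))
            return x;
    }
}
```

Only bitgen_t and its next_double callable are used, which is why a pxd exposing items 1 and 2 would already be enough for this kind of extension.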


For someone who wants to write a new BitGenerator:

1. BitGenerator and SeedSequence in bit_generator.pxd are required, as is
bitgen_t, which is in common.pxd. bitgen_t should probably move to
bit_generators.
2. aligned_malloc: This has been requested on multiple occasions and is
practically important when interfacing with SSE or AVX code. It is
potentially more general than the random module. This lives in common.pxd.
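The two pieces in this list can be sketched together in C. Everything here is an assumption for illustration: bitgen_t is a local stand-in mirroring the layout of NumPy's struct, aligned_malloc_sketch/aligned_free_sketch are hypothetical helpers (not NumPy's aligned_malloc), and SplitMix64 is a toy generator. The state is placed on a 64-byte boundary, the kind of alignment SSE/AVX code or cache-line-sized state typically wants.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-in mirroring the layout of NumPy's bitgen_t (assumption). */
typedef struct bitgen {
    void *state;
    uint64_t (*next_uint64)(void *st);
    uint32_t (*next_uint32)(void *st);
    double (*next_double)(void *st);
    uint64_t (*next_raw)(void *st);
} bitgen_t;

/* Return `size` bytes aligned to `alignment` (a power of two, at least
   sizeof(void *)).  malloc's pointer is stashed just before the aligned
   block so that aligned_free_sketch can recover and free it. */
static void *aligned_malloc_sketch(size_t size, size_t alignment) {
    assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

static void aligned_free_sketch(void *p) {
    if (p != NULL)
        free(((void **)p)[-1]);
}

/* Toy SplitMix64 generator state. */
typedef struct { uint64_t s; } splitmix64_state;

static uint64_t sm_next64(void *st) {
    uint64_t z = (((splitmix64_state *)st)->s += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}
static uint32_t sm_next32(void *st) { return (uint32_t)(sm_next64(st) >> 32); }
/* Top 53 bits scaled to a double in [0, 1). */
static double sm_next_double(void *st) {
    return (double)(sm_next64(st) >> 11) * (1.0 / 9007199254740992.0);
}

/* Fill in a bitgen_t, the struct of callables that distribution code
   consumes.  Returns 0 on success, -1 if the allocation fails. */
static int splitmix64_bitgen_init(bitgen_t *bg, uint64_t seed) {
    splitmix64_state *st = aligned_malloc_sketch(sizeof *st, 64);
    if (st == NULL)
        return -1;
    st->s = seed;
    bg->state = st;
    bg->next_uint64 = sm_next64;
    bg->next_uint32 = sm_next32;
    bg->next_double = sm_next_double;
    bg->next_raw = sm_next64;
    return 0;
}

static void splitmix64_bitgen_free(bitgen_t *bg) {
    aligned_free_sketch(bg->state);
    bg->state = NULL;
}
```

On the Python side, a BitGenerator would then hand a pointer to such a struct to consumers through its capsule, as described in the first list above.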



>
> In the end we want to get to a doc section similar to
> http://scipy.github.io/devdocs/special.cython_special.html I'd think.
>
> 3. Should the legacy C functions be part of the API -- these are mostly
>> the ones that produce or depend on polar transform normals (Box-Muller). I
>> have a feeling no, but there may be reasons to prefer BM since they do not
>> depend on rejection sampling.
>>
>
> Even if there would be a couple of users interested, it would be odd
> starting to do this after deeming the code "legacy". So I agree with your
> "no".
>
>
>> 4. Should the low-level API be consumable like any other NumPy C API, by
>> including the usual header locations and library locations?  Right now, the
>> pxd simplifies writing Cython, but users have to specify the location of the
>> headers and source manually.  An alternative would be to provide a function
>> like np.get_include() -> np.random.get_include() that would be specific to
>> random.
>>
>
> Good question. I'm not sure this is "like any other NumPy C API". We don't
> provide a C API for fft, linalg or other functionality further from core
> either. It's possible of course, but does it really help library authors or
> end users?
>

SciPy provides a very useful Cython API to low-level linalg. But there is
little reason to provide C APIs to fft or linalg, since the underlying
libraries are directly available. The code in random is, AFAICT, one of the
more complete C implementations of the functions needed to produce variates
from many distributions (mostly due to its ancestor randomkit, which AFAICT
isn't maintained).

An ideal API would allow projects like
https://github.com/deepmind/torch-randomkit/tree/master/randomkit or numba
to consume the code in NumPy without vendoring it.

Best wishes,
Kevin


> Cheers,
> Ralf
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>

