
Just to chime in: Numba would definitely appreciate C functions to access the random distribution implementations, and has a side-project (numba-scipy) that is making the Cython-wrapped functions in SciPy visible to Numba.
On Thu, Sep 19, 2019 at 5:41 AM Kevin Sheppard <kevin.k.sheppard@gmail.com> wrote:
On Thu, Sep 19, 2019 at 10:23 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard < kevin.k.sheppard@gmail.com> wrote:
There are some users of the NumPy C code in randomkit. This was never officially supported. There has been a long-standing open issue to provide this officially.
When I wrote randomgen I supplied .pxd files that make it simpler to write Cython code that uses the components. The lower-level API has not had much scrutiny and is in need of a clean-up. I thought this would also encourage users to extend the random machinery themselves as part of their own projects, so as to minimize the requests for new (exotic) distributions to be included in Generator.
Most of the generator functions follow the pattern random_DISTRIBUTION. Some have a bit more name mangling, which can easily be cleaned up (like random_gauss_zig, which should become PREFIX_standard_normal).
Ralf Gommers suggested unprefixed names.
I suggested that the names should match the Python API, which I think isn't quite the same. The Python API doesn't contain things like "gamma", "t" or "f".
By gamma and f (I misspoke about t), I mean the names that appear as Generator methods:
https://docs.scipy.org/doc/numpy/reference/random/generator.html#numpy.rando...
If I understand your point (and with reference to the page linked below), then there would be something like numpy.random.cython_random.gamma (which is currently called numpy.random.distributions.random_gamma). Maybe I'm not understanding your point about the Python API though.
I tried this in a local branch and it was a bit ugly since some of the distributions have common math names (e.g., gamma) and others are very short (e.g., t or f). I think a prefix is needed, and after looking through the C API docs, npy_random_ seemed like a reasonable choice (since these live in numpy.random).
Any thoughts on the following questions are welcome (others too):
1. Should there be a prefix on the C functions?
2. If so, what should the prefix be?
Before worrying about naming details, can we start with "what should be in the C/Cython API"? If I look through the current pxd files, there's a lot there that looks like it should be private, and what we expose as Python API is not all present as far as I can tell (which may be fine, if the only goal is to let people write new generators rather than use the existing ones from Cython without the Python overhead).
From the ground up, for someone who wants to write a new distribution:
1. The bit generators. These currently have no pxd files. They are always going to be Python objects, so it isn't essential to expose them through a low-level API. All that is really needed is the capsule, which holds the bitgen struct.
2. bitgen_t, which is in common.pxd. This is essential since it provides access to the callables that produce basic pseudo-random values.
3. The distributions, which are in distributions.pxd. The integer generators are in bounded_integers.pxd.in, which would need to be processed and then included after processing (same for bounded_integers.pxd.in).
   a. The legacy distributions in legacy_distributions.pxd. If the legacy ones are included, then aug_bitgen_t, which is also in legacy_distributions.pxd, needs to be included as well.
4. The "helpers" defined in common.pxd. These simplify implementing complete distributions that support automatic broadcasting when needed. They are only provided to match the signatures of the functions in distributions.pxd. The highest-level ones are cont() and disc(). Some of the lower-level ones could easily be marked as private.
1, 2 and 3 are pretty important. 4 could be in or out. It could help if someone wanted to write a fully featured distribution with broadcasting, but I think this use case is less likely than someone wanting to implement, say, a custom rejection sampler.
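As a rough sketch of the "custom rejection sampler" case, using only what items 2 and 3 describe: a distribution is just a C function that receives a bitgen_t pointer and calls its function pointers. The struct below is written to match the layout of NumPy's bitgen_t as I understand it; the SplitMix64 source exists only so the file runs on its own (real code would get the bitgen_t from a BitGenerator's capsule). random_semicircle and fill_semicircle are hypothetical names, not part of any NumPy API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the layout of NumPy's bitgen_t: an opaque state pointer
   plus callables that produce basic pseudo-random values. */
typedef struct bitgen {
    void *state;
    uint64_t (*next_uint64)(void *st);
    uint32_t (*next_uint32)(void *st);
    double (*next_double)(void *st);
    uint64_t (*next_raw)(void *st);
} bitgen_t;

/* Toy SplitMix64 source so this file runs standalone. */
static uint64_t splitmix64_next(void *st) {
    uint64_t *s = (uint64_t *)st;
    uint64_t z = (*s += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

static uint32_t splitmix64_next32(void *st) {
    return (uint32_t)(splitmix64_next(st) >> 32);
}

static double splitmix64_next_double(void *st) {
    /* Top 53 bits scaled to a uniform double in [0, 1). */
    return (double)(splitmix64_next(st) >> 11) * (1.0 / 9007199254740992.0);
}

static void splitmix64_bitgen(bitgen_t *bg, uint64_t *state) {
    bg->state = state;
    bg->next_uint64 = splitmix64_next;
    bg->next_uint32 = splitmix64_next32;
    bg->next_double = splitmix64_next_double;
    bg->next_raw = splitmix64_next;
}

/* Hypothetical custom distribution: Wigner semicircle on [-1, 1] with density
   (2/pi) * sqrt(1 - x^2), sampled by rejection from a uniform box. */
static double random_semicircle(bitgen_t *bg) {
    assert(bg != NULL && bg->next_double != NULL);
    for (;;) {
        double x = 2.0 * bg->next_double(bg->state) - 1.0; /* proposal in [-1, 1) */
        double u = bg->next_double(bg->state);             /* acceptance draw in [0, 1) */
        /* u <= sqrt(1 - x^2) is equivalent to u*u <= 1 - x*x since both sides are >= 0. */
        if (u * u <= 1.0 - x * x) {
            return x;
        }
    }
}

/* Fills out[0..n) one draw at a time; a cont()-style helper would add broadcasting. */
static void fill_semicircle(bitgen_t *bg, double *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = random_semicircle(bg);
    }
}
```

Nothing here needs items 1 or 4: the sampler only touches bitgen_t, which is why those two are the essential pieces for this use case.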
For someone who wants to write a new BitGenerator:
1. BitGenerator and SeedSequence in bit_generator.pxd are required. So is bitgen_t, which is in common.pxd. bitgen_t should probably move to bit_generators.
2. aligned_malloc. This has been requested on multiple occasions and is practically important when interfacing with SSE or AVX code. It is potentially more general than the random module. It lives in common.pxd.
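A hedged C-level sketch of both items for the BitGenerator-writer case: a xoshiro256** generator wired into a bitgen_t-shaped struct, with its state allocated through an over-allocate-and-offset aligned allocator. sketch_aligned_malloc, sketch_aligned_free and xoshiro_bitgen_init are hypothetical names; this is one common way to write an aligned allocator, not necessarily how the aligned_malloc in common.pxd is implemented, and the Python-side BitGenerator/SeedSequence wrapping and capsule are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed to match the layout of NumPy's bitgen_t. */
typedef struct bitgen {
    void *state;
    uint64_t (*next_uint64)(void *st);
    uint32_t (*next_uint32)(void *st);
    double (*next_double)(void *st);
    uint64_t (*next_raw)(void *st);
} bitgen_t;

/* Hypothetical aligned allocator: over-allocate, round the address up to the
   requested power-of-two alignment, and store malloc's pointer just before
   the aligned block so it can be freed later. */
static void *sketch_aligned_malloc(size_t size, size_t alignment) {
    assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);
    void *raw = malloc(size + alignment + sizeof(void *));
    if (raw == NULL) {
        return NULL;
    }
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

static void sketch_aligned_free(void *p) {
    if (p != NULL) {
        free(((void **)p)[-1]);
    }
}

static uint64_t rotl64(uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

/* xoshiro256** step over a 4-word state. */
static uint64_t xoshiro_next64(void *st) {
    uint64_t *s = (uint64_t *)st;
    const uint64_t result = rotl64(s[1] * 5, 7) * 9;
    const uint64_t t = s[1] << 17;
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl64(s[3], 45);
    return result;
}

static uint32_t xoshiro_next32(void *st) {
    return (uint32_t)(xoshiro_next64(st) >> 32);
}

static double xoshiro_next_double(void *st) {
    return (double)(xoshiro_next64(st) >> 11) * (1.0 / 9007199254740992.0);
}

/* SplitMix64 step, used only to expand one seed into four state words. */
static uint64_t splitmix64(uint64_t *x) {
    uint64_t z = (*x += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Hypothetical constructor: 32-byte-aligned state (one AVX register wide),
   seeded, then wired into the bitgen_t function pointers. */
static int xoshiro_bitgen_init(bitgen_t *bg, uint64_t seed) {
    uint64_t *s = sketch_aligned_malloc(4 * sizeof(uint64_t), 32);
    if (s == NULL) {
        return -1;
    }
    for (int i = 0; i < 4; i++) {
        s[i] = splitmix64(&seed);
    }
    bg->state = s;
    bg->next_uint64 = xoshiro_next64;
    bg->next_uint32 = xoshiro_next32;
    bg->next_double = xoshiro_next_double;
    bg->next_raw = xoshiro_next64;
    return 0;
}

static void xoshiro_bitgen_free(bitgen_t *bg) {
    sketch_aligned_free(bg->state);
    bg->state = NULL;
}
```

The allocator keeps the original pointer in the slot just before the aligned block, so freeing needs no extra bookkeeping; C11 aligned_alloc or posix_memalign are alternatives where available.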
In the end, I'd think we want to get to a doc section similar to http://scipy.github.io/devdocs/special.cython_special.html.
3. Should the legacy C functions be part of the API? These are mostly the ones that produce or depend on polar-transform normals (Box-Muller). My feeling is no, but there may be reasons to prefer BM since they do not depend on rejection sampling.
Even if a couple of users were interested, it would be odd to start doing this after deeming the code "legacy". So I agree with your "no".
4. Should the low-level API be consumable like any other NumPy C API, by including the usual header locations and library locations? Right now, the pxd files simplify writing Cython, but users have to specify the location of the headers and source manually. An alternative would be to provide a function like np.get_include() -> np.random.get_include() that would specialize in random.
Good question. I'm not sure this is "like any other NumPy C API". We don't provide a C API for fft, linalg or other functionality further from core either. It's possible of course, but does it really help library authors or end users?
SciPy provides a very useful Cython API to low-level linalg. But there is little reason to provide C APIs to fft or linalg since they are all directly available. The code in random is, AFAICT, one of the more complete C implementations of functions needed to produce variates from many distributions (mostly due to its ancestor randomkit, which AFAICT isn't maintained).
An ideal API would allow projects like https://github.com/deepmind/torch-randomkit/tree/master/randomkit or numba to consume the code in NumPy without vendoring it.
Best wishes, Kevin
Cheers, Ralf
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion