On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard <kevin.k.sheppard@gmail.com> wrote:

There are some users of the NumPy C code in randomkit. This was never officially supported. There has been a long-standing open issue to provide this officially.

When I wrote randomgen I supplied .pxd files that make it simpler to write Cython code that uses the components. The lower-level API has not had much scrutiny and is in need of a clean-up. I thought this would also encourage users to extend the random machinery themselves as part of their project or code so as to minimize the requests for new (exotic) distributions to be included in Generator.

Most of the generator functions follow the pattern random_DISTRIBUTION. Some have a bit more name mangling, which can easily be cleaned up (like random_gauss_zig, which should become PREFIX_standard_normal).

Ralf Gommers suggested unprefixed names.

I suggested that the names should match the Python API, which I think isn't quite the same. The Python API doesn't contain things like "gamma", "t" or "f".

I tried this in a local branch and it was a bit ugly since some of the distributions have common math names (e.g., gamma) and others are very short (e.g., t or f). I think a prefix is needed, and after looking through the C API docs, npy_random_ seemed like a reasonable choice (since these live in numpy.random).
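As a rough illustration of the naming question, the Python sketch below maps a few Python-API distribution names to candidate C names, with and without a prefix. The helper c_name is hypothetical, and npy_random_ is only the prefix proposed in this thread, not a settled choice.

```python
# Prefix proposed in this thread; an assumption, not a decided convention.
PREFIX = "npy_random_"

# A few distribution names as they appear in the Python API.
python_api_names = ["standard_normal", "gamma", "standard_t", "f"]


def c_name(python_name, prefix=PREFIX):
    """Return a candidate C function name for a Python-API distribution name."""
    return prefix + python_name


for name in python_api_names:
    # Unprefixed names such as "gamma" or "f" are short and collision-prone in C.
    print(f"{name:16s} unprefixed: {c_name(name, prefix=''):16s} prefixed: {c_name(name)}")
```

With the empty prefix, the C names are just gamma and f, which is the case that looked ugly in the local branch.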

Any thoughts on the following questions are welcome (others too):

1. Should there be a prefix on the C functions?
2. If so, what should the prefix be?

Before worrying about naming details, can we start with "what should be in the C/Cython API"? If I look through the current pxd files, there's a lot there that looks like it should be private, and what we expose as Python API is not all present as far as I can tell (which may be fine, if the only goal is to let people write new generators rather than use the existing ones from Cython without the Python overhead).

In the end we want to get to a doc section similar to http://scipy.github.io/devdocs/special.cython_special.html I'd think.

3. Should the legacy C functions be part of the API -- these are mostly the ones that produce or depend on polar transform normals (Box-Muller). I have a feeling no, but there may be reasons to prefer BM since they do not depend on rejection sampling.

Even if there would be a couple of users interested, it would be odd starting to do this after deeming the code "legacy". So I agree with your "no".
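For readers weighing the trade-off in question 3, here is a minimal Python sketch of the basic (trigonometric) Box-Muller transform. It consumes a fixed two uniforms per pair of normals and has no rejection loop. It is illustrative only and is not the legacy NumPy C code, which may differ in detail; the function name box_muller_pair is hypothetical.

```python
import math
import random


def box_muller_pair(rng):
    """Return two independent standard normal variates from two uniforms."""
    # Shift to (0, 1] so that log(u1) is always defined.
    u1 = 1.0 - rng.random()
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


rng = random.Random(12345)
samples = [z for _ in range(10000) for z in box_muller_pair(rng)]
sample_mean = sum(samples) / len(samples)
print(f"mean of {len(samples)} samples: {sample_mean:.4f}")
```

Because every call uses exactly two uniforms, the number of underlying draws is predictable, which is the property the question alludes to.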

4. Should the low-level API be consumable like any other NumPy C API by including the usual header locations and library locations? Right now, the pxd simplifies writing Cython, but users have to specify the location of the headers and source manually. An alternative would be to provide a function like np.get_include() -> np.random.get_include() that would specialize in random.

Good question. I'm not sure this is "like any other NumPy C API". We don't provide a C API for fft, linalg or other functionality further from core either. It's possible of course, but does it really help library authors or end users?
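For context on question 4, this is roughly how np.get_include() is used today when configuring an extension build. The sketch below only calls the existing np.get_include(). The np.random.get_include() mentioned above is a proposal in this thread and is not called here. The helper extension_build_args is hypothetical.

```python
import os

import numpy as np


def extension_build_args(extra_include_dirs=()):
    """Collect include directories for a C/Cython extension that uses NumPy headers."""
    include_dirs = [np.get_include()]
    # Today, any headers specific to the random machinery would have to be
    # added here by hand; the proposal is about removing that manual step.
    include_dirs.extend(extra_include_dirs)
    return {"include_dirs": include_dirs}


args = extension_build_args()
print(args["include_dirs"][0], os.path.isdir(args["include_dirs"][0]))
```

The resulting dictionary could be passed as keyword arguments to an extension definition in a build script.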

Cheers,

Ralf