[Numpy-discussion] Low-level API for Random

Fri Sep 20 07:18:38 EDT 2019

I have used the C API in the past, and would like to see a convenient and
stable way to do this.  Currently I'm using randomgen, but calling
(from C++) into the Python API.  The inefficiency is amortized by
generating and caching batches of results.
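
For illustration, the batching idea can be sketched roughly as below.
This is not my actual code: the names (sample_cache, refill_fn,
counting_refill) are made up for the example, and the refill callback
only stands in for the expensive call into the Python API.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 1024

/* Hypothetical callback: stands in for the expensive call into the
   Python API that fills `out` with `n` fresh samples. */
typedef void (*refill_fn)(double *out, size_t n, void *state);

typedef struct {
    double buf[BATCH_SIZE];
    size_t pos;          /* next unused sample; BATCH_SIZE means "empty" */
    refill_fn refill;
    void *state;
} sample_cache;

static void cache_init(sample_cache *c, refill_fn refill, void *state)
{
    assert(refill != NULL);
    c->pos = BATCH_SIZE; /* force a refill on first use */
    c->refill = refill;
    c->state = state;
}

/* Return one sample, paying the refill cost once per BATCH_SIZE calls. */
static double cache_next(sample_cache *c)
{
    if (c->pos == BATCH_SIZE) {
        c->refill(c->buf, BATCH_SIZE, c->state);
        c->pos = 0;
    }
    return c->buf[c->pos++];
}

/* Demo stand-in source (assumption, not a real generator): emits a
   running counter and records how many times it was asked to refill. */
typedef struct {
    size_t calls;
    double next;
} counting_source;

static void counting_refill(double *out, size_t n, void *state)
{
    counting_source *s = (counting_source *)state;
    for (size_t i = 0; i < n; i++) {
        out[i] = s->next;
        s->next += 1.0;
    }
    s->calls++;
}
```

With a real source, the callback would request BATCH_SIZE values from
the generator in one call, so the per-sample overhead shrinks as the
batch grows.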

I thought randomgen was supposed to be the future of numpy random, so
I've based my code on that.

On Fri, Sep 20, 2019 at 6:08 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>
>
> On Fri, Sep 20, 2019 at 5:29 AM Robert Kern <robert.kern at gmail.com> wrote:
>>
>> On Thu, Sep 19, 2019 at 11:04 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:
>>>
>>>
>>>
>>> On Thu, Sep 19, 2019 at 4:53 PM Robert Kern <robert.kern at gmail.com> wrote:
>>>>
>>>> On Thu, Sep 19, 2019 at 5:24 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:
>>>>>
>>>>>
>>>>> On Thu, Sep 19, 2019 at 10:28 AM Kevin Sheppard <kevin.k.sheppard at gmail.com> wrote:
>>>>>>
>>>>>> There are some users of the NumPy C code in randomkit.  This was never officially supported.  There has been a long open issue to provide this officially.
>>>>>>
>>>>>> When I wrote randomgen I supplied .pxd files that make it simpler to write Cython code that uses the components.  The lower-level API has not had much scrutiny and is in need of a clean-up.  I thought this would also encourage users to extend the random machinery themselves as part of their project or code so as to minimize the requests for new (exotic) distributions to be included in Generator.
>>>>>>
>>>>>> Most of the generator functions follow a pattern random_DISTRIBUTION.  Some have a bit more name mangling which can easily be cleaned up (like random_gauss_zig, which should become PREFIX_standard_normal).
>>>>>>
>>>>>> Ralf Gommers suggested unprefixed names.
>>>>>
>>>>>
>>>>> I suggested that the names should match the Python API, which I think isn't quite the same. The Python API doesn't contain things like "gamma", "t" or "f".
>>>>
>>>>
>>>> As the implementations evolve, they aren't going to match one-to-one 100%. The implementations are shared by the legacy RandomState. When we update an algorithm, we'll need to make a new function with the better algorithm for Generator to use, then we'll have two C functions roughly corresponding to the same method name (albeit on different classes). C doesn't give us as many namespace options as Python. We could rely on conventional prefixes to distinguish between the two classes of function (e.g. legacy_normal vs random_normal).
>>>
>>>
>>> That seems simple and clear
>>>
>>>> There are times when it would be nice to be more descriptive about the algorithm difference (e.g. random_normal_polar vs random_normal_ziggurat),
>>>
>>>
>>> We decided against versioning algorithms in NEP 19, so an update to an algorithm would mean we'd want to get rid of the older version (unless it's still in use by legacy). So AFAICT we'd never have both random_normal_polar and random_normal_ziggurat present at the same time?
>>
>>
>> Well, we must because one's used by the legacy RandomState and one's used by Generator. :-)
>>
>>>
>>> I may be missing your point here, but if we have in Python `Generator.normal` and can switch its implementation from polar to ziggurat or vice versa without any deprecation, then why would we want to switch names in the C API?
>>
>>
>> I didn't mean to suggest that we'd have an unbounded number of functions as we improve the algorithms, just that we might have 2 once we decide to change something about the algorithm. We need 2 to support both the improved algorithm in Generator and the legacy algorithm in RandomState. The current implementation of the C function would be copied to a new name (`legacy_foo` or whatever), then we'd make RandomState use that frozen copy, then we make the desired modifications to the main function that Generator is referencing (`random_foo`).
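
A minimal sketch of the copy-then-modify scheme described above, under
loud assumptions: toy_bitgen is a made-up stand-in for a bit generator,
and the signatures and sampling rules of legacy_foo and random_foo are
invented for the example. Nothing here is NumPy's actual C API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit generator: a 64-bit LCG, for illustration only. */
typedef struct {
    uint64_t state;
} toy_bitgen;

/* Uniform double in [0, 1) built from the top 53 bits of the state. */
static double toy_next_double(toy_bitgen *g)
{
    g->state = g->state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(g->state >> 11) * (1.0 / 9007199254740992.0);
}

/* Frozen copy: the legacy class keeps calling this forever, so its
   output stream for a given seed never changes. */
static double legacy_foo(toy_bitgen *g, double low, double high)
{
    return low + (high - low) * toy_next_double(g);
}

/* Main implementation: free to change. Here the (invented) change
   flips the mapping so the result lies in (low, high]. */
static double random_foo(toy_bitgen *g, double low, double high)
{
    return high - (high - low) * toy_next_double(g);
}
```

The point of the split is that only random_foo ever needs to be edited
again; legacy_foo stays byte-for-byte the same.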
>>
>> Or we could just make those legacy copies now so that people get to use them explicitly under the legacy names, whatever they are, and we can feel more free to modify the main implementations. I suggested this earlier, but convinced myself that it wasn't strictly necessary. But then I admit I was more focused on the Python API stability than any promises about the C/Cython API.
>>
>> We might end up with more than 2 implementations if we need to change something about the function signature, for whatever reason, and we want to retain C/Cython API compatibility with older code. The C functions aren't necessarily going to be one-to-one to the Generator methods. They're just part of the implementation. So for example, if we wanted to, say, precompute some intermediate values from the given scalar parameters so we don't have to recompute them for each element of the `size`-large requested output, we might do that in one C function and pass those intermediate values as arguments to the C function that does the actual sampling. So we'd have two C functions for that one Generator method, and the sampling C function will not have the same signature as it did before the modification that refactored the work into two functions. In that case, I would not be so strict as to require that `Generator.foo` is one to one with `random_foo`.
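
The precompute-then-sample refactoring described above might look
roughly like this. It is only a sketch: toy_bitgen, foo_consts,
random_foo_setup and random_foo_fill are hypothetical names invented
for the example, not functions from NumPy or randomgen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit generator: a 64-bit LCG, for illustration only. */
typedef struct {
    uint64_t state;
} toy_bitgen;

/* Uniform double in [0, 1) built from the top 53 bits of the state. */
static double toy_next_double(toy_bitgen *g)
{
    g->state = g->state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(g->state >> 11) * (1.0 / 9007199254740992.0);
}

/* Intermediate values derived once from the scalar parameters. */
typedef struct {
    double low;
    double range;
} foo_consts;

/* First C function: do the per-call work once, up front. */
static foo_consts random_foo_setup(double low, double high)
{
    foo_consts c;
    assert(high >= low);
    c.low = low;
    c.range = high - low;
    return c;
}

/* Second C function: the sampling loop only consumes the precomputed
   values, so its signature differs from a one-shot random_foo. */
static void random_foo_fill(toy_bitgen *g, const foo_consts *c,
                            double *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = c->low + c->range * toy_next_double(g);
    }
}
```

A single Python-level method would call random_foo_setup once and then
random_foo_fill for the whole requested output.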
>
>
> You're saying "be so strict" as if it were a bad thing, or a major effort. I understand that in some cases a C API cannot be evolved in the same way as a Python API, but in the example you're giving here I'd say you want one function to be public, and one private. Making both public just exposes more implementation details for no good reason, and will give us more maintenance issues long-term.
>
> Anyway, this is not an issue today. If we try to keep Python and C APIs matching, we can deal with possible difficulties with that if and when they arise - should be infrequent.
>
> Cheers,
> Ralf
>
>>
>> To your point, though, we don't have to use gratuitously different names when there _is_ a one-to-one relationship. `random_gauss_zig` should be `random_normal`.
>>
>> --
>> Robert Kern
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>

-- 
Those who don't understand recursion are doomed to repeat it