[Numpy-discussion] Low-level API for Random

Fri Sep 20 23:32:04 EDT 2019

On Fri, Sep 20, 2019 at 7:09 AM Robert Kern <robert.kern at gmail.com> wrote:

>
>
> On Fri, Sep 20, 2019 at 6:09 AM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>
>>
>>
>> On Fri, Sep 20, 2019 at 5:29 AM Robert Kern <robert.kern at gmail.com>
>> wrote:
>>
>>>
>>> We might end up with more than 2 implementations if we need to change
>>> something about the function signature, for whatever reason, and we want to
>>> retain C/Cython API compatibility with older code. The C functions aren't
>>> necessarily going to be one-to-one to the Generator methods. They're just
>>> part of the implementation. So for example, if we wanted to, say,
>>> precompute some intermediate values from the given scalar parameters so we
>>> don't have to recompute them for each element of the `size`-large requested
>>> output, we might do that in one C function and pass those intermediate
>>> values as arguments to the C function that does the actual sampling. So
>>> we'd have two C functions for that one Generator method, and the sampling C
>>> function will not have the same signature as it did before the modification
>>> that refactored the work into two functions. In that case, I would not be
>>> so strict as to require that `Generator.foo` is one to one with
>>> `random_foo`.
>>>
>>
>> You're saying "be so strict" as if it were a bad thing, or a major effort.
>>
>
> I am. It's an unnecessary limitation on the C API without a corresponding
> benefit. Your original complaint
>

It's not a "complaint". We're having this discussion because we shipped a
partial API in 1.17.0 that we will now have to go back and either take out
or clean up in 1.17.3. The PR for the new numpy.random grew so large that
we didn't notice or discuss that (such things happen, no big deal - we have
limited reviewer bandwidth). Now that we have noticed, it makes sense to
actually think about what needs to be in the API. For now I think that's
only the parts that match the Python API plus what is needed to use them
from C/Cython. Future additions should get the same review and meet the same
criteria as additions to the Python API and the existing NumPy C API. To me,
your example seems to
(a) not deal with API stability, and (b) expose too much implementation
detail.

To be clear about the actual status, we:
- shipped one header file (bitgen.h)
- shipped two pxd files (common.pxd, bit_generator.pxd)
- removed a header file we used to ship (randomkit.h)
- did not ship distributions.pxd, bounded_integers.pxd,
  legacy_distributions.pxd or related header files

bit_generator.pxd looks fine, but common.pxd contains parts that shouldn't be
there. I think the intent was to ship at least distributions.pxd/h, and
perhaps all of those pxd files.

> is much more directly addressed by a "don't gratuitously name related C
> functions differently than the Python methods they implement" rule (e.g.
> "gauss" instead of "normal").
>
>
>> I understand that in some cases a C API can not be evolved in the same
>> way as a Python API, but in the example you're giving here I'd say you want
>> one function to be public, and one private. Making both public just exposes
>> more implementation details for no good reason, and will give us more
>> maintenance issues long-term.
>>
>
> Not at all. In this example, neither one of those functions is useful
> without the other. If one is public, both must be.
>

If neither one is useful without the other, it sounds like both should be
private and the third one that puts them together - the one that didn't
change signature and implements `Generator.foo` - is the public one.
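
To make that layout concrete, here is a rough sketch in plain Python standing
in for the C layer. It is only an illustration under assumptions, not what
numpy.random does: `_foo_precompute` and `_foo_sample` are hypothetical names,
a Poisson draw (Knuth's multiplication method) is used purely as a stand-in
for "foo", and the stdlib `random.Random` plays the role of the bit generator.

```python
import math
import random


def _foo_precompute(lam):
    """Private helper (hypothetical): compute the per-call intermediate once.

    For this stand-in distribution the only intermediate is exp(-lam).
    """
    return math.exp(-lam)


def _foo_sample(rng, exp_neg_lam):
    """Private helper (hypothetical): draw one variate from precomputed state.

    Knuth's method: multiply uniforms until the product drops to exp(-lam)
    or below; the number of extra uniforms needed is the Poisson variate.
    """
    k = 0
    p = rng.random()
    while p > exp_neg_lam:
        k += 1
        p *= rng.random()
    return k


def random_foo(rng, lam, size):
    """Public entry point: the signature callers rely on.

    The split into precompute + sample is an implementation detail; it can
    be reshuffled later without changing this function's signature.
    """
    exp_neg_lam = _foo_precompute(lam)  # done once, not once per element
    return [_foo_sample(rng, exp_neg_lam) for _ in range(size)]


if __name__ == "__main__":
    print(random_foo(random.Random(12345), lam=3.0, size=5))
```

Only `random_foo` would be exported; the two helpers can change signature, be
merged, or be split further without breaking anyone's C/Cython code.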

Cheers,
Ralf