[Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
Stephan Hoyer
shoyer at gmail.com
Wed Sep 11 22:03:20 EDT 2019
On Wed, Sep 11, 2019 at 4:18 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>
> On Tue, Sep 10, 2019 at 10:59 AM Stephan Hoyer <shoyer at gmail.com> wrote:
>
>> On Tue, Sep 10, 2019 at 6:06 AM Hameer Abbasi <einstein.edison at gmail.com>
>> wrote:
>>
>>> On 10.09.19 05:32, Stephan Hoyer wrote:
>>>
>>> On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers <ralf.gommers at gmail.com>
>>> wrote:
>>>
>>>> I think we've chosen to try the former - dispatch on functions so we
>>>> can reuse the NumPy API. It could work out well, it could give some
>>>> long-term maintenance issues, time will tell. The question is now if and
>>>> how to plug the gap that __array_function__ left. Its main limitation is
>>>> "doesn't work for functions that don't have an array-like input" - that
>>>> left out ~10-20% of functions. So now we have a proposal for a structural
>>>> solution to that last 10-20%. It seems logical to want that gap plugged,
>>>> rather than go back and say "we shouldn't have gone for the first 80%, so
>>>> let's go no further".
>>>>
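The gap described above can be illustrated with a toy model of type-based dispatch. This is only a sketch, not NumPy's implementation: the `__toy_function__` hook, the `dispatchable` decorator and the `DuckArray` class are hypothetical names invented for the illustration. It shows why a function whose only argument is a shape has nothing to dispatch on.

```python
# Toy model of type-based dispatch; this is NOT NumPy's implementation.
# __toy_function__, dispatchable and DuckArray are hypothetical names.

def dispatchable(func):
    """Let any positional argument that defines __toy_function__ take over."""
    def wrapper(*args, **kwargs):
        for arg in args:
            handler = getattr(type(arg), "__toy_function__", None)
            if handler is not None:
                return handler(arg, wrapper, args, kwargs)
        # No argument offered an override: run the default implementation.
        return func(*args, **kwargs)
    wrapper.__name__ = func.__name__
    return wrapper


class DuckArray:
    """Hypothetical duck array wrapping a plain list."""

    def __init__(self, data):
        self.data = list(data)

    def __toy_function__(self, func, args, kwargs):
        return ("DuckArray handled", func.__name__)


@dispatchable
def total(a):
    return sum(a)


@dispatchable
def zeros(n):
    return [0.0] * n


print(total([1, 2, 3]))          # plain list: default implementation runs
print(total(DuckArray([1, 2])))  # duck array argument: override is found
print(zeros(3))                  # only an int argument: nothing to dispatch on
```

In this model `zeros` can never return a `DuckArray`, because no argument carries the type information the dispatcher looks for.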
>>>
>>> I'm excited about solving the remaining 10-20% of use cases for flexible
>>> array dispatching,
>>>
>>> Great! I think most (but not all) of us are on the same page here.
> Actually now that Peter came up with the `like=` keyword idea for array
> creation functions I'm very interested in seeing that worked out, feels
> like that could be a nice solution for part of that 10-20% that did look
> pretty bad before.
>
>> but the unumpy interface suggested here (numpy.overridable) feels like a
>>> redundant redo of __array_function__ and __array_ufunc__.
>>>
>>>
> A bit of context: a big part of the reason I advocated for
> numpy.overridable is that library authors can use it *only* for the parts
> not already covered by the protocols we already have. If there's overlap,
> there are several ways to deal with that, such as including only part of
> the unumpy API surface. It does plug all the holes in one go (although you
> can then indeed argue it does too much), and there is no other coherent
> proposal/vision yet that does this. What you wrote below comes closest, and
> I'd love to see that worked out (e.g. the like= argument for array
> creation). What I don't like is an ad-hoc plugging of one hole at a time
> without visibility on how many more protocols and new workaround functions
> in the API we would need. So hopefully we can come to an apples-to-apples
> comparison of two design alternatives.
>
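One way the `like=` idea for array creation might work is sketched below in plain Python. This is an assumption-laden toy, not a worked-out proposal: the `__toy_create__` hook and the `DuckArray` class are hypothetical, and the real design could differ in every detail. The point is only that a reference object gives a creation function something to dispatch on.

```python
# Toy sketch of a `like=` argument; the hook name and class are hypothetical.

class DuckArray:
    """Hypothetical duck array wrapping a plain list."""

    def __init__(self, data):
        self.data = list(data)

    @classmethod
    def __toy_create__(cls, name, shape):
        # A real library would allocate its own storage here.
        if name == "zeros":
            return cls([0.0] * shape)
        return NotImplemented


def zeros(shape, like=None):
    """Create zeros, deferring to type(like) when a reference is given."""
    if like is not None:
        creator = getattr(type(like), "__toy_create__", None)
        if creator is not None:
            result = creator("zeros", shape)
            if result is not NotImplemented:
                return result
    # Default path: no reference object, or it declined to handle the call.
    return [0.0] * shape


x = DuckArray([1, 2, 3])
print(type(zeros(2, like=x)).__name__)  # DuckArray
print(zeros(2))                         # [0.0, 0.0]
```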
> Also, we just discussed this whole thread in the community call, and it's
> clear that it's a complex matter with many different angles. It's very hard
> to get a full overview. Our conclusion in the call was that this will
> benefit from an in-person discussion. The sprint in November may be a
> really good opportunity for that.
>
Sounds good, I'm looking forward to the discussion at the November sprint!
> In the meantime we can of course keep working out ideas/docs. For now I
> think it's clear that we (the NEP authors) have some homework to do - that
> may take some time.
>
>
>>> I would much rather continue to develop specialized protocols for the
>>> remaining use cases. Summarizing those I've seen in this thread, these
>>> include:
>>> 1. Overrides for customizing array creation and coercion.
>>> 2. Overrides to implement operations for new dtypes.
>>> 3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs
>>> with MKL.
>>>
>>> (1) could mostly be solved by adding np.duckarray() and another function
>>> for duck array coercion. There is still the matter of overriding np.zeros
>>> and the like, which perhaps justifies another new protocol, but in my
>>> experience the use-cases for truly creating an array from scratch are quite rare.
>>>
>>> While they're rare for libraries like XArray, they are needed by CuPy, Dask and
>>> PyData/Sparse.
>>>
>>>
>>> (2) should be tackled as part of overhauling NumPy's dtype system to
>>> better support user defined dtypes. But it should definitely be in the form
>>> of specialized protocols, e.g., which pass in preallocated arrays into
>>> ufuncs for a new dtype. By design, new dtypes should not be able to
>>> customize the semantics of array *structure*.
>>>
>>> We already have a split in the type system with e.g. Cython's buffers,
>>> Numba's parallel type system. This is a different issue altogether, e.g.
>>> allowing a unyt dtype to spawn a unyt array, rather than forcing a re-write
>>> of unyt to cooperate with NumPy's new dtype system.
>>>
>>
>> I guess you're proposing that operations like np.sum(numpy_array,
>> dtype=other_dtype) could rely on other_dtype for the implementation and
>> potentially return a non-NumPy array? I'm not sure this is well motivated
>> -- it would be helpful to discuss actual use-cases.
>>
>> The most commonly used NumPy functionality related to dtypes can be found
>> only in methods on np.ndarray, e.g., astype() and view(). But I don't think
>> there's any proposal to change that.
>>
>>> 4. Having default implementations that allow overrides of a large part
>>> of the API while defining only a small part. This holds for e.g.
>>> transpose/concatenate.
>>>
>> I'm not sure how unumpy solves the problems we encountered when trying to
>> do this with __array_function__ -- namely the way that it exposes all of
>> NumPy's internals, or requires rewriting a lot of internal NumPy code to
>> ensure it always casts inputs with asarray().
>>
>> I think it would be useful to expose default implementations of NumPy
>> operations somewhere to make it easier to implement __array_function__, but
>> it doesn't make much sense to couple this to user facing overrides. These
>> can be exposed as a separate package or numpy module (e.g.,
>> numpy.default_implementations) that uses np.duckarray(), which library
>> authors can make use of by calling inside their __array_function__ methods.
>>
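The separation suggested above, with default implementations kept apart from user-facing overrides, could look roughly like the following toy. Everything here is hypothetical: `DEFAULTS` stands in for something like the proposed default-implementations module, and `__toy_function__` stands in for a library's override hook. The library implements a small core itself and opts in to generic fallbacks for the rest.

```python
# Hypothetical sketch: generic fallbacks that a duck array opts in to from
# inside its own override hook. None of these names exist in NumPy.

def default_flip(arr):
    """Generic fallback written only in terms of slicing."""
    return arr[::-1]


# Stand-in for a separate module of default implementations.
DEFAULTS = {"flip": default_flip}


class DuckArray:
    """Hypothetical duck array that implements only a small core itself."""

    def __init__(self, data):
        self.data = list(data)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return DuckArray(self.data[key])
        return self.data[key]

    def __toy_function__(self, name, args, kwargs):
        # Operations the library implements natively.
        own = {"sum": lambda a: sum(a.data)}
        impl = own.get(name) or DEFAULTS.get(name)
        if impl is None:
            return NotImplemented
        return impl(*args, **kwargs)


a = DuckArray([1, 2, 3])
print(a.__toy_function__("sum", (a,), {}))        # native implementation
print(a.__toy_function__("flip", (a,), {}).data)  # generic fallback
```

Because the fallback is chosen by the library rather than by the dispatcher, the generic code never sees objects it was not written for.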
>>> 5. Generation of Random numbers (overriding RandomState). CuPy has its
>>> own implementation which would be nice to override.
>>>
>> I'm not sure that NumPy's random state objects make sense for duck
>> arrays. Because these are stateful objects, they are pretty coupled to
>> NumPy's implementation -- you cannot store any additional state on
>> RandomState objects that might be needed for a new implementation. At a
>> bare minimum, you will lose the reproducibility of random seeds, though
>> this may be less of a concern with the new random API.
>>
>>> I also share Nathaniel's concern that the overrides in unumpy are too
>>> powerful, by allowing for control from arbitrary function arguments and
>>> even *non-local* control (i.e., global variables) from context managers.
>>> This level of flexibility can make code very hard to debug, especially in
>>> larger codebases.
>>>
>>> Backend switching needs global context, in any case. There isn't a good
>>> way around that other than the class dundermethods outlined in another
>>> thread, which would require rewrites of large amounts of code.
>>>
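The trade-off in this exchange can be made concrete with a small sketch of context-local backend switching built on the standard library's `contextvars`. It is a toy under stated assumptions: `set_backend`, `fft` and the backend names are illustrative only and do not reproduce the unumpy or uarray API.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical registry; the backend names are illustrative only.
_backend = contextvars.ContextVar("backend", default="numpy")


@contextmanager
def set_backend(name):
    """Select a backend for the duration of a with-block."""
    token = _backend.set(name)
    try:
        yield
    finally:
        _backend.reset(token)


def fft(x):
    """Stand-in for an overridable function; reports which backend would run."""
    return (_backend.get(), list(x))


def library_code(x):
    # This function never mentions a backend, yet its result depends on one.
    return fft(x)


print(library_code([1, 2]))      # ('numpy', [1, 2])
with set_backend("mkl_fft"):
    print(library_code([1, 2]))  # ('mkl_fft', [1, 2])
print(library_code([1, 2]))      # ('numpy', [1, 2])
```

The same property that makes this convenient for backend switching, namely that `library_code` needs no changes, is what makes the behavior non-local and potentially harder to debug.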
>>
>> Do we really need to support robust backend switching in NumPy? I'm not
>> strongly opposed, but what use cases does it actually solve to be able to
>> override np.fft.fft rather than using a new function?
>>
>
> I don't know, but that feels like an odd question. We wanted an FFT
> backend system. Now applying __array_function__ to numpy.fft happened
> without a real discussion, but as a backend system I don't think it would
> have met the criteria. Something that works for CuPy, Dask and Xarray, but
> not for Pyfftw or mkl_fft is only half a solution.
>
I agree, __array_function__ is not a backend system.
> Cheers,
> Ralf
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>