[Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

Stephan Hoyer shoyer at gmail.com
Tue Sep 10 13:58:39 EDT 2019


On Tue, Sep 10, 2019 at 6:06 AM Hameer Abbasi <einstein.edison at gmail.com>
wrote:

> On 10.09.19 05:32, Stephan Hoyer wrote:
>
> On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
>
>> I think we've chosen to try the former - dispatch on functions so we can
>> reuse the NumPy API. It could work out well, it could give some long-term
>> maintenance issues, time will tell. The question is now if and how to plug
>> the gap that __array_function__ left. Its main limitation is "doesn't work
>> for functions that don't have an array-like input" - that left out ~10-20%
>> of functions. So now we have a proposal for a structural solution to that
>> last 10-20%. It seems logical to want that gap plugged, rather than go back
>> and say "we shouldn't have gone for the first 80%, so let's go no further".
>>
>
> I'm excited about solving the remaining 10-20% of use cases for flexible
> array dispatching, but the unumpy interface suggested here
> (numpy.overridable) feels like a redundant redo of __array_function__ and
> __array_ufunc__.
>
> I would much rather continue to develop specialized protocols for the
> remaining use cases. Summarizing those I've seen in this thread, these
> include:
> 1. Overrides for customizing array creation and coercion.
> 2. Overrides to implement operations for new dtypes.
> 3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs
> with MKL.
>
> (1) could mostly be solved by adding np.duckarray() and another function
> for duck array coercion. There is still the matter of overriding np.zeros
> and the like, which perhaps justifies another new protocol, but in my
> experience the use cases for truly creating an array from scratch are quite rare.
>
> They may be rare for libraries like XArray, but CuPy, Dask and PyData/Sparse
> need these.
>
>
> (2) should be tackled as part of overhauling NumPy's dtype system to
> better support user defined dtypes. But it should definitely be in the form
> of specialized protocols, e.g., ones which pass preallocated arrays into
> ufuncs for a new dtype. By design, new dtypes should not be able to
> customize the semantics of array *structure*.
>
> We already have a split in the type system with e.g. Cython's buffers,
> Numba's parallel type system. This is a different issue altogether, e.g.
> allowing a unyt dtype to spawn a unyt array, rather than forcing a re-write
> of unyt to cooperate with NumPy's new dtype system.
>

I guess you're proposing that operations like np.sum(numpy_array,
dtype=other_dtype) could rely on other_dtype for the implementation and
potentially return a non-NumPy array? I'm not sure this is well motivated
-- it would be helpful to discuss actual use-cases.

The most commonly used NumPy functionality related to dtypes can be found
only in methods on np.ndarray, e.g., astype() and view(). But I don't think
there's any proposal to change that.
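
To make the distinction concrete, here is a small sketch using only existing
NumPy calls (the array values are arbitrary). It contrasts the dtype argument
of a function such as np.sum with the astype() and view() methods, which live
on np.ndarray itself:

```python
import numpy as np

x = np.array([1.0, 2.0], dtype=np.float32)

# Function with a dtype argument: the accumulator and result use float64.
total = np.sum(x, dtype=np.float64)

# Method: converts each value, producing a new float64 array.
converted = x.astype(np.float64)

# Method: reinterprets the same bytes as int32 without converting values.
reinterpreted = x.view(np.int32)
```

Only the first call goes through a NumPy function; the other two are methods
on the array object.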

> 4. Having default implementations that allow overrides of a large part of
> the API while defining only a small part. This holds for e.g.
> transpose/concatenate.
>
I'm not sure how unumpy solves the problems we encountered when trying to do
this with __array_function__ -- namely the way that it exposes all of
NumPy's internals, or requires rewriting a lot of internal NumPy code to
ensure it always casts inputs with asarray().

I think it would be useful to expose default implementations of NumPy
operations somewhere to make it easier to implement __array_function__, but
it doesn't make much sense to couple this to user facing overrides. These
can be exposed as a separate package or numpy module (e.g.,
numpy.default_implementations) that uses np.duckarray(), which library
authors can make use of by calling them inside their __array_function__ methods.
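
A rough sketch of what I have in mind, under stated assumptions: DuckArray,
HANDLED, DEFAULTS and default_hstack are hypothetical names made up for this
example, and default_hstack is a simplified stand-in for whatever a
"default implementations" module might contain. It is written only in terms
of the public np.concatenate, so it dispatches back to the duck array instead
of touching NumPy internals:

```python
import numpy as np

HANDLED = {}   # functions the duck array implements itself
DEFAULTS = {}  # hypothetical stand-in for a "default implementations" module


def implements(registry, numpy_function):
    """Register an implementation of a NumPy function in the given registry."""
    def decorator(func):
        registry[numpy_function] = func
        return func
    return decorator


class DuckArray:
    """Minimal duck array wrapping an ndarray (illustration only)."""

    def __init__(self, data):
        self.data = np.asarray(data)

    @property
    def ndim(self):
        return self.data.ndim

    def __array_function__(self, func, types, args, kwargs):
        if not all(issubclass(t, DuckArray) for t in types):
            return NotImplemented
        # Prefer the library's own implementation, then fall back to a default.
        impl = HANDLED.get(func) or DEFAULTS.get(func)
        if impl is None:
            return NotImplemented
        return impl(*args, **kwargs)


@implements(HANDLED, np.concatenate)
def duck_concatenate(arrays, axis=0):
    return DuckArray(np.concatenate([a.data for a in arrays], axis=axis))


@implements(DEFAULTS, np.hstack)
def default_hstack(tup):
    # Simplified: 1-D inputs join along axis 0, everything else along axis 1.
    axis = 0 if tup[0].ndim == 1 else 1
    return np.concatenate(tup, axis=axis)  # dispatches to duck_concatenate


result = np.hstack([DuckArray([1, 2]), DuckArray([3, 4])])
```

Here the library author only wrote np.concatenate; np.hstack comes from the
default, and the choice to use it is made explicitly inside
__array_function__ rather than by a user-facing override.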

> 5. Generation of Random numbers (overriding RandomState). CuPy has its
> own implementation which would be nice to override.
>
I'm not sure that NumPy's random state objects make sense for duck arrays.
Because these are stateful objects, they are pretty coupled to NumPy's
implementation -- you cannot store any additional state on RandomState
objects that might be needed for a new implementation. At a bare minimum,
you will lose the reproducibility of random seeds, though this may be less
of a concern with the new random API.
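
As a small illustration of the seed issue, the sketch below uses NumPy's own
two generators. Treating np.random.default_rng as a stand-in for "some other
backend's implementation" is my assumption for the sake of the example, and
the seed value is arbitrary:

```python
import numpy as np

seed = 12345

# Same seed, same implementation: the streams match exactly.
a = np.random.RandomState(seed).random_sample(3)
b = np.random.RandomState(seed).random_sample(3)

# Same seed, different implementation (the new Generator API, used here
# only as a stand-in for another backend): the stream is different.
c = np.random.default_rng(seed).random(3)

same_impl_matches = np.array_equal(a, b)
other_impl_matches = np.array_equal(a, c)
```

A seed only pins down results for one particular algorithm, so swapping the
implementation behind the same object silently changes the numbers.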

> I also share Nathaniel's concern that the overrides in unumpy are too
> powerful, by allowing for control from arbitrary function arguments and
> even *non-local* control (i.e., global variables) from context managers.
> This level of flexibility can make code very hard to debug, especially in
> larger codebases.
>
> Backend switching needs global context, in any case. There isn't a good
> way around that other than the class dundermethods outlined in another
> thread, which would require rewrites of large amounts of code.
>

Do we really need to support robust backend switching in NumPy? I'm not
strongly opposed, but what use cases does it actually solve to be able to
override np.fft.fft rather than using a new function?

At some point, if you want maximum performance you won't be writing the
code using NumPy proper anyways. At best you'll be using a system with
duck-array support like CuPy.