[Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
shoyer at gmail.com
Tue Sep 10 13:58:39 EDT 2019
On Tue, Sep 10, 2019 at 6:06 AM Hameer Abbasi <einstein.edison at gmail.com> wrote:
> On 10.09.19 05:32, Stephan Hoyer wrote:
> On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:
>> I think we've chosen to try the former - dispatch on functions so we can
>> reuse the NumPy API. It could work out well, it could give some long-term
>> maintenance issues, time will tell. The question is now if and how to plug
> the gap that __array_function__ left. Its main limitation is "doesn't work
>> for functions that don't have an array-like input" - that left out ~10-20%
>> of functions. So now we have a proposal for a structural solution to that
>> last 10-20%. It seems logical to want that gap plugged, rather than go back
>> and say "we shouldn't have gone for the first 80%, so let's go no further".
> I'm excited about solving the remaining 10-20% of use cases for flexible
> array dispatching, but the unumpy interface suggested here
> (numpy.overridable) feels like a redundant redo of __array_function__ and
> I would much rather continue to develop specialized protocols for the
> remaining use cases. Summarizing those I've seen in this thread, these are:
> 1. Overrides for customizing array creation and coercion.
> 2. Overrides to implement operations for new dtypes.
> 3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs
> with MKL.
> (1) could mostly be solved by adding np.duckarray() and another function
> for duck array coercion. There is still the matter of overriding np.zeros
> and the like, which perhaps justifies another new protocol, but in my
> experience the use cases for creating an array truly from scratch are quite rare.
> While they're rare for libraries like XArray, they are needed by CuPy,
> Dask and PyData/Sparse.
> (2) should be tackled as part of overhauling NumPy's dtype system to
> better support user defined dtypes. But it should definitely be in the form
> of specialized protocols, e.g., ones which pass preallocated arrays into
> ufuncs for a new dtype. By design, new dtypes should not be able to
> customize the semantics of array *structure*.
> We already have a split in the type system with e.g. Cython's buffers,
> Numba's parallel type system. This is a different issue altogether, e.g.
> allowing a unyt dtype to spawn a unyt array, rather than forcing a re-write
> of unyt to cooperate with NumPy's new dtype system.
I guess you're proposing that operations like np.sum(numpy_array,
dtype=other_dtype) could rely on other_dtype for the implementation and
potentially return a non-NumPy array? I'm not sure this is well motivated
-- it would be helpful to discuss actual use cases.
The most commonly used NumPy functionality related to dtypes can be found
only in methods on np.ndarray, e.g., astype() and view(). But I don't think
there's any proposal to change that.
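To make that concrete, the dtype-related methods in question are ordinary ndarray methods (this shows current NumPy behavior, not a proposed change):

```python
import numpy as np

a = np.arange(4, dtype=np.int64)

# astype() copies the data into a new array with a converted dtype.
b = a.astype(np.float32)

# view() reinterprets the same buffer under another dtype of equal itemsize.
c = a.view(np.uint64)
```

Both stay entirely within NumPy's own type system; neither provides a hook for a third-party dtype to return a different array type.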
> 4. Having default implementations that allow overrides of a large part of
> the API while defining only a small part. This holds for e.g.
I'm not sure how unumpy solves the problems we encountered when trying to do
this with __array_function__ -- namely the way that it exposes all of
NumPy's internals, or requires rewriting a lot of internal NumPy code to
ensure it always casts inputs with asarray().
I think it would be useful to expose default implementations of NumPy
operations somewhere to make it easier to implement __array_function__, but
it doesn't make much sense to couple this to user-facing overrides. These
can be exposed as a separate package or numpy module (e.g.,
numpy.default_implementations) that uses np.duckarray(), which library
authors can make use of by calling them inside their __array_function__ methods.
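A rough sketch of that pattern (`duckarray` and `default_ptp` are hypothetical names for illustration, not an existing NumPy API):

```python
import numpy as np

def duckarray(obj):
    # Hypothetical duck-array coercion: pass through anything implementing
    # __array_function__ unchanged, coerce everything else with asarray().
    return obj if hasattr(obj, "__array_function__") else np.asarray(obj)

def default_ptp(a):
    # A "default implementation" written purely in terms of other NumPy API
    # calls, so it works for any duck array that can handle max and min.
    a = duckarray(a)
    return np.max(a) - np.min(a)

class UnitArray:
    # Toy duck array: implements only max/min itself, and reuses the
    # generic default implementation for ptp.
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func in (np.max, np.min):
            unwrapped = [x.data if isinstance(x, UnitArray) else x
                         for x in args]
            return func(*unwrapped, **kwargs)
        if func is np.ptp:
            # Delegate to the shared default instead of rewriting ptp.
            return default_ptp(*args, **kwargs)
        return NotImplemented
```

With this, np.ptp(UnitArray([3, 1, 7])) dispatches to default_ptp, which in turn dispatches np.max and np.min back to the duck array.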
> 5. Generation of Random numbers (overriding RandomState). CuPy has its
> own implementation which would be nice to override.
I'm not sure that NumPy's random state objects make sense for duck arrays.
Because these are stateful objects, they are pretty coupled to NumPy's
implementation -- you cannot store any additional state on RandomState
objects that might be needed for a new implementation. At a bare minimum,
you will lose the reproducibility of random seeds, though this may be less
of a concern with the new random API.
> I also share Nathaniel's concern that the overrides in unumpy are too
> powerful, by allowing for control from arbitrary function arguments and
> even *non-local* control (i.e., global variables) from context managers.
> This level of flexibility can make code very hard to debug, especially in
> larger codebases.
> Backend switching needs global context, in any case. There isn't a good
> way around that other than the class dunder methods outlined in another
> thread, which would require rewrites of large amounts of code.
Do we really need to support robust backend switching in NumPy? I'm not
strongly opposed, but what use cases does it actually solve to be able to
override np.fft.fft rather than using a new function?
At some point, if you want maximum performance you won't be writing the
code using NumPy proper anyways. At best you'll be using a system with
duck-array support like CuPy.
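For concreteness, context-local backend switching of the np.fft.fft kind could be sketched roughly like this (a toy registry built on contextvars, not unumpy's actual API):

```python
import contextvars
import numpy as np

# Context-local variable holding the active backend, if any. Using
# contextvars rather than a plain global keeps the override local to the
# current thread/task, which limits the "non-local control" problem.
_active_backend = contextvars.ContextVar("active_backend", default=None)

class set_backend:
    # Toy context manager installing a backend for the enclosed block.
    def __init__(self, backend):
        self.backend = backend

    def __enter__(self):
        self._token = _active_backend.set(self.backend)
        return self

    def __exit__(self, *exc_info):
        _active_backend.reset(self._token)

def fft(a):
    # Dispatch to the active backend's implementation when one is set,
    # otherwise fall back to NumPy's own np.fft.fft.
    backend = _active_backend.get()
    if backend is not None and hasattr(backend, "fft"):
        return backend.fft(a)
    return np.fft.fft(a)
```

The debugging concern raised above is exactly that the behavior of fft() now depends on state set arbitrarily far up the call stack.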