[Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods
einstein.edison at gmail.com
Fri Apr 26 06:10:13 EDT 2019
Here’s my take on it: The goal is basically “separation of interface from implementation”. NumPy becomes just one (reference) implementation, kind of like CPython is today. The idea is that unumpy/NumPy drive the interface, while there can be many implementations.
1. Making duck arrays work with the same code. This is achieved by `__array_function__`, except in cases where we’re creating an array.
2. Composability, and traversing backend boundaries.
3. Coercion to native library objects: This requires the “reverse dispatcher” I kept mentioning, which takes the args/kwargs and “puts back” the coerced arrays into them. This is impossible in the current framework, but can be made possible using the proposals by Stephan and Marten.
4. Dispatch over arbitrary objects, such as dtypes or ufuncs, from other libraries. We are far from this goal, and it will require repetitions of protocols already available for arrays…
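To make the first point concrete, here is a minimal sketch of an `__array_function__`-style protocol written in plain Python. It is a toy model, not NumPy's real machinery: `dispatched`, `_overriders`, `DuckArray` and the two library functions are all hypothetical names, and the hook signature is simplified. It shows why functions that receive an array can be overridden, while a creation function has no array argument to dispatch on.

```python
# Toy model of an __array_function__-style protocol in plain Python.
# Every name here is hypothetical; this is not NumPy's real machinery.
import functools


def _overriders(args):
    """Yield arguments (one level deep) whose type defines the hook."""
    for arg in args:
        items = arg if isinstance(arg, (list, tuple)) else [arg]
        for item in items:
            if hasattr(type(item), "__array_function__"):
                yield item


def dispatched(func):
    """Wrap func so that duck arrays among its arguments can override it."""
    @functools.wraps(func)
    def public(*args, **kwargs):
        candidates = list(_overriders(args))
        if not candidates:
            # Nothing to dispatch on: fall back to the library's own code.
            return func(*args, **kwargs)
        for obj in candidates:
            result = obj.__array_function__(public, args, kwargs)
            if result is not NotImplemented:
                return result
        raise TypeError(f"no implementation found for {func.__name__}")
    return public


@dispatched
def concatenate(seqs):
    """Library implementation for plain sequences."""
    return [value for seq in seqs for value in seq]


@dispatched
def ones(n):
    """Creation function: receives a size, never an array."""
    return [1.0] * n


class DuckArray:
    def __init__(self, values):
        self.values = list(values)

    def __array_function__(self, func, args, kwargs):
        if func is concatenate:
            return DuckArray(v for duck in args[0] for v in duck.values)
        return NotImplemented


joined = concatenate([DuckArray([1, 2]), DuckArray([3])])  # handled by DuckArray
fresh = ones(3)  # no duck array in the arguments, so the library version runs
```

Here `joined` is a `DuckArray`, while `fresh` is a plain list: the duck type never gets a chance to intercept `ones`.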
Here’s how `uarray` solves each of these issues:
Backends… There is no default implementation.
This is handled by (thread-safe) context managers, which make switching easy.
There’s one coercion function per type of object.
Libraries are only asked to dispatch over objects they know how to convert, so there’s no backwards-incompatible break when we add dtypes or ufuncs.
Conversion can be as simple as lambda x: x.
There’s a generic dispatcher and reverse dispatcher per function, with “marks” to indicate the type of object.
Arrays are just one “type” of object you can dispatch over, so there’s no repetition by definition.
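As a rough illustration of the backend idea, the sketch below models interface-only functions, backends selected through a context manager, and a per-backend conversion function. It is not uarray's actual API: `Backend`, `use_backend`, `multimethod` and `total` are hypothetical names chosen for this example, and the per-thread stack is only one possible way to get thread-safe switching.

```python
# Toy backend switching via context managers; all names are hypothetical
# and this is not uarray's actual API.
import contextlib
import threading

_local = threading.local()  # one backend stack per thread


def _stack():
    if not hasattr(_local, "backends"):
        _local.backends = []
    return _local.backends


class Backend:
    def __init__(self, functions, convert=lambda x: x):
        self.functions = functions  # name -> implementation
        self.convert = convert      # coercion to this backend's native type


@contextlib.contextmanager
def use_backend(backend):
    """Make backend the preferred one inside the with-block."""
    stack = _stack()
    stack.append(backend)
    try:
        yield backend
    finally:
        stack.pop()


def multimethod(name):
    """An interface-only function: it has no implementation of its own."""
    def call(*args):
        # Innermost backend first; fall outward if it lacks the function.
        for backend in reversed(_stack()):
            impl = backend.functions.get(name)
            if impl is not None:
                return impl(*(backend.convert(a) for a in args))
        raise RuntimeError(f"no active backend implements {name!r}")
    return call


total = multimethod("total")

# Conversion can be the identity, or a real coercion such as tuple().
list_backend = Backend({"total": sum})
tuple_backend = Backend({"total": lambda t: ("tuple", sum(t))}, convert=tuple)

with use_backend(list_backend):
    outer = total([1, 2, 3])
    with use_backend(tuple_backend):
        inner = total([1, 2, 3])  # innermost backend wins
    restored = total([4, 5])      # back to list_backend
```

Calling `total` outside any `with` block raises `RuntimeError`, which mirrors the "no default implementation" point.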
> On Friday, Apr 26, 2019 at 10:31 AM, Ralf Gommers <ralf.gommers at gmail.com> wrote:
> On Fri, Apr 26, 2019 at 1:02 AM Stephan Hoyer <shoyer at gmail.com> wrote:
> > On Thu, Apr 25, 2019 at 3:39 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:
> > >
> > > On Fri, Apr 26, 2019 at 12:04 AM Stephan Hoyer <shoyer at gmail.com> wrote:
> > > > I do like the look of this, but keep in mind that there is a downside to exposing the implementation of NumPy functions -- now the implementation details become part of NumPy's API. I suspect we do not want to commit ourselves to never changing the implementation of NumPy functions, so at the least this will need careful disclaimers about non-guarantees of backwards compatibility.
> > >
> > > I honestly still am missing the point of claiming this. There is no change either way to what we've done for the last decade. If we change anything in the numpy implementation of any function, we use deprecation warnings etc. What am I missing here?
> > Hypothetically, suppose we rewrite np.stack() in terms of np.block() instead of np.concatenate(), because it turns out it is faster.
> > As long as we're coercing with np.asarray(), users don't notice any material difference -- their code just gets a little faster.
> > But this could be problematic if we support duck typing. For example, suppose dask arrays rely on NumPy's definition of np.stack in terms of np.concatenate, but they never bothered to implement np.block. Now upgrading NumPy breaks dask.
> Thanks, this helped clarify what's going on here. This example is clear. The problem seems to be that there are two separate discussions in this thread:
> 1. Your original proposal, __numpy_implementation__. It does not have the problem of your np.concatenate example, as the "numpy implementation" is exactly the same as it is today.
> 2. Splitting up the current numpy implementation into *multiple* entry points. This can be with and without coercion, with and without checking for invalid values etc.
> So far NEP 18 does (1). Your proposed __numpy_implementation__ addition to NEP 18 is still (1). Claiming that this affects the situation with respect to backwards compatibility is incorrect.
> (2) is actually a much more invasive change, and one that does much more to increase the size of the NumPy API surface. And yes, it affects our backwards compatibility situation as well.
> Also note that these have very different purposes:
> (1) was to (quoting from the NEP) "allow using NumPy as a high level API for efficient multi-dimensional array operations, even with array implementations that differ greatly from numpy.ndarray."
> (2) is for making duck arrays work with numpy implementations of functions (not just with the NumPy API)
> I think (1) is mostly achieved, and I'm +1 on your NEP addition for that. (2) is quickly becoming a mess, and I agree with Nathaniel's sentiment above "I shouldn't expect __array_function__ to be useful for duck arrays?". For (2) we really need to go back and have a well thought out design. Hameer's mention of uarray could be that. Growing more __array_*__ protocols in a band-aid fashion seems unlikely to get us there.
> > This is basically the same reason why subclass support has been hard to maintain in NumPy. Apparently safe internal changes to NumPy functions can break other array types in surprising ways, even if they do not intentionally deviate from NumPy's semantics.
> Agreed. Therefore optionally skipping asarray & co is a separate discussion. That's part of the problem caused by numpy trying to be both a library and an end user interface - and often those goals conflict.
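The np.stack scenario quoted above can be sketched in a few lines of plain Python. This is a toy model under loud assumptions: `__toy_function__`, `LazyDuck` and the two `stack_via_*` functions are hypothetical, array shapes are ignored, and only the call graph matters. The duck type implements `concatenate` but not `block`, so swapping the internal implementation of `stack` breaks it even though plain sequences behave identically.

```python
# Toy model of the np.stack example; names are hypothetical and shapes
# are ignored. Only "which function does stack call internally" matters.

def _dispatch(name, fallback, arrays):
    """Let the first duck array handle the call, else use the fallback."""
    for a in arrays:
        hook = getattr(a, "__toy_function__", None)
        if hook is not None:
            result = hook(name, arrays)
            if result is NotImplemented:
                raise TypeError(f"{type(a).__name__} does not implement {name}")
            return result
    return fallback(arrays)


def _flatten(arrays):
    return [v for a in arrays for v in a]


def concatenate(arrays):
    return _dispatch("concatenate", _flatten, arrays)


def block(arrays):
    return _dispatch("block", _flatten, arrays)


def stack_via_concatenate(arrays):
    """Stand-in for the 'before' implementation of stack."""
    return concatenate(arrays)


def stack_via_block(arrays):
    """Stand-in for the 'after' implementation, e.g. a speed-up."""
    return block(arrays)


class LazyDuck:
    """A duck array that only ever implemented concatenate."""

    def __init__(self, values):
        self.values = list(values)

    def __toy_function__(self, name, arrays):
        if name == "concatenate":
            return LazyDuck(v for a in arrays for v in a.values)
        return NotImplemented


ducks = [LazyDuck([1]), LazyDuck([2])]
before = stack_via_concatenate(ducks)  # works: the duck handles concatenate

try:
    stack_via_block(ducks)             # same public function, new internals
    broke = False
except TypeError:
    broke = True
```

For plain lists both versions return the same result, which is why the change looks safe from inside the library.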