Should unique types of all arguments be passed on in __array_function__?
Hi All,

While thinking about implementations using __array_function__, I wondered whether the "types" argument passed on is not defined too narrowly. Currently, it contains only the types of arguments that provide __array_function__, but wouldn't it make more sense to provide the unique types of all arguments, independently of whether those types have defined __array_function__? It would seem quite useful for any override to know, e.g., whether a string or an integer is passed in.

I thought of this partially as I was wondering what an implementation for ndarray itself would look like. For that, it is definitely useful to know all unique types: if it is only ndarray, no casting whatsoever needs to be done, while if there are integers, lists, etc., an attempt has to be made to turn these into arrays (i.e., the `as[any]array` calls currently present in the implementations, which really belong more logically to the `ndarray.__array_function__` dispatch).

Should we change this? It is quite trivially done, but perhaps I am missing a reason for omitting the non-override types.

All the best,

Marten
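[To make the proposed difference concrete, here is a rough sketch -- not NumPy's actual dispatch code; `collect_types` and its `include_all` flag are invented for illustration -- of how the "types" argument is assembled today versus under the proposal:]

```python
import numpy as np

def collect_types(args, include_all=False):
    """Simplified sketch of building the ``types`` argument for
    __array_function__.  NumPy currently keeps only types that define
    __array_function__; ``include_all=True`` shows the proposal here:
    report every unique argument type."""
    types = []
    for arg in args:
        t = type(arg)
        if t in types:
            continue  # keep only unique types, in argument order
        if include_all or hasattr(t, '__array_function__'):
            types.append(t)
    return types

# Current behaviour: the int and the list are invisible to the override.
current = collect_types([np.arange(3), 1, [2, 3]])
# Proposed behaviour: every unique argument type is reported.
proposed = collect_types([np.arange(3), 1, [2, 3]], include_all=True)
```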
On Sun, Nov 4, 2018 at 8:03 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
I thought of this partially as I was wondering what an implementation for ndarray itself would look like. For that, it is definitely useful to know all unique types, since if it is only ndarray, no casting whatsoever needs to be done, while if there are integers, lists, etc., an attempt has to be made to turn these into arrays
OK, so hypothetically we could invoke versions of each numpy function that don't call `as[any]array`, and this would slightly speed up subclasses that call super().__array_function__? The former feels pretty unlikely for now -- and would be speeding up a somewhat niche use case (more niche even than __array_function__ in general) -- but perhaps I could be convinced.
(i.e., the `as[any]array` calls currently present in the implementations, which really more logically are part of `ndarray.__array_function__` dispatch).
I can sort of see the reasoning for this, but I suspect the overhead of actually calling `ndarray.__array_function__` as part of calling every NumPy function would be prohibitive. It would mean that __array_function__ attributes get checked twice, once for dispatching and once in `ndarray.__array_function__`.

It would also mean that `ndarray.__array_function__` would need to grow a general-purpose coercion mechanism for converting array-like arguments into ndarray objects. I suspect this isn't really possible given the diversity of function signatures in NumPy, e.g., consider the handling of lists in np.block() (recurse) vs. np.concatenate() (pass through) vs. ufuncs (coerce to ndarray). The best we could do would be to add another special function, like the dispatchers, for handling coercion for each specific NumPy function.

Should we change this? It is quite trivially done, but perhaps I am missing a reason for omitting the non-override types.
Realistically, without these other changes in NumPy, how would this improve code using __array_function__? From a general-purpose dispatching perspective, are there cases where you'd want to return NotImplemented based on types that don't implement __array_function__?

I guess this might help if your alternative array class is super-explicit and doesn't automatically call `asmyarray()` on each argument. You could rely on __array_function__ to return NotImplemented (and thus raise TypeError) rather than type checking in every function you write for your alternative arrays.

One minor downside would be speed: now __array_function__ implementations need to check a longer list of types.

Another minor downside: if users follow the example of the NDArrayOperatorsMixin docstring, they would now need to explicitly list all of the scalar types (without __array_function__) that they support, including builtin types like int and type(None). I suppose this ties into our recommended best practices for doing type checking in __array_ufunc__/__array_function__ implementations, which should probably be updated regardless: https://github.com/numpy/numpy/issues/12258#issuecomment-432858949

Best, Stephan
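[To illustrate the pattern discussed above, a minimal sketch of such an explicit class, in the style of the NDArrayOperatorsMixin docstring; `MyArray`, the `implements` registry, and `_sum` are all hypothetical names, not NumPy API:]

```python
import numpy as np

HANDLED_FUNCTIONS = {}

def implements(numpy_function):
    """Register a MyArray implementation for a given NumPy function."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator

class MyArray:
    # Under the current protocol, only non-override types we *choose* to
    # support go here; under the proposal, builtins like int and
    # type(None) would have to be listed explicitly as well.
    _HANDLED_TYPES = (np.ndarray,)

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Defer if any participating type is one we do not handle.
        if not all(issubclass(t, (MyArray,) + self._HANDLED_TYPES)
                   for t in types):
            return NotImplemented
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

@implements(np.sum)
def _sum(arr):
    return MyArray(arr.data.sum())

# np.sum dispatches to _sum through MyArray.__array_function__.
total = np.sum(MyArray([1, 2, 3]))
```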
Hi Stephan,

I fear my example about `ndarray.__array_function__` distracted from the gist of my question, which was whether for `__array_function__` implementations *generally* it wouldn't be handier to have all unique types rather than just those that override `__array_function__`. It would seem that for any implementation other than numpy's own, the presence of __array_function__ is indeed almost irrelevant. As a somewhat random example: why would it be useful for Dask to know that another argument is a Quantity, but not that it is a file handle? (Presumably, it cannot handle either...)

All the best,

Marten
On Mon, Nov 5, 2018 at 9:00 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi Stephan,
I fear my example about `ndarray.__array_function__` distracted from the gist of my question, which was whether for `__array_function__` implementations *generally* it wouldn't be handier to have all unique types rather than just those that override `__array_function__`. It would seem that for any implementation other than numpy's own, the presence of __array_function__ is indeed almost irrelevant. As a somewhat random example: why would it be useful for Dask to know that another argument is a Quantity, but not that it is a file handle? (Presumably, it cannot handle either...)
In practice, it is of course easy to simply ignore arguments that don't define __array_function__. But I do think the distinction is important for more than merely ndarray: the value of the types argument tells you the set of types that might have a conflicting implementation.

For example, Dask might be happy to handle any non-arrays as scalars (like NumPy), e.g., it should be fine to make a dask array consisting of a decimal object. Since decimal doesn't define __array_function__, there's no need to do anything special to handle it inside dask.array.Array.__array_function__. If decimal appeared in types, then dask would have to be careful to let arbitrary types that don't define __array_function__ pass through.

In contrast, dask definitely wants to know if another type defines __array_function__, because it might have a conflicting implementation. This is the main reason why we have the types argument in the first place: to make these checks easy. In my experience, it is super common for Python arithmetic methods to be implemented improperly, i.e., never returning NotImplemented. This will hopefully be less common with __array_function__.

More broadly, it is only necessary to reject an argument type at the __array_function__ level if it defines __array_function__ itself, because that's the only case where it would make a difference to return NotImplemented rather than trying (and failing) to call the overridden function implementation.
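[A minimal sketch of the single check this design enables; both classes are hypothetical stand-ins for illustration, not dask code:]

```python
import numpy as np

class DuckArray:
    """Hypothetical duck array. Every entry of ``types`` defines
    __array_function__ by construction, so the only question an override
    must answer is whether any of those types might carry a conflicting
    implementation. Plain scalars (int, decimal.Decimal, None, ...)
    never appear in ``types`` and need no special handling."""
    def __array_function__(self, func, types, args, kwargs):
        if not all(issubclass(t, (DuckArray, np.ndarray)) for t in types):
            return NotImplemented
        return "handled"  # a real class would call its own implementation

class OtherArray:
    """Stand-in for an unrelated class that also overrides the protocol."""
    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented

d = DuckArray()
# Only recognised types participate: DuckArray handles the call.
r1 = d.__array_function__(np.concatenate, (DuckArray,), (d,), {})
# A foreign override appears in ``types``: defer with NotImplemented.
r2 = d.__array_function__(np.concatenate, (DuckArray, OtherArray),
                          (d, OtherArray()), {})
```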
All the best,
Marten _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
More broadly, it is only necessary to reject an argument type at the __array_function__ level if it defines __array_function__ itself, because that's the only case where it would make a difference to return NotImplemented rather than trying (and failing) to call the overridden function implementation.
Yes, this makes sense -- these are the only types that could possibly change the outcome if the class now called fails to produce a result. Indeed, that reasoning makes it logical that `ndarray` itself is not present even though it defines `__array_ufunc__` -- we know it cannot handle anything with a `__array_ufunc__` implementation.

Hameer, is Stephan's argument convincing to you too? If so, I'll close the PR.

-- Marten
On Saturday, Nov 10, 2018 at 6:59 PM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
More broadly, it is only necessary to reject an argument type at the __array_function__ level if it defines __array_function__ itself, because that's the only case where it would make a difference to return NotImplemented rather than trying (and failing) to call the overridden function implementation.
Yes, this makes sense -- these are the only types that could possibly change the outcome if the class now called fails to produce a result. Indeed, that reasoning makes it logical that `ndarray` itself is not present even though it defines `__array_ufunc__` - we know it cannot handle anything with a `__array_ufunc__` implementation.
Hameer, is Stephan's argument convincing to you too? If so, I'll close the PR.

I agree with Stephan here, other than the fact that ndarray should be in the list of types. I can think of many cases in PyData/Sparse where I don't want to allow mixed inputs, but maybe that's a tangential discussion.
Best Regards, Hameer Abbasi
-- Marten
On Sat, Nov 10, 2018 at 2:08 PM Hameer Abbasi <einstein.edison@gmail.com> wrote:
I agree with Stephan here, other than the fact that ndarray should be in the list of types. I can think of many cases in PyData/Sparse where I don't want to allow mixed inputs, but maybe that's a tangential discussion.
To be clear: ndarray *is* currently preserved in the list of types passed to __array_function__ (because ndarray.__array_function__ is defined).
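[A tiny illustrative check of this behaviour -- the `Probe` class is made up, and this assumes a NumPy version with __array_function__ dispatch enabled:]

```python
import numpy as np

class Probe:
    """Records the ``types`` collection NumPy passes to the override."""
    seen = None

    def __array_function__(self, func, types, args, kwargs):
        Probe.seen = tuple(types)
        return "handled"

# Mix a plain ndarray with the overriding class to trigger dispatch:
# both Probe *and* ndarray should show up in ``types``.
np.concatenate([Probe(), np.arange(3)])
```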
In that case, ignore my comment. :)

Best Regards, Hameer Abbasi
On Saturday, Nov 10, 2018 at 11:52 PM, Stephan Hoyer <shoyer@gmail.com> wrote:

On Sat, Nov 10, 2018 at 2:08 PM Hameer Abbasi <einstein.edison@gmail.com> wrote:
I agree with Stephan here, other than the fact that ndarray should be in the list of types. I can think of many cases in PyData/Sparse where I don't want to allow mixed inputs, but maybe that's a tangential discussion.
To be clear: ndarray *is* currently preserved in the list of types passed to __array_function__ (because ndarray.__array_function__ is defined).
More specifically:

Should we change this? It is quite trivially done, but perhaps I am missing a reason for omitting the non-override types.
Realistically, without these other changes in NumPy, how would this improve code using __array_function__? From a general purpose dispatching perspective, are there cases where you'd want to return NotImplemented based on types that don't implement __array_function__?
I think, yes, that would be the closest analogy to the Python operators. It saves you from having separate cases for types that have and do not have `__array_function__`.
I guess this might help if your alternative array class is super-explicit and doesn't automatically call `asmyarray()` on each argument. You could rely on __array_function__ to return NotImplemented (and thus raise TypeError) rather than type checking in every function you write for your alternative arrays.
Indeed.
One minor downside would be speed: now __array_function__ implementations need to check a longer list of types.
That's true.
Another minor downside: if users follow the example of the NDArrayOperatorsMixin docstring, they would now need to explicitly list all of the scalar types (without __array_function__) that they support, including builtin types like int and type(None). I suppose this ties into our recommended best practices for doing type checking in __array_ufunc__/__array_function__ implementations, which should probably be updated regardless: https://github.com/numpy/numpy/issues/12258#issuecomment-432858949
Also true. It makes me wonder again whether passing on the types is useful at all... But I end up thinking that it is not up to an implementation to raise TypeError -- it should just return NotImplemented. If we wanted to give more information, we might also consider passing on `overloaded_args` -- then perhaps one has the best of both worlds.

All the best,

Marten
Hi Stephan,

Another part of your reply is worth considering, though it is slightly off topic for the question here of what to pass on in `types`:

On Sun, Nov 4, 2018 at 7:51 PM Stephan Hoyer <shoyer@gmail.com> wrote:
On Sun, Nov 4, 2018 at 8:03 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
I thought of this partially as I was wondering what an implementation for ndarray itself would look like. For that, it is definitely useful to know all unique types, since if it is only ndarray, no casting whatsoever needs to be done, while if there are integers, lists, etc., an attempt has to be made to turn these into arrays
OK, so hypothetically we could invoke versions of each numpy function that don't call `as[any]array`, and this would slightly speed up subclasses that call super().__array_function__?
A longer-term goal that I had in mind here was for the implementations generally to be able to just assume their arguments are ndarray, i.e., to be free to assume there is a shape, dtype, etc. That is not specifically useful for subclasses; for pure Python code, it might also mean array mimics could happily use the implementation. But perhaps more importantly, the code would become substantially cleaner.
Anyway, really a longer-term goal... All the best, Marten
participants (3)
- Hameer Abbasi
- Marten van Kerkwijk
- Stephan Hoyer