On 23. Aug 2018, at 18:37, Stephan Hoyer <shoyer@gmail.com> wrote:

RE: the types argument

On Tue, Aug 21, 2018 at 12:21 AM Nathaniel Smith <njs@pobox.com> wrote:
This is much more of a detail as compared to the rest of the
discussion, so I don't want to quibble too much about it. (Especially
since if we keep things really-provisional, we can change our mind
about the argument later :-).) Mostly I'm just confused, because there
are lots of __dunder__ functions in Python (and NumPy), and none of
them take a special 'types' argument... so what's special about
__array_function__ that makes it necessary/worthwhile?

What's special about __array_function__ is that it's a hook that lets you override an entire API through a single interface. Unlike with protocols such as __add__, implementers of __array_function__ don't know in advance which of the arguments could have implemented the operation.
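To make the dispatch mechanism concrete, here is a minimal pure-Python sketch. It is a simplified, hypothetical model of NEP 18-style dispatch, not NumPy's actual implementation (for instance, it ignores subclass-before-superclass ordering). The `dispatch` helper and the `MyArray`/`OtherArray` classes are illustrative assumptions; only the `__array_function__(self, func, types, args, kwargs)` signature follows NEP 18. It shows where `types` comes from and how an implementer can use it for one shared pre-check.

```python
# Simplified, hypothetical model of NEP 18-style dispatch; not NumPy's real
# implementation (e.g. it ignores subclass-before-superclass ordering).

def dispatch(func, relevant_args, args, kwargs):
    """Try __array_function__ on each unique overriding type in turn."""
    reps = {}  # unique overriding type -> first argument of that type
    for arg in relevant_args:
        t = type(arg)
        if t not in reps and hasattr(t, "__array_function__"):
            reps[t] = arg
    # This is the `types` collection: it must be built anyway to find the
    # unique implementations, so passing it along costs very little.
    types = tuple(reps)
    for arg in reps.values():
        result = arg.__array_function__(func, types, args, kwargs)
        if result is not NotImplemented:
            return result
    names = [t.__name__ for t in types]
    raise TypeError(f"no implementation of {func.__name__!r} for types {names}")


class MyArray:
    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        # One shared pre-check instead of type checks inside every function.
        if not all(issubclass(t, MyArray) for t in types):
            return NotImplemented
        if func is concatenate:
            # Plain sequences (which don't override) are used as-is.
            return MyArray(x for a in args[0] for x in getattr(a, "data", a))
        return NotImplemented


class OtherArray:
    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented


def concatenate(arrays):
    # The "relevant arguments" for dispatch are the elements of `arrays`.
    return dispatch(concatenate, arrays, (arrays,), {})
```

With only `MyArray` inputs, `concatenate([MyArray([1, 2]), MyArray([3])])` returns a `MyArray` holding `[1, 2, 3]`. Mixing in an `OtherArray` makes both implementations decline, so dispatch raises `TypeError`.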
Any implementation of, say, concatenate-via-array_function is going to
involve iterating through all the arguments and looking at each of
them to figure out what kind of object it is and how to handle it,
right? That's true whether or not they've done a "pre-check" using the
types set, so in theory it's just as easy to return NotImplemented at
that point. But I guess your point in the last paragraph is that this
means there will be lots of chances to mess up the
NotImplemented-returning code in particular, especially since it's
less likely to be tested than the happy path, which seems plausible.
So basically the point of the types set is to let people factor out
that little bit of lots of functions into one common place?

It's also a pragmatic choice: libraries like dask.array and autograd.numpy have already implemented NumPy's API without any override protocol. These projects follow the current NumPy convention: non-native array objects are coerced into native arrays (i.e., dask or autograd arrays). They don't do any type checking.

I doubt there would be much appetite for writing alternative versions of these APIs that return NotImplemented instead -- especially while this feature remains experimental.
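As a sketch of why the `types` argument fits these existing libraries, the example below reuses a pre-existing, coercing implementation unchanged and adds a single type check in front of it. Everything here is hypothetical: `LazyArray`, `asarray`, `HANDLED`, and the stand-in `np_concatenate` are illustrative names, not dask's or autograd's real API, and the real protocol would pass the actual NumPy function as `func`.

```python
# Hypothetical duck-array library; names are illustrative, not dask's or
# autograd's real API. Its existing functions follow the coercion convention:
# anything that isn't a LazyArray is converted into one, with no type checks.

class LazyArray:
    def __init__(self, values):
        self.values = list(values)

    def __array_function__(self, func, types, args, kwargs):
        # The single pre-check that `types` makes cheap: decline the call if
        # any other overriding type is involved, before any coercion happens.
        if not all(issubclass(t, LazyArray) for t in types):
            return NotImplemented
        impl = HANDLED.get(func)
        if impl is None:
            return NotImplemented
        # The pre-existing, coercing implementation is reused unchanged.
        return impl(*args, **kwargs)


def asarray(x):
    # Existing coercion helper.
    return x if isinstance(x, LazyArray) else LazyArray(x)


def concatenate(arrays):
    # Existing implementation, written before any override protocol existed.
    out = []
    for a in arrays:
        out.extend(asarray(a).values)
    return LazyArray(out)


def np_concatenate(arrays):
    """Hypothetical stand-in for numpy.concatenate; only used as a dict key."""
    raise NotImplementedError


# Maps the overridden public function to the library's own implementation.
HANDLED = {np_concatenate: concatenate}
```

Plain lists don't define `__array_function__`, so they never appear in `types` and are still coerced as before. Without the pre-check, the coercing `concatenate` would try to convert a foreign array object and would most likely fail with an unrelated error instead of returning `NotImplemented`.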
I guess some careful devs might be unhappy with paying extra so that other
lazier devs can get away with being lazy, but maybe it's a good
tradeoff for us (esp. since as numpy devs, we'll be getting the bug
reports regardless :-)).

The only extra cost we pay is converting these types into a Python data structure and passing it into the __array_function__ method call. We already had to collect them for __array_function__ itself, to identify the unique types whose implementations to call -- so this is a pretty minimal extra cost.
If that's the goal, then it does make me wonder if there might be a
more direct way to accomplish it -- like, should we let classes define
an __array_function_types__ attribute that numpy would check before
even trying to dispatch to __array_function__?

This could potentially work, but then the __array_function__ protocol itself becomes more complex and out of sync with __array_ufunc__. Passing one additional argument adds much less complexity.

I might add that if it’s a mandatory part of the protocol, then some combinations won’t work. For example, if xarray and Dask want to support sparse arrays, they’ll need to add an explicit dependency on the sparse array library.
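To make the quoted `__array_function_types__` alternative concrete, here is a minimal, hypothetical sketch: neither that attribute nor this dispatcher exists in NumPy, and `DuckArray`/`SparseLike` are illustrative stand-ins. It shows the extra rule the protocol would gain, and, under one reading of the dependency concern above, why a mandatory declaration forces a library to reference (and therefore import) every foreign array class it wants to accept.

```python
# Hypothetical sketch of the alternative design: dispatch consults a class
# attribute `__array_function_types__` before calling `__array_function__`.
# Neither the attribute nor this dispatcher exists in NumPy.

def dispatch(func, relevant_args, args, kwargs):
    reps = {}  # unique overriding type -> first argument of that type
    for arg in relevant_args:
        t = type(arg)
        if t not in reps and hasattr(t, "__array_function__"):
            reps[t] = arg
    types = tuple(reps)
    for t, arg in reps.items():
        declared = getattr(t, "__array_function_types__", None)
        # The extra protocol rule: a class whose declared types don't cover
        # every overriding type is skipped without being called at all.
        if declared is not None and not all(issubclass(o, declared) for o in types):
            continue
        result = arg.__array_function__(func, types, args, kwargs)
        if result is not NotImplemented:
            return result
    raise TypeError(f"no implementation found for {func.__name__!r}")


class DuckArray:
    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        # No types check here; the dispatcher already filtered the call.
        return DuckArray(x for a in args[0] for x in getattr(a, "data", a))


# Declaring the tuple needs the class objects themselves, so accepting a
# foreign type (e.g. a sparse array class) would mean importing its library.
DuckArray.__array_function_types__ = (DuckArray,)


class SparseLike:
    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented


def concatenate(arrays):
    return dispatch(concatenate, arrays, (arrays,), {})
```

Here `concatenate([DuckArray([1]), SparseLike([2])])` raises `TypeError`: `DuckArray` is skipped because it never declared `SparseLike`, and `SparseLike` declines.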

NumPy-Discussion mailing list