[Numpy-discussion] NEP: Dispatch Mechanism for NumPy’s high level API

Sun Jun 3 19:00:08 EDT 2018

On Sun, Jun 3, 2018 at 8:19 AM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

> My more general comment is one of speed: for *normal* operation
> performance should be impacted as minimally as possible. I think this is a
> serious issue and feel strongly it *has* to be possible to avoid all
> arguments being checked for the `__array_function__` attribute, i.e., there
> should be an obvious way to ensure no type checking dance is done.
>

I agree that we should try to minimize the impact of dispatching on normal
operations. It would be helpful to identify examples of real workflows, so
we can measure the impact of doing these checks empirically. That said, I
think a small degradation in performance for code that works with small
arrays should be acceptable, because performance is already an accepted
limitation of using NumPy/Python for these use cases.

In most cases, I suspect that the overhead of a function call and checking
several arguments for "__array_function__" will be negligible, like the
situation for __array_ufunc__. I'm not strongly opposed to either of your
proposed solutions, but I do think it would be a little strange to insist
that we need a solution for __array_function__ when __array_ufunc__ was
fine.

> A. Two "namespaces", one for the undecorated base functions, and one
> completely trivial one for the decorated ones. The idea would be that if
> one knows one is dealing with arrays only, one would do `import
> numpy.array_only as np` (i.e., the reverse of the suggestion currently in
> the NEP, where the decorated ones are in their own namespace - I agree with
> the reasons for discounting that one).
>

I will mention this as a possibility.

I do think there is something to be said for clear separation of overloaded
and non-overloaded APIs. But if I were to choose between adding numpy.api
and numpy.array_only, I would pick numpy.api, because of the virtue of
preserving the existing numpy namespace as it currently exists.

> B. Automatic insertion by the decorator of an `array_only=np._NoValue` (or
> `coerce` and perhaps `subok=...` if not present) in the function signature,
> so that users who know that they have arrays only could pass
> `array_only=True` (name to be decided).
>

Rather than adding another argument to every NumPy function, I would rather
encourage writing np.asarray() explicitly.
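
A minimal sketch of what explicit coercion looks like in user code. The
function `mean_of_squares` and the subclass `Tagged` are hypothetical names
used only for illustration; the sketch shows only that np.asarray() yields
a base ndarray, not how the proposed dispatch would behave.

```python
import numpy as np


def mean_of_squares(x):
    # Coerce up front, so everything below works on a plain ndarray.
    x = np.asarray(x)
    return float((x * x).mean())


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass, standing in for a custom array type."""


tagged = np.arange(4.0).view(Tagged)
plain = np.asarray(tagged)  # subclass is stripped; the result is a base ndarray
```

The coercion is visible at the call site, so no extra keyword is needed in
the signature of every function.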

> Note that both A and B could also address, at least partially, the problem
> of sometimes wanting to just use the old coercion methods, i.e., not having
> to implement every possible numpy function in one go in a new
> `__array_function__` on one's class.
>

Yes, agreed.

> 1. I'm rather unclear about the use of `types`. It can help me decide what
> to do, but I would still have to find the argument in question (e.g., for
> Quantity, the unit of the relevant argument). I'd recommend passing instead
> a tuple of all arguments that were inspected, in the inspection order;
> after all, it is just a `arg.__class__` away from the type, and in your
> example you'd only have to replace `issubclass` by `isinstance`.
>

The virtue of a `types` argument is that we can deduplicate arguments once,
rather than in each __array_function__ check. This could result in
significantly more efficient code, e.g., when np.concatenate() is called on
10,000 arrays with only two unique types, we don't need to loop through all
10,000 objects again to check that overloading is valid.
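
A rough pure-Python sketch of that deduplication, assuming the dispatcher
walks the arguments once. `collect_unique_types`, `DuckA` and `DuckB` are
hypothetical names, and this is not the NEP's actual implementation.

```python
def collect_unique_types(args):
    """Return the unique types among args that define __array_function__,
    in order of first appearance."""
    seen = set()
    unique = []
    for arg in args:
        t = type(arg)
        if t not in seen:
            seen.add(t)
            if hasattr(t, "__array_function__"):
                unique.append(t)
    return tuple(unique)


class DuckA:
    """Hypothetical array-like type that opts in to the protocol."""

    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented


class DuckB(DuckA):
    """Second hypothetical type, so the example has two unique types."""


# 10,000 arguments, but only two distinct types among them.
many_args = [DuckA() if i % 2 else DuckB() for i in range(10_000)]
types = collect_unique_types(many_args)
```

Each __array_function__ implementation can then inspect the two entries of
`types` instead of iterating over all 10,000 arguments.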

Even for Quantity, I suspect you will want two layers of checks:
1. A check to verify that every argument is a Quantity (or something
coercible to a Quantity). This could use `types` and return
`NotImplemented` when it fails.
2. A check to verify that units match. This will have custom logic for
different operations and will require checking all arguments -- not just
their unique types.

For many Quantity functions, the second check will indeed probably be super
simple (i.e., verifying that all units match). But the first check (with
`types`) really is something that basically every overload should do.
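
The two layers might look roughly like this. The `Quantity` class below is
a toy stand-in, not astropy's Quantity, and the sketch assumes a
concatenate-like call whose single positional argument is a sequence; it
also skips the "coercible to a Quantity" case for brevity.

```python
class Quantity:
    """Toy unit-carrying type, for illustration only."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __array_function__(self, func, types, args, kwargs):
        # Layer 1: cheap check on the deduplicated types.
        if not all(issubclass(t, Quantity) for t in types):
            return NotImplemented
        # Layer 2: operation-specific check on the actual arguments.
        (quantities,) = args
        units = {q.unit for q in quantities}
        if len(units) != 1:
            raise ValueError("units do not match")
        return Quantity([q.value for q in quantities], units.pop())


q1 = Quantity(1.0, "m")
q2 = Quantity(2.0, "m")
# func is unused in this sketch, so None is passed in its place.
joined = q1.__array_function__(None, (Quantity,), ([q1, q2],), {})
declined = q1.__array_function__(None, (Quantity, int), ([q1, 3],), {})
```

Here `joined` carries both values with unit "m", while `declined` is
`NotImplemented` because the first layer saw a type it does not handle.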

> 2. For subclasses, it would be very handy to have
> `ndarray.__array_function__`, so one can call super after changing
> arguments. (For `__array_ufunc__`, there was lots of question about whether
> this was useful, but it really is!!). [I think you already agreed with
> this, but want to have it in-place, as for subclasses of ndarray this is
> just as useful as it would be for subclasses of dask arrays.]
>

Yes, indeed.