[Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

sebastian sebastian at sipsolutions.net
Sat Sep 7 17:17:57 EDT 2019

On 2019-09-07 15:33, Ralf Gommers wrote:
> On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg
> <sebastian at sipsolutions.net> wrote:
>> On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
>> <snip>
>>>> That's part of it. The concrete problems it's solving are
>>>> threefold:
>>>> Array creation functions can be overridden.
>>>> Array coercion is now covered.
>>>> "Default implementations" will allow you to re-write your NumPy
>>>> array more easily, when such efficient implementations exist in
>>>> terms of other NumPy functions. That will also help achieve
>>>> similar semantics, but as I said, they're just "default"...
>>> There may be another very concrete one (that's not yet in the NEP):
>>> allowing other libraries that consume ndarrays to use overrides. An
>>> example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch
>>> NumPy, something we don't like all that much (in particular for
>>> mkl_fft, because it's the default in Anaconda). `__array_function__`
>>> isn't able to help here, because it will always choose NumPy's own
>>> implementation for ndarray input. With unumpy you can support
>>> multiple libraries that consume ndarrays.
>>> Another example is einsum: if you want to use opt_einsum for all
>>> inputs (including ndarrays), then you cannot use np.einsum. And yet
>>> another is using bottleneck
>>> (https://kwgoodman.github.io/bottleneck-doc/reference.html) for
>>> nan-functions and partition. There's likely more of these.
>>> The point is: sometimes the array protocols are preferred (e.g.
>>> Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works
>>> better. It's also not necessarily an either or, they can be
>>> complementary.
>> Let me try to move the discussion from the github issue here (this
>> may not be the best place). (https://github.com/numpy/numpy/issues/14441
>> which asked for easier creation functions together with
>> `__array_function__`).
>> I think an important note mentioned here is how users interact with
>> unumpy, vs. __array_function__. The former is an explicit opt-in,
>> while the latter is implicit choice based on an `array-like` abstract
>> base class and functional type based dispatching.
>> To quote NEP 18 on this: "The downsides are that this would require
>> an explicit opt-in from all existing code, e.g., import numpy.api as
>> np, and in the long term would result in the maintenance of two
>> separate NumPy APIs. Also, many functions from numpy itself are
>> already overloaded (but inadequately), so confusion about high vs.
>> low level APIs in NumPy would still persist."
>> (I do think this is a point we should not just ignore, `uarray` is a
>> thin layer, but it has a big surface area)
>> Now there are things where explicit opt-in is obvious. And the FFT
>> example is one of those, there is no way to implicitly choose
>> another backend (except by just replacing it, i.e. monkeypatching)
>> [1]. And right now I think these are _very_ different.
>> Now for the end-users choosing one array-like over another, seems
>> nicer as an implicit mechanism (why should I not mix sparse, dask
>> and numpy arrays!?). This is the promise `__array_function__` tries
>> to make. Unless convinced otherwise, my guess is that most library
>> authors would strive for implicit support (i.e. sklearn, skimage,
>> scipy).
>> Circling back to creation and coercion. In a purely Object type
>> system, these would be classmethods, I guess, but in NumPy and the
>> libraries above, we are lost.
>> Solution 1: Create explicit opt-in, e.g. through uarray. (NEP-31)
>> * Required end-user opt-in.
>> * Seems cleaner in many ways
>> * Requires a full copy of the API.
> bullet 1 and 3 are not required. if we decide to make it default, then
> there's no separate namespace

It does require explicit opt-in to have any benefits to the user.

>> Solution 2: Add some coercion "protocol" (NEP-30) and expose a way
>> to create new arrays more conveniently. This would practically mean
>> adding an `array_type=np.ndarray` argument.
>> * _Not_ used by end-users! End users should use dask.linspace!
>> * Adds "strange" API somewhere in numpy, and possibly a new
>> "protocol" (additionally to coercion).[2]
>> I still feel these solve different issues. The second one is
>> intended to make array likes work implicitly in libraries (without
>> end users having to do anything). While the first seems to force the
>> end user to opt in, sometimes unnecessarily:
>> import numpy as np
>> def my_library_func(array_like):
>>     exp = np.exp(array_like)
>>     idx = np.arange(len(exp))
>>     return idx, exp
>> Would have all the information for implicit opt-in/Array-like
>> support, but cannot do it right now.
> Can you explain this a bit more? `len(exp)` is a number, so
> `np.arange(number)` doesn't really have any information here.

Right, but as a library author, I want a way to make it use the
same type as `array_like` in this particular function; that is the
point! The end user already signaled they prefer, say, dask, through the
array that was actually passed in. (But this is just repeating what is
below, I think.)

>> This is what I have been wondering, if
>> uarray/unumpy, can in some way help me make this work (even
>> _without_ the end user opting in).
> good question. if that needs to work in the absence of the user doing
> anything, it should be something like
> with unumpy.determine_backend(exp):
>    unumpy.arange(len(exp))   # or np.arange if we make unumpy default
> to get the equivalent to `np.arange_like(len(exp), array_type=exp)`.
> Note, that `determine_backend` thing doesn't exist today.

Exactly, that is what I have been wondering about; there may be more
issues around that.
If it existed, we may be able to solve the implicit library usage by
making libraries use unumpy (or similar). Although, at that point we
half replace `__array_function__` maybe.
However, the main point is that without such functionality, NEP 30 and
NEP 31 seem to solve slightly different issues with respect to how they
interact with the end user (opt in)?
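
To make the idea concrete, here is a minimal sketch of a context-local
backend chosen from an input array. It is a toy under stated assumptions:
`ListArray`, `TupleArray` and the `_BACKENDS` registry are made-up
stand-ins for real array libraries, and the `determine_backend` here only
borrows the name from the example quoted above. It is not uarray's or
unumpy's API.

```python
import contextvars
from contextlib import contextmanager


# Toy stand-ins for two array libraries (hypothetical, for illustration).
class ListArray(list):
    """Pretend 'default' array type."""


class TupleArray(tuple):
    """Pretend 'alternative' array type."""


# Hypothetical registry: array type -> creation functions of its backend.
_BACKENDS = {
    ListArray: {"arange": lambda n: ListArray(range(n))},
    TupleArray: {"arange": lambda n: TupleArray(range(n))},
}

# Context-local state: the backend that creation functions should use.
_current_backend = contextvars.ContextVar(
    "backend", default=_BACKENDS[ListArray]
)


@contextmanager
def determine_backend(array):
    """Select the backend matching type(array) for the enclosed block."""
    token = _current_backend.set(_BACKENDS[type(array)])
    try:
        yield
    finally:
        _current_backend.reset(token)  # restore the previous backend


def arange(n):
    """Creation function that dispatches on the context-local backend."""
    return _current_backend.get()["arange"](n)


def my_library_func(array_like):
    # The library opts in based on its input; the end user does nothing.
    with determine_backend(array_like):
        idx = arange(len(array_like))
    return idx, array_like
```

With this, `my_library_func(TupleArray((1.0, 2.0, 3.0)))` returns an `idx`
of type `TupleArray`, while a bare `arange(2)` outside the `with` block
still uses the default backend.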

We may decide that we do not want to solve the library authors' issue of
wanting to support implicit opt-in for array-like inputs, because it is a
rabbit hole. But we may need to discuss/argue a bit more that it really
is a deep enough rabbit hole that it is not worth the trouble.

>> The reason is that simply, right now I am very
>> clear on the need for this use case, but not sure about the need for
>> end user opt in, since end users can just use dask.arange().
> I don't get the last part. The arange is inside a library function, so
> a user can't just go in and change things there.

A "user" here means "end user". An end user writes a script, and they
can easily change `arr = np.linspace(0, 1, 10)` to
`arr = dask.array.linspace(0, 1, 10)`, or more likely just use one within
one script and the other within another script, while both use the same
sklearn functions.
(Although backend switching may be nicer in some contexts.)

A library provider (library user of unumpy/numpy) of course cannot just 
use dask conveniently,
unless they write their own `guess_numpy_like_module()` function first.
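
As a rough sketch of what such a helper might look like (an assumption,
not an existing API): the `_NUMPY_LIKE` mapping and the
`numpy_like_module_name()` helper are hypothetical, and the lookup just
inspects the top-level package of the input's type and falls back to
numpy.

```python
import importlib

# Hypothetical mapping from an array type's top-level package to the
# module exposing a NumPy-like namespace; entries are illustrative only.
_NUMPY_LIKE = {
    "numpy": "numpy",
    "dask": "dask.array",
}


def numpy_like_module_name(array_like):
    """Name of the NumPy-like module for array_like (default: numpy)."""
    top_level = type(array_like).__module__.split(".")[0]
    return _NUMPY_LIKE.get(top_level, "numpy")


def guess_numpy_like_module(array_like):
    """Import and return the module guessed from the input's type."""
    return importlib.import_module(numpy_like_module_name(array_like))


def my_library_func(array_like):
    xp = guess_numpy_like_module(array_like)
    exp = xp.exp(array_like)
    idx = xp.arange(len(exp))  # created by the same library as the input
    return idx, exp
```

Every library would need to carry and maintain something like this
mapping itself, which is exactly the inconvenience.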

> Cheers,
> Ralf
>> Cheers,
>> Sebastian
>> [1] To be honest, I do think a lot of the "issues" around
>> monkeypatching exist just as much with backend choosing, the main
>> difference seems to me that a lot of that:
>> 1. monkeypatching was not done explicitly
>> (import mkl_fft; mkl_fft.monkeypatch_numpy())?
>> 2. A backend system allows libraries to prefer one locally?
>> (which I think is a big advantage)
>> [2] There are the options of adding `linspace_like` functions
>> somewhere in a numpy submodule, or adding `linspace(...,
>> array_type=np.ndarray)`, or simply inventing a new "protocol" (which
>> is not really a protocol?), and make it
>> `ndarray.__numpy_like_creation_functions__.arange()`.
>>> Actually, after writing this I just realized something. With 1.17.x
>>> we have:
>>> ```
>>> In [1]: import dask.array as da
>>> In [2]: d = da.from_array(np.linspace(0, 1))
>>> In [3]: np.fft.fft(d)
>>> Out[3]: dask.array<fft, shape=(50,), dtype=complex128,
>>> chunksize=(50,)>
>>> ```
>>> In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't
>>> work. We have no bug report yet because 1.17.x hasn't landed in conda
>>> defaults yet (perhaps this is a/the reason why?), but it will be a
>>> problem.
>>>> The import numpy.overridable part is meant to help garner adoption,
>>>> and to prefer the unumpy module if it is available (which will
>>>> continue to be developed separately). That way it isn't so tightly
>>>> coupled to the release cycle. One alternative Sebastian Berg
>>>> mentioned (and I am on board with) is just moving unumpy into the
>>>> NumPy organisation. What we fear keeping it separate is that the
>>>> simple act of a pip install unumpy will keep people from using it
>>>> or trying it out.
>>> Note that this is not the most critical aspect. I pushed for
>>> vendoring as numpy.overridable because I want to not derail the
>>> comparison with NEP 30 et al. with a "should we add a dependency"
>>> discussion. The interesting part to decide on first is: do we need
>>> the unumpy override mechanism? Vendoring opt-in vs. making it default
>>> vs. adding a dependency is of secondary interest right now.
>>> Cheers,
>>> Ralf
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
