[Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

Ralf Gommers ralf.gommers at gmail.com
Sat Sep 7 17:49:07 EDT 2019


On Sat, Sep 7, 2019 at 2:18 PM sebastian <sebastian at sipsolutions.net> wrote:

> On 2019-09-07 15:33, Ralf Gommers wrote:
> > On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg
> > <sebastian at sipsolutions.net> wrote:
> >
> >> On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
> >>>
> >>>
> >> <snip>
> >>
> >>>> That's part of it. The concrete problems it's solving are
> >>>> threefold:
> >>>> 1. Array creation functions can be overridden.
> >>>> 2. Array coercion is now covered.
> >>>> 3. "Default implementations" will allow you to re-write your NumPy
> >>>>    array more easily, when such efficient implementations exist in
> >>>>    terms of other NumPy functions. That will also help achieve
> >>>>    similar semantics, but as I said, they're just "default"...
> >>>>
> >>>
> >>> There may be another very concrete one (that's not yet in the NEP):
> >>> allowing other libraries that consume ndarrays to use overrides. An
> >>> example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch
> >>> NumPy, something we don't like all that much (in particular for
> >>> mkl_fft, because it's the default in Anaconda). `__array_function__`
> >>> isn't able to help here, because it will always choose NumPy's own
> >>> implementation for ndarray input. With unumpy you can support
> >>> multiple libraries that consume ndarrays.
> >>>
> >>> Another example is einsum: if you want to use opt_einsum for all
> >>> inputs (including ndarrays), then you cannot use np.einsum. And yet
> >>> another is using bottleneck
> >>> (https://kwgoodman.github.io/bottleneck-doc/reference.html) for
> >>> nan-functions and partition. There are likely more of these.
> >>>
> >>> The point is: sometimes the array protocols are preferred (e.g.
> >>> Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works
> >>> better. It's also not necessarily an either/or; they can be
> >>> complementary.
> >>>
> >>
> >> Let me try to move the discussion from the GitHub issue here (this may
> >> not be the best place). (https://github.com/numpy/numpy/issues/14441
> >> which asked for easier creation functions together with
> >> `__array_function__`).
> >>
> >> I think an important note mentioned here is how users interact with
> >> unumpy vs. __array_function__. The former is an explicit opt-in, while
> >> the latter is an implicit choice based on an `array-like` abstract base
> >> class and functional type-based dispatching.
> >>
> >> To quote NEP 18 on this: "The downsides are that this would require an
> >> explicit opt-in from all existing code, e.g., import numpy.api as np,
> >> and in the long term would result in the maintenance of two separate
> >> NumPy APIs. Also, many functions from numpy itself are already
> >> overloaded (but inadequately), so confusion about high vs. low level
> >> APIs in NumPy would still persist."
> >> (I do think this is a point we should not just ignore; `uarray` is a
> >> thin layer, but it has a big surface area.)
> >>
> >> Now there are things where explicit opt-in is obvious. And the FFT
> >> example is one of those: there is no way to implicitly choose another
> >> backend (except by just replacing it, i.e. monkeypatching) [1]. And
> >> right now I think these are _very_ different.
> >>
> >> Now for the end users, choosing one array-like over another seems nicer
> >> as an implicit mechanism (why should I not mix sparse, dask and numpy
> >> arrays!?). This is the promise `__array_function__` tries to make.
> >> Unless convinced otherwise, my guess is that most library authors would
> >> strive for implicit support (e.g. sklearn, skimage, scipy).
> >>
> >> Circling back to creation and coercion. In a purely Object type system,
> >> these would be classmethods, I guess, but in NumPy and the libraries
> >> above, we are lost.
> >>
> >> Solution 1: Create explicit opt-in, e.g. through uarray. (NEP-31)
> >> * Requires end-user opt-in.
> >> * Seems cleaner in many ways
> >> * Requires a full copy of the API.
> >
> > Bullets 1 and 3 are not required. If we decide to make it default, then
> > there's no separate namespace.
>
> It does require explicit opt-in to have any benefits to the user.
>
> >
> >> Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to
> >> create new arrays more conveniently. This would practically mean adding
> >> an `array_type=np.ndarray` argument.
> >> * _Not_ used by end-users! End users should use dask.linspace!
> >> * Adds "strange" API somewhere in numpy, and possibly a new
> >> "protocol" (in addition to coercion).[2]
> >>
> >> I still feel these solve different issues. The second one is intended
> >> to make array-likes work implicitly in libraries (without end users
> >> having to do anything), while the first seems to force the end user to
> >> opt in, sometimes unnecessarily:
> >>
> >> def my_library_func(array_like):
> >>     exp = np.exp(array_like)
> >>     idx = np.arange(len(exp))
> >>     return idx, exp
> >>
> >> This function would have all the information for implicit
> >> opt-in/array-like support, but cannot do it right now.
> >
> > Can you explain this a bit more? `len(exp)` is a number, so
> > `np.arange(number)` doesn't really have any information here.
> >
>
> Right, but as a library author, I want a way to make it use the
> same type as `array_like` in this particular function, that is the
> point! The end-user already signaled they prefer say dask, due to the
> array that was actually passed in. (but this is just repeating what is
> below I think).
>

Okay, you meant conceptually:)


> >> This is what I have been wondering, if uarray/unumpy can in some way
> >> help me make this work (even _without_ the end user opting in).
> >
> > Good question. If that needs to work in the absence of the user doing
> > anything, it should be something like
> >
> > with unumpy.determine_backend(exp):
> >    unumpy.arange(len(exp))   # or np.arange if we make unumpy default
> >
> > to get the equivalent to `np.arange_like(len(exp), array_type=exp)`.
> >
> > Note, that `determine_backend` thing doesn't exist today.
> >
>
> Exactly, that is what I have been wondering about; there may be more
> issues around that.
> If it existed, we may be able to solve the implicit library usage by
> making libraries use unumpy (or similar). Although at that point we
> half replace `__array_function__`, maybe.
>

I don't really think so. Libraries can/will still use __array_function__
for most functionality, and just add a `with determine_backend` for the
places where __array_function__ doesn't work.
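
A minimal sketch of that pattern, in plain Python with stand-in types
rather than real arrays. As noted above, `determine_backend` doesn't exist
today, so every name below (`ListBackend`, `TupleArray`, `TupleBackend`,
the registry, and `determine_backend` itself) is hypothetical and only
illustrates the idea of picking a backend from the array that was passed in:

```python
import contextlib
import contextvars


class ListBackend:
    """Stand-in for NumPy: creation functions return plain lists."""

    @staticmethod
    def arange(n):
        return list(range(n))


class TupleArray(tuple):
    """Stand-in for a duck array type such as a dask array."""


class TupleBackend:
    """Stand-in for the library that owns TupleArray."""

    @staticmethod
    def arange(n):
        return TupleArray(range(n))


# Hypothetical registry mapping array types to their backends.
_BACKENDS = {TupleArray: TupleBackend}
_current_backend = contextvars.ContextVar("backend", default=ListBackend)


@contextlib.contextmanager
def determine_backend(array_like):
    """Select the backend that owns type(array_like) for this context."""
    backend = _BACKENDS.get(type(array_like), ListBackend)
    token = _current_backend.set(backend)
    try:
        yield backend
    finally:
        _current_backend.reset(token)  # restore the previous backend


def arange(n):
    """Creation function that dispatches on the context, not its arguments."""
    return _current_backend.get().arange(n)


def my_library_func(array_like):
    # len(array_like) is a plain int, so the type must come from the context.
    with determine_backend(array_like):
        idx = arange(len(array_like))
    return idx, array_like
```

With this, `my_library_func(TupleArray((1.0, 2.0, 3.0)))` gets a
`TupleArray` index back without the end user opting in to anything, and
plain list input still goes through the default backend.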


> However, the main point is that without such a functionality, NEP 30 and
> NEP 31 seem to solve slightly different issues with respect to how they
> interact with the end user (opt-in)?
>

Yes, I agree with that.

Cheers,
Ralf



>
> We may decide that we do not want to solve the library users' issue of
> wanting to support implicit opt-in for array-like inputs because it is
> a rabbit hole. But we may need to discuss/argue a bit more that it
> really is a deep enough rabbit hole that it is not worth the trouble.
>
> >> The reason is simply that, right now, I am very
> >> clear on the need for this use case, but not sure about the need for
> >> end user opt-in, since end users can just use dask.arange().
> >
> > I don't get the last part. The arange is inside a library function, so
> > a user can't just go in and change things there.
>
> A "user" here means "end user". An end user writes a script, and they
> can easily change `arr = np.linspace(10)` to `arr = dask.linspace(10)`,
> or more likely just use one within one script and the other within
> another script, while both use the same sklearn functions.
> (Although using backend switching may be nicer in some contexts.)
>
> A library provider (library user of unumpy/numpy) of course cannot just
> use dask conveniently, unless they write their own
> `guess_numpy_like_module()` function first.
>
>
> > Cheers,
> >
> > Ralf
> >
> >> Cheers,
> >>
> >> Sebastian
> >>
> >> [1] To be honest, I do think a lot of the "issues" around
> >> monkeypatching exist just as much with backend choosing; the main
> >> differences seem to me to be that:
> >> 1. monkeypatching was not done explicitly
> >> (import mkl_fft; mkl_fft.monkeypatch_numpy())?
> >> 2. A backend system allows libraries to prefer one locally?
> >> (which I think is a big advantage)
> >>
> >> [2] There are the options of adding `linspace_like` functions somewhere
> >> in a numpy submodule, or adding `linspace(..., array_type=np.ndarray)`,
> >> or simply inventing a new "protocol" (which is not really a protocol?),
> >> and making it `ndarray.__numpy_like_creation_functions__.arange()`.
> >>
> >>> Actually, after writing this I just realized something. With 1.17.x
> >>> we have:
> >>>
> >>> ```
> >>> In [1]: import dask.array as da
> >>>
> >>> In [2]: d = da.from_array(np.linspace(0, 1))
> >>>
> >>> In [3]: np.fft.fft(d)
> >>> Out[3]: dask.array<fft, shape=(50,), dtype=complex128,
> >>> chunksize=(50,)>
> >>> ```
> >>>
> >>> In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't
> >>> work. We have no bug report yet because 1.17.x hasn't landed in conda
> >>> defaults yet (perhaps this is a/the reason why?), but it will be a
> >>> problem.
> >>>
> >>>> The import numpy.overridable part is meant to help garner adoption,
> >>>> and to prefer the unumpy module if it is available (which will
> >>>> continue to be developed separately). That way it isn't so tightly
> >>>> coupled to the release cycle. One alternative Sebastian Berg
> >>>> mentioned (and I am on board with) is just moving unumpy into the
> >>>> NumPy organisation. What we fear in keeping it separate is that the
> >>>> simple act of a pip install unumpy will keep people from using it
> >>>> or trying it out.
> >>>>
> >>> Note that this is not the most critical aspect. I pushed for
> >>> vendoring as numpy.overridable because I want to not derail the
> >>> comparison with NEP 30 et al. with a "should we add a dependency"
> >>> discussion. The interesting part to decide on first is: do we need
> >>> the unumpy override mechanism? Vendoring opt-in vs. making it default
> >>> vs. adding a dependency is of secondary interest right now.
> >>>
> >>> Cheers,
> >>> Ralf
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> NumPy-Discussion mailing list
> >>> NumPy-Discussion at python.org
> >>> https://mail.python.org/mailman/listinfo/numpy-discussion