[Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
ralf.gommers at gmail.com
Sat Sep 7 16:33:35 EDT 2019
On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg <sebastian at sipsolutions.net> wrote:
> On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
> > > That's part of it. The concrete problems it's solving are
> > > threefold:
> > > 1. Array creation functions can be overridden.
> > > 2. Array coercion is now covered.
> > > 3. "Default implementations" will allow you to re-write your NumPy
> > >    array more easily, when such efficient implementations exist in
> > >    terms of other NumPy functions. That will also help achieve similar
> > >    semantics, but as I said, they're just "default"...
> > >
> > There may be another very concrete one (that's not yet in the NEP):
> > allowing other libraries that consume ndarrays to use overrides. An
> > example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch
> > NumPy, something we don't like all that much (in particular for
> > mkl_fft, because it's the default in Anaconda). `__array_function__`
> > isn't able to help here, because it will always choose NumPy's own
> > implementation for ndarray input. With unumpy you can support
> > multiple libraries that consume ndarrays.
> > Another example is einsum: if you want to use opt_einsum for all
> > inputs (including ndarrays), then you cannot use np.einsum. And yet
> > another is using bottleneck (
> > https://kwgoodman.github.io/bottleneck-doc/reference.html) for nan-
> > functions and partition. There's likely more of these.
> > The point is: sometimes the array protocols are preferred (e.g.
> > Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works
> > better. It's also not necessarily an either or, they can be
> > complementary.
> Let me try to move the discussion from the github issue here (this may
> not be the best place). (https://github.com/numpy/numpy/issues/14441
> which asked for easier creation functions together with
> I think an important note mentioned here is how users interact with
> unumpy, vs. __array_function__. The former is an explicit opt-in, while
> the latter is an implicit choice based on an `array-like` abstract base
> class and functional type based dispatching.
> To quote NEP 18 on this: "The downsides are that this would require an
> explicit opt-in from all existing code, e.g., import numpy.api as np,
> and in the long term would result in the maintenance of two separate
> NumPy APIs. Also, many functions from numpy itself are already
> overloaded (but inadequately), so confusion about high vs. low level
> APIs in NumPy would still persist."
> (I do think this is a point we should not just ignore, `uarray` is a
> thin layer, but it has a big surface area)
> Now there are things where explicit opt-in is obvious. And the FFT
> example is one of those, there is no way to implicitly choose another
> backend (except by just replacing it, i.e. monkeypatching). And
> right now I think these are _very_ different.
> Now for the end-users choosing one array-like over another, seems nicer
> as an implicit mechanism (why should I not mix sparse, dask and numpy
> arrays!?). This is the promise `__array_function__` tries to make.
> Unless convinced otherwise, my guess is that most library authors would
> strive for implicit support (i.e. sklearn, skimage, scipy).
> Circling back to creation and coercion. In a purely Object type system,
> these would be classmethods, I guess, but in NumPy and the libraries
> above, we are lost.
> Solution 1: Create explicit opt-in, e.g. through uarray. (NEP-31)
> * Required end-user opt-in.
> * Seems cleaner in many ways
> * Requires a full copy of the API.
Bullets 1 and 3 are not required. If we decide to make it default, then
there's no separate namespace.
> Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to
> create new arrays more conveniently. This would practically mean adding
> an `array_type=np.ndarray` argument.
> * _Not_ used by end-users! End users should use dask.linspace!
> * Adds "strange" API somewhere in numpy, and possibly a new
> "protocol" (in addition to coercion).
> I still feel these solve different issues. The second one is intended
> to make array likes work implicitly in libraries (without end users
> having to do anything). While the first seems to force the end user to
> opt in, sometimes unnecessarily:
> def my_library_func(array_like):
>     exp = np.exp(array_like)
>     idx = np.arange(len(exp))
>     return idx, exp
> Would have all the information for implicit opt-in/Array-like support,
> but cannot do it right now.
Can you explain this a bit more? `len(exp)` is a number, so
`np.arange(number)` doesn't really have any information here.
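
To make the point concrete, here is a minimal sketch using a toy type-based dispatcher. All names in it (`LazyArray`, `pick_implementation`, and the toy `exp`/`arange`) are hypothetical stand-ins, not NumPy or unumpy API. It only illustrates that dispatch on argument types has nothing to work with once the argument is a plain integer:

```python
import math

# Toy model of type-based dispatch; none of these names are NumPy or unumpy API.

class LazyArray:
    """Stand-in for an array-like such as a Dask array."""
    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)


def pick_implementation(*args):
    """Return "lazy" if any argument is a LazyArray, else "default"."""
    if any(isinstance(a, LazyArray) for a in args):
        return "lazy"
    return "default"


def exp(x):
    impl = pick_implementation(x)
    values = x.values if isinstance(x, LazyArray) else x
    result = [math.exp(v) for v in values]
    return LazyArray(result) if impl == "lazy" else result


def arange(n):
    impl = pick_implementation(n)  # n is an int: nothing to dispatch on
    result = list(range(n))
    return LazyArray(result) if impl == "lazy" else result


def my_library_func(array_like):
    e = exp(array_like)
    idx = arange(len(e))
    return idx, e


idx, e = my_library_func(LazyArray([0.0, 1.0]))
# e keeps the caller's array type, idx falls back to the default type.
```

In this toy, `e` comes back as a `LazyArray` while `idx` is a plain list, which is the mismatch under discussion.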
> This is what I have been wondering, if
> uarray/unumpy, can in some way help me make this work (even _without_
> the end user opting in).
Good question. If that needs to work in the absence of the user doing
anything, it should be something like
    unumpy.arange(len(exp))  # or np.arange if we make unumpy default
to get the equivalent to `np.arange_like(len(exp), array_type=exp)`.
Note that the `determine_backend` thing doesn't exist today.
> The reason is simply that right now I am very
> clear on the need for this use case, but not sure about the need for
> end user opt in, since end users can just use dask.arange().
I don't get the last part. The arange is inside a library function, so a
user can't just go in and change things there.
> To be honest, I do think a lot of the "issues" around
> monkeypatching exist just as much with backend choosing, the main
> difference seems to me that a lot of that:
> 1. monkeypatching was not done explicitly
> (import mkl_fft; mkl_fft.monkeypatch_numpy())?
> 2. A backend system allows libraries to prefer one locally?
> (which I think is a big advantage)
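
The contrast between the two points can be sketched with a toy example. Everything here (`fftlib`, `fast_fft`, `prefer_backend`) is a hypothetical stand-in and not the API of mkl_fft, pyfftw or uarray; the sketch only assumes that a backend system stores its choice in context-local state:

```python
import contextlib
import contextvars
import types


def reference_fft(x):
    return ("reference", list(x))


def fast_fft(x):
    return ("fast", list(x))


# Option 1: monkeypatching. The replacement is global, so every caller in
# the process sees it whether or not they asked for it.
fftlib = types.SimpleNamespace(fft=reference_fft)


def monkeypatch_fftlib():
    fftlib.fft = fast_fft


# Option 2: a context-local preference. Only code inside the `with` block
# is affected, and the previous choice is restored on exit.
_backend = contextvars.ContextVar("fft_backend", default=reference_fft)


@contextlib.contextmanager
def prefer_backend(impl):
    token = _backend.set(impl)
    try:
        yield
    finally:
        _backend.reset(token)


def fft(x):
    return _backend.get()(x)


def library_routine(x):
    # A library prefers the fast implementation locally.
    with prefer_backend(fast_fft):
        return fft(x)
```

In this toy, calling `library_routine` uses `fast_fft` without changing what a later plain `fft` call does, whereas `monkeypatch_fftlib()` changes `fftlib.fft` for everyone.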
> There are the options of adding `linspace_like` functions somewhere
> in a numpy submodule, or adding `linspace(..., array_type=np.ndarray)`,
> or simply inventing a new "protocol" (which is not really a protocol?),
> and making it `ndarray.__numpy_like_creation_functions__.arange()`.
> > Actually, after writing this I just realized something. With 1.17.x
> > we have:
> > ```
> > In : import dask.array as da
> > In : d = da.from_array(np.linspace(0, 1))
> > In : np.fft.fft(d)
> > Out: dask.array<fft, shape=(50,), dtype=complex128,
> > chunksize=(50,)>
> > ```
> > In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't
> > work. We have no bug report yet because 1.17.x hasn't landed in conda
> > defaults yet (perhaps this is a/the reason why?), but it will be a
> > problem.
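
The failure mode described above can be modelled without NumPy at all. In this sketch `__toy_function__`, `dispatching` and `toolkit` are hypothetical names standing in for the override protocol and the patched namespace; it is not a reproduction of how mkl_fft patches NumPy:

```python
import functools
import types

# Toy model; these names are not NumPy, Dask or mkl_fft API.


class LazyArray:
    """Stand-in for an array-like that handles functions itself."""
    def __init__(self, values):
        self.values = list(values)

    def __toy_function__(self, func, args):
        # Defer the computation instead of running it eagerly.
        return ("deferred", func.__name__)


def dispatching(func):
    """Let an argument's type take over the call if it defines the hook."""
    @functools.wraps(func)
    def wrapper(x):
        if hasattr(type(x), "__toy_function__"):
            return x.__toy_function__(func, (x,))
        return func(x)
    return wrapper


@dispatching
def fft(x):
    return ("eager", list(x))


def vendor_fft(x):
    # A drop-in replacement that knows nothing about the hook.
    return ("vendor", list(x))


toolkit = types.SimpleNamespace(fft=fft)
before = toolkit.fft(LazyArray([1.0, 2.0]))  # handled by LazyArray

toolkit.fft = vendor_fft                     # the monkeypatch
try:
    toolkit.fft(LazyArray([1.0, 2.0]))
    patched_call_failed = False
except TypeError:                            # LazyArray is not iterable
    patched_call_failed = True
```

Once the dispatching wrapper is replaced wholesale, the array-like never gets a chance to intercept the call.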
> > > The import numpy.overridable part is meant to help garner adoption,
> > > and to prefer the unumpy module if it is available (which will
> > > continue to be developed separately). That way it isn't so tightly
> > > coupled to the release cycle. One alternative Sebastian Berg
> > > mentioned (and I am on board with) is just moving unumpy into the
> > > NumPy organisation. What we fear about keeping it separate is that the
> > > simple act of a pip install unumpy will keep people from using it
> > > or trying it out.
> > >
> > Note that this is not the most critical aspect. I pushed for
> > vendoring as numpy.overridable because I want to not derail the
> > comparison with NEP 30 et al. with a "should we add a dependency"
> > discussion. The interesting part to decide on first is: do we need
> > the unumpy override mechanism? Vendoring opt-in vs. making it default
> > vs. adding a dependency is of secondary interest right now.
> > Cheers,
> > Ralf
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion