On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
>
>
<snip>

> > That's part of it. The concrete problems it's solving are
> > threefold:
> > 1. Array creation functions can be overridden.
> > 2. Array coercion is now covered.
> > 3. "Default implementations" will allow you to re-write your NumPy
> >    array more easily, when such efficient implementations exist in
> >    terms of other NumPy functions. That will also help achieve
> >    similar semantics, but as I said, they're just "default"...
> >
>
> There may be another very concrete one (that's not yet in the NEP):
> allowing other libraries that consume ndarrays to use overrides. An
> example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch
> NumPy, something we don't like all that much (in particular for
> mkl_fft, because it's the default in Anaconda). `__array_function__`
> isn't able to help here, because it will always choose NumPy's own
> implementation for ndarray input. With unumpy you can support
> multiple libraries that consume ndarrays.
>
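To make the monkeypatching point concrete: what mkl_fft does amounts
to roughly the following (a simplified sketch, assuming mkl_fft is
installed; in Anaconda the replacement effectively happens inside the
numpy package itself, see below):

```
import numpy as np
import mkl_fft._numpy_fft

# Replace NumPy's implementation globally: every user of np.fft.fft in
# this process is now affected, whether they asked for it or not.
np.fft.fft = mkl_fft._numpy_fft.fft
```
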
> Another example is einsum: if you want to use opt_einsum for all
> inputs (including ndarrays), then you cannot use np.einsum. And yet
> another is using bottleneck (
> https://kwgoodman.github.io/bottleneck-doc/reference.html) for nan-
> functions and partition. There are likely more of these.
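
For the einsum case, the situation today looks like this (assuming
opt_einsum is installed):

```
import numpy as np
import opt_einsum

a = np.random.rand(8, 8)
b = np.random.rand(8, 8)

# The optimized implementation must be called directly; there is no way
# to tell np.einsum to dispatch to opt_einsum for plain ndarray inputs.
res = opt_einsum.contract('ij,jk->ik', a, b)
```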

> The point is: sometimes the array protocols are preferred (e.g.
> Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works
> better. It's also not necessarily an either/or; they can be
> complementary.
>

Let me try to move the discussion from the GitHub issue over here (this
may not be the best place): https://github.com/numpy/numpy/issues/14441,
which asked for easier creation functions together with
`__array_function__`.

I think an important point mentioned here is how users interact with
unumpy vs. `__array_function__`. The former is an explicit opt-in, while
the latter is an implicit choice based on an `array-like` abstract base
class and function-level, type-based dispatching.
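
To make that contrast concrete (the dask part works today with NumPy
1.17+ and a recent dask; the unumpy part is only a sketch of the
opt-in style, not its exact API):

```
import numpy as np
import dask.array as da

d = da.ones((4, 4))

# Implicit, __array_function__-style: the implementation is chosen per
# call from the *type* of the array argument; the user does nothing
# special.
m = np.mean(d)   # returns a dask array

# Explicit, unumpy-style: the user opts in up front. A rough spelling
# (hypothetical, uarray's actual API may differ):
#
#   import uarray, unumpy
#   with uarray.set_backend(dask_backend):
#       unumpy.ones((4, 4))   # a dask array, even though there is no
#                             # array argument to dispatch on
```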

To quote NEP 18 on this: "The downsides are that this would require an
explicit opt-in from all existing code, e.g., import numpy.api as np,
and in the long term would result in the maintenance of two separate
NumPy APIs. Also, many functions from numpy itself are already
overloaded (but inadequately), so confusion about high vs. low level
APIs in NumPy would still persist."
(I do think this is a point we should not just ignore; `uarray` is a
thin layer, but it has a big surface area.)

Now there are things where explicit opt-in is obvious, and the FFT
example is one of those: there is no way to implicitly choose another
backend (except by just replacing it, i.e. monkeypatching) [1]. And
right now I think these are _very_ different.


Now, for end users, choosing one array-like over another seems nicer as
an implicit mechanism (why should I not mix sparse, dask and numpy
arrays!?). This is the promise `__array_function__` tries to make.
Unless convinced otherwise, my guess is that most library authors would
strive for implicit support (e.g. sklearn, skimage, scipy).
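
This is what the implicit promise looks like in practice today
(assuming dask is installed):

```
import numpy as np
import dask.array as da

x = np.linspace(0, 1, 50)
d = da.from_array(x, chunks=10)

# Mixing ndarrays and dask arrays "just works"; results stay dask arrays:
y = np.exp(d) + x            # via __array_ufunc__
z = np.concatenate([d, d])   # via __array_function__ (NumPy 1.17+)
```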

Circling back to creation and coercion: in a purely object-oriented
type system these would be classmethods, I guess, but in NumPy and the
libraries above we are lost.

Solution 1: Create explicit opt-in, e.g. through uarray. (NEP-31)
  * Requires end-user opt-in.
  * Seems cleaner in many ways.
  * Requires a full copy of the API.

Bullets 1 and 3 are not required: if we decide to make it the default,
then there's no separate namespace.


Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to
create new arrays more conveniently. This would practically mean adding
an `array_type=np.ndarray` argument (sketched below).
  * _Not_ used by end users! End users should use dask.linspace!
  * Adds a "strange" API somewhere in numpy, and possibly a new
    "protocol" (in addition to coercion). [2]

I still feel these solve different issues. The second one is intended
to make array-likes work implicitly in libraries (without end users
having to do anything), while the first seems to force the end user to
opt in, sometimes unnecessarily:

import numpy as np

def my_library_func(array_like):
    exp = np.exp(array_like)     # dispatches on the input's type
    idx = np.arange(len(exp))    # always returns a plain ndarray today
    return idx, exp

This would have all the information for implicit opt-in/array-like
support, but cannot do it right now.

Can you explain this a bit more? `len(exp)` is a number, so `np.arange(number)` doesn't really have any information here.


 
This is what I have been wondering: whether uarray/unumpy can in some
way help me make this work (even _without_ the end user opting in).

Good question. If that needs to work in the absence of the user doing
anything, it would have to be something like

with unumpy.determine_backend(exp):
    unumpy.arange(len(exp))   # or np.arange if we make unumpy the default

to get the equivalent of `np.arange_like(len(exp), array_type=exp)`.

Note that `determine_backend` doesn't exist today.
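
Spelled out for the library function above (entirely hypothetical,
since `determine_backend` doesn't exist):

```
import numpy as np
import unumpy

def my_library_func(array_like):
    exp = np.exp(array_like)
    # The context manager would supply the type information that
    # len(exp) alone does not carry:
    with unumpy.determine_backend(exp):
        idx = unumpy.arange(len(exp))
    return idx, exp
```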

The reason is simply that, right now, I am very clear on the need for
this use case, but not sure about the need for end-user opt-in, since
end users can just use dask.array.arange().

I don't get the last part. The arange is inside a library function, so a user can't just go in and change things there.

Cheers,
Ralf

 

Cheers,

Sebastian


[1] To be honest, I do think a lot of the "issues" around
monkeypatching exist just as much with backend choosing; the main
differences seem to me to be that:
   1. monkeypatching is not done explicitly
      (import mkl_fft; mkl_fft.monkeypatch_numpy())?
   2. A backend system allows libraries to prefer one locally
      (which I think is a big advantage; see the sketch below).
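
A sketch of point 2, a library preferring a backend locally instead of
monkeypatching process-wide (`set_backend` follows uarray's
context-manager style; the backend object and `unumpy.fft` are
assumptions here, not existing APIs):

```
import uarray
import unumpy

def library_fft(x):
    # Only calls inside this block use the preferred backend; the rest
    # of the process is unaffected. pyfftw_backend is hypothetical.
    with uarray.set_backend(pyfftw_backend):
        return unumpy.fft.fft(x)
```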

[2] There are the options of adding `linspace_like` functions somewhere
in a numpy submodule, or adding `linspace(..., array_type=np.ndarray)`,
or simply inventing a new "protocol" (which is not really a protocol?)
and making it `ndarray.__numpy_like_creation_functions__.arange()`.



> Actually, after writing this I just realized something. With 1.17.x
> we have:
>
> ```
> In [1]: import dask.array as da
>
> In [2]: d = da.from_array(np.linspace(0, 1))
>
> In [3]: np.fft.fft(d)
> Out[3]: dask.array<fft, shape=(50,), dtype=complex128, chunksize=(50,)>
> ```
>
> In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't
> work. We have no bug report yet because 1.17.x hasn't landed in conda
> defaults yet (perhaps this is a/the reason why?), but it will be a
> problem.
>
> > The import numpy.overridable part is meant to help garner adoption,
> > and to prefer the unumpy module if it is available (which will
> > continue to be developed separately). That way it isn't so tightly
> > coupled to the release cycle. One alternative Sebastian Berg
> > mentioned (and I am on board with) is just moving unumpy into the
> > NumPy organisation. What we fear keeping it separate is that the
> > simple act of a pip install unumpy will keep people from using it
> > or trying it out.
> >
> Note that this is not the most critical aspect. I pushed for
> vendoring as numpy.overridable because I want to not derail the
> comparison with NEP 30 et al. with a "should we add a dependency"
> discussion. The interesting part to decide on first is: do we need
> the unumpy override mechanism? Vendoring opt-in vs. making it default
> vs. adding a dependency is of secondary interest right now.
>
> Cheers,
> Ralf
>
>
>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion