[Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
njs at pobox.com
Fri Sep 6 03:49:15 EDT 2019
On Mon, Sep 2, 2019 at 11:21 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:
> On Mon, Sep 2, 2019 at 2:09 PM Nathaniel Smith <njs at pobox.com> wrote:
>> The reason this is challenging is that there's a lot of code written
>> in Cython/C/C++ that calls np.asarray,
> Cython code only perhaps? It would surprise me if there's a lot of C/C++ code that explicitly calls into our Python rather than C API.
I think there's also code written as Python-wrappers-around-C-code
where the Python layer handles the error-checking/coercion, and the C
code trusts it to have done so.
>> Now if I understand right, your proposal would be to make it so any
>> code in any package could arbitrarily change the behavior of
>> np.asarray for all inputs, e.g. I could just decide that
>> np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray
> No, definitely not! It's all opt-in, by explicitly importing from `numpy.overridable` or `unumpy`. No behavior of anything in the existing numpy namespaces should be affected in any way.
Ah, whoops, I definitely missed that :-). That does change things!
So one of the major decision points for any duck-array API work, is
whether to modify the numpy semantics "in place", so user code
automatically gets access to the new semantics, or else to make a new
namespace, that users have to switch over to manually.
The major disadvantage of doing changes "in place" is, of course, that
we have to do all this careful work to move incrementally and make
sure that we don't break things. The major (potential) advantage is
that we have a much better chance of moving the ecosystem with us.
The major advantage of making a new namespace is that it's *much*
easier to experiment, because there's no chance of breaking any
projects that didn't opt in. The major disadvantage is that numpy is
super strongly entrenched, and convincing every project to switch to
something else is incredibly difficult and costly. (I just searched
github for "import numpy" and got 17.7 million hits. That's a lot of
imports to update!) Also, empirically, we've seen multiple projects
try to do this (e.g. DyND), and so far they have all failed.
It sounds like unumpy is an interesting approach that hasn't been
tried before – in particular, the promise that you can "just switch
your imports" is a much easier transition than e.g. DyND offered. Of
course, that promise is somewhat undermined by the reality that all
these potential backend libraries *aren't* 100% compatible with numpy,
and can't be... it might turn out that this ends up like asanyarray,
where you can't really use it reliably because the thing that comes
out will generally support *most* of the normal ndarray semantics, but
you don't know which part. Is scipy planning to switch to using this
everywhere, including in C code? If not, then how do you expect
projects like matplotlib to switch, given that matplotlib likes to
pass array objects into scipy functions? Are you planning to take the
opportunity to clean up some of the obscure corners of the numpy API?
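The asanyarray concern can be made concrete with an existing subclass. `np.asanyarray` passes `np.matrix` straight through, and `np.matrix` changes what `*` and integer indexing mean. Code written against ndarray semantics can therefore silently compute something different. This example only uses `np.matrix`; it says nothing about how any particular unumpy backend behaves.

```python
import warnings

import numpy as np


def compare():
    with warnings.catch_warnings():
        # np.matrix is discouraged and may emit PendingDeprecationWarning.
        warnings.simplefilter("ignore", PendingDeprecationWarning)
        m = np.matrix([[1, 2], [3, 4]])
        a = np.asarray(m)     # always a base-class ndarray
        b = np.asanyarray(m)  # passes the np.matrix subclass through
        return {
            "a_type": type(a).__name__,
            "b_type": type(b).__name__,
            "a_times_a": (a * a).tolist(),  # elementwise product
            "b_times_b": (b * b).tolist(),  # matrix product for np.matrix
            "a_row_shape": a[0].shape,      # integer index drops a dimension
            "b_row_shape": b[0].shape,      # np.matrix stays 2-D
        }


result = compare()
```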
But those are general questions about unumpy, and I'm guessing no-one
knows all the answers yet... and these questions actually aren't super
relevant to the NEP. The NEP isn't inventing unumpy. IIUC, the main
thing the NEP proposes is simply to make "numpy.overridable" an
alias for "unumpy".
It's not clear to me what problem this alias is solving. If all
downstream users have to update their imports anyway, then they can
write "import unumpy as np" just as easily as they can write "import
numpy.overridable as np". I guess the main reason this is a NEP is
because the unumpy project is hoping to get an "official stamp of
approval" from numpy? But even that could be accomplished by just
putting something in the docs. And adding the alias has substantial
risks: it makes unumpy tied to the numpy release cycle and
compatibility rules, and it means that we're committing to maintaining
unumpy ~forever even if Hameer or Quansight move onto other things.
That seems like a lot to take on for such vague benefits?
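This toy illustration shows why an alias by itself adds no capability. If two import names resolve to the same module object, code behaves identically whichever name it uses. The module names `impl_pkg` and `alias_pkg` are hypothetical stand-ins for `unumpy` and `numpy.overridable`. The example says nothing about how such an alias would actually be packaged.

```python
import sys
import types

# Hypothetical stand-in for the implementation package (e.g. unumpy).
impl = types.ModuleType("impl_pkg")


def _asarray(obj):
    return list(obj)


impl.asarray = _asarray
sys.modules["impl_pkg"] = impl

# Hypothetical alias name bound to the very same module object.
sys.modules["alias_pkg"] = impl

import impl_pkg as np_a  # noqa: E402
import alias_pkg as np_b  # noqa: E402

same_module = np_a is np_b
same_result = np_a.asarray((1, 2)) == np_b.asarray((1, 2))
```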
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi <einstein.edison at gmail.com> wrote:
> The fact that we're having to design more and more protocols for a lot
> of very similar things is, to me, an indicator that we do have holistic
> problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that
they're each different :-). If it was just a matter of copying
__array_ufunc__ we'd have been done in a few minutes...
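For reference, `__array_ufunc__` (NEP 13) only intercepts ufuncs such as `np.add` or `np.sin`. Non-ufunc functions like `np.concatenate` go through a separate protocol, `__array_function__` (NEP 18). That split is part of why each protocol has had to be designed separately. Below is a minimal sketch of a duck array that implements only `__array_ufunc__`; the `Wrapped` class is hypothetical.

```python
import numpy as np


class Wrapped:
    """Hypothetical duck array that only implements __array_ufunc__."""

    def __init__(self, value):
        self.value = np.asarray(value)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only handle plain calls like np.add(x, y); defer on reduce, out=, etc.
        if method != "__call__" or "out" in kwargs:
            return NotImplemented
        unwrapped = [x.value if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(ufunc(*unwrapped, **kwargs))


result = np.add(Wrapped([1, 2, 3]), 10)
# Non-ufuncs such as np.concatenate are not routed through __array_ufunc__;
# supporting them would need __array_function__ as well.
```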
Nathaniel J. Smith -- https://vorpus.org