Re: [Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

April 28, 2019

On Sat, Apr 27, 2019 at 4:38 PM Hameer Abbasi <einstein.edison@gmail.com> wrote:
> ...
> Of course, here’s my proposal:
> We leave NEP-18 as-is for now, and instead of writing separate protocols
> for coercion, dtypes and ufuncs (which will be needed somewhere down the
> line), we have a discussion about uarray and see if it can help there. :-)

At a very high level, I don't understand yet how uarray is the kind of
thing that could even potentially help, so maybe that's something that
would be helpful to dig into.

To me, the major challenges in supporting duck arrays in numpy are all
about the economics of compatibility – how can third-party libraries
support:

- as broad a range of functionality as possible,
- with the highest possible amount of compatibility,
- at a reasonable implementation cost,
- and maintain compatibility over time,
- given numpy's complex pre-existing API and backwards compatibility
commitments.

My impression so far is that uarray has a generic multi-method dispatch
system, and an independent implementation of some of numpy's functionality.
Those are both cool things, but I don't see how they're relevant to the
list of challenges above.

For the simple strategy of letting third-party libraries "take over"
dispatch and insert their own implementations, __array_function__ basically
covers that. A multi-method system could do the same thing, but not in a
materially different way – it's basically two different coats of paint on
the same underlying idea.
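
As a rough illustration of the "take over dispatch" idea, here is a toy model in plain Python. It is not NumPy's actual implementation: `array_function_dispatch` below is a simplified stand-in written for this sketch, and `total`, `largest` and `LoggedArray` are hypothetical names. Only the hook name and its `(func, types, args, kwargs)` signature follow NEP-18.

```python
import functools


def array_function_dispatch(default_impl):
    """Toy stand-in for a black-box dispatch wrapper (not NumPy's real code)."""

    @functools.wraps(default_impl)
    def public_api(*args, **kwargs):
        # Collect the arguments whose type defines the override hook.
        overriders = [a for a in args if hasattr(type(a), "__array_function__")]
        if not overriders:
            return default_impl(*args, **kwargs)
        types = tuple({type(a) for a in overriders})
        for arg in overriders:
            result = arg.__array_function__(public_api, types, args, kwargs)
            if result is not NotImplemented:
                return result
        raise TypeError(f"no implementation found for {public_api.__name__}")

    return public_api


@array_function_dispatch
def total(values):
    """Default implementation, used only when no argument overrides it."""
    return sum(values)


@array_function_dispatch
def largest(values):
    """A second library function that the duck array below does not handle."""
    return max(values)


class LoggedArray:
    """Hypothetical duck array that replaces `total` wholesale."""

    def __init__(self, data):
        self.data = list(data)
        self.calls = []

    def __array_function__(self, func, types, args, kwargs):
        self.calls.append(func.__name__)
        if func is total:
            # The library's default implementation never runs on this path.
            return sum(self.data)
        return NotImplemented
```

In this sketch `total([1, 2, 3])` runs the default code, `total(LoggedArray([1, 2, 3]))` runs only the override, and `largest(LoggedArray([1, 2, 3]))` raises `TypeError` because the duck array did not supply its own version. The override is all-or-nothing per function, whichever dispatch mechanism sits underneath.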

The challenge for __array_function__ is that because it's a simple
black-box dispatch system wrapped around the whole library, there's no
simple way for third-parties to reuse numpy's code, which creates
challenges for compatibility, cost, maintenance, etc. Maybe that will turn
out to be a showstopper, maybe not – we can each make our guesses, but
since we're running the experiment, we'll know soon enough :-).

If it does turn out to be a showstopper, then the question will become: how
do we provide finer-grained protocols that are deeply integrated into
numpy's semantics? This could help address the challenges above, because (a)
it's a narrower set of APIs for implementors to worry about, so
implementing them will require fewer resources, (b) they're more precisely
defined, so getting the details right is easier, (c) you get more re-use of
numpy code in between calls to the protocols, so you automatically get
numpy's bug fixes.
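
To make point (c) concrete, here is a minimal pure-Python sketch of what a finer-grained protocol could look like. Everything in it is an assumption made for illustration: `__toy_reduce_add__`, `reduce_add`, `mean` and `ScaledArray` are hypothetical names, not NumPy APIs or proposals.

```python
def reduce_add(x):
    """Narrow protocol point: defer to the hypothetical hook if the type has it."""
    hook = getattr(type(x), "__toy_reduce_add__", None)
    if hook is not None:
        return hook(x)
    return sum(x)


def mean(x):
    """Shared library code, written once on top of the narrow protocol."""
    if len(x) == 0:
        raise ValueError("mean of an empty array")
    return reduce_add(x) / len(x)


class ScaledArray:
    """Hypothetical duck array: integers plus a common scale factor."""

    def __init__(self, ints, scale):
        self.ints = list(ints)
        self.scale = scale

    def __len__(self):
        return len(self.ints)

    def __toy_reduce_add__(self):
        # The only piece the duck array has to implement itself.
        return sum(self.ints) * self.scale
```

Here `ScaledArray` implements one small hook and still gets the library's `mean`, including its empty-input check. If that check were later fixed or changed in the library, the duck array would pick up the change without any work on its side.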

But... doing this is hard because it really requires us to dig into the
details of numpy's semantics. You mention a protocol for ufuncs – we
already have that? And it took years, and an immense amount of discussion,
because the integration details were genuinely super complicated (e.g., the
famous thread about how to handle dispatch when there were both __add__ and
__array_ufunc__ methods on the same object). Just saying "we'll use
multimethods" doesn't tell you how + and np.add should interoperate.
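
For readers who missed that thread, the interaction can be sketched with a toy model in plain Python. `ToyArray` and `toy_add` are stand-ins invented for this sketch, not NumPy code; the rule they mimic (an operand with `__array_ufunc__ = None` makes the array's `__add__` return `NotImplemented`, while calling the ufunc on it directly is an error) is my understanding of what was eventually settled on.

```python
def toy_add(a, b):
    """Toy stand-in for np.add: consult __array_ufunc__ overrides first."""
    tried_override = False
    for arg in (a, b):
        hook = getattr(type(arg), "__array_ufunc__", False)
        if hook is None:
            raise TypeError("operand does not support ufuncs")
        if hook is not False:
            tried_override = True
            result = hook(arg, toy_add, "__call__", a, b)
            if result is not NotImplemented:
                return result
    if tried_override:
        raise TypeError("all overrides returned NotImplemented")
    if not isinstance(a, ToyArray) and not isinstance(b, ToyArray):
        return a + b
    # Default path: elementwise addition, broadcasting a scalar if needed.
    left = a.data if isinstance(a, ToyArray) else [a] * len(b.data)
    right = b.data if isinstance(b, ToyArray) else [b] * len(a.data)
    return ToyArray(x + y for x, y in zip(left, right))


class ToyArray:
    """Toy stand-in for ndarray (illustrative only)."""

    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        # Opt-out: let Python try other.__radd__ instead of calling the ufunc.
        if getattr(type(other), "__array_ufunc__", False) is None:
            return NotImplemented
        return toy_add(self, other)


class ViaUfunc:
    """Hypothetical duck type that overrides at the function level."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return "ViaUfunc.__array_ufunc__"


class ViaOperator:
    """Hypothetical duck type that opts out of ufuncs and handles + itself."""

    __array_ufunc__ = None

    def __radd__(self, other):
        return "ViaOperator.__radd__"
```

With these definitions, `ToyArray([1]) + ViaUfunc()` is answered by the `__array_ufunc__` override, `ToyArray([1]) + ViaOperator()` falls through to `__radd__`, and `toy_add(ToyArray([1]), ViaOperator())` raises `TypeError`. None of these choices follows from the dispatch mechanism; each one had to be decided separately.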

Similarly, an array coercion protocol itself is trivial – "it's called
__asduckarray__, it works like __array__ but can return a duckarray",
there, done! The hard part is stuff like: okay, but which functions invoke
this – array, asarray, implicit coercion in ufuncs? Under which
circumstances, and what are the consequences for compatibility and
deployability? What does asfortranarray do? Do Fortran arrays even make
sense for duck arrays? Well, probably not, but compatibility means we need
to do something; what does existing code expect? Etc., etc. How does uarray
help us solve these problems?
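
A sketch of why the protocol itself is the easy part, in plain Python. `__asduckarray__` is the hypothetical hook named above; `as_duck_array`, `as_strict_array` and `CompressedArray` are likewise made up for this sketch, and a plain list stands in for a real ndarray.

```python
def as_strict_array(obj):
    """Entry point that must return a 'real' array (a plain list in this toy)."""
    array_hook = getattr(type(obj), "__array__", None)
    if array_hook is not None:
        return array_hook(obj)
    return list(obj)


def as_duck_array(obj):
    """Entry point that honours the hypothetical __asduckarray__ hook."""
    duck_hook = getattr(type(obj), "__asduckarray__", None)
    if duck_hook is not None:
        return duck_hook(obj)
    return as_strict_array(obj)


class CompressedArray:
    """Hypothetical duck array that would rather not be materialized."""

    def __init__(self, data):
        self.data = tuple(data)

    def __array__(self):
        # Expensive path: expand into a 'real' array.
        return list(self.data)

    def __asduckarray__(self):
        # Cheap path: stay a duck array.
        return self
```

The two helpers take a handful of lines. What the sketch cannot answer is which existing entry points should behave like `as_duck_array` and which like `as_strict_array`, and what code that already relies on getting a real array back would do in each case.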

I don't know what a dtype protocol is. I don't think we want to support
dispatching over dtype objects, at least in any of the senses I'm thinking
of. But that could mean a lot of things so maybe I'm missing something.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org <http://vorpus.org>

Nathaniel Smith