On Sat, Apr 27, 2019 at 4:38 PM Hameer Abbasi <einstein.edison@gmail.com> wrote:
Of course, here’s my proposal:

We leave NEP-18 as-is for now, and instead of writing separate protocols for coercion, dtypes and ufuncs (which will be needed somewhere down the line), we have a discussion about uarray and see if it can help there. :-)

At a very high level, I don't understand yet how uarray is the kind of thing that could even potentially help, so maybe that's something that would be helpful to dig into.

To me, the major challenges in supporting duck arrays in numpy are all about the economics of compatibility – how can third-party libraries support:

- as broad a range of functionality as possible,
- with the highest possible amount of compatibility,
- at a reasonable implementation cost,
- and maintain compatibility over time,
- given numpy's complex pre-existing API and backwards compatibility commitments.

My impression so far is that uarray has a generic multi-method dispatch system, and an independent implementation of some of numpy's functionality. Those are both cool things, but I don't see how they're relevant to the list of challenges above.

For the simple strategy of simply letting third-party libraries "take over" dispatch and insert their own implementations, __array_function__ basically covers that. A multi-method system could do the same thing, but not in a materially different way – it's basically two different coats of paint on the same underlying idea.

The challenge for __array_function__ is that because it's a simple black-box dispatch system wrapped around the whole library, there's no simple way for third-parties to reuse numpy's code, which creates challenges for compatibility, cost, maintenance, etc. Maybe that will turn out to be a showstopper, maybe not – we can each make our guesses, but since we're trying the experiment then we'll know soon enough :-).

If it does turn out to be a showstopper, then the question will become: how do we provide finer-grained protocols that are deeply integrated into numpy's semantics? This can help address those questions above, because (a) it's a narrower set of APIs for implementors to worry about, so implementing will require less resources, (b) they're more precisely defined, so getting the details right is easier, (c) you get more re-use of numpy code in between calls to the protocols, so you automatically get numpy's bug fixes.

But... doing this is hard because it really requires us to dig into the details of numpy's semantics. You mention a protocol for ufuncs – we already have that? And it took years, and an immense amount of discussion, because the integration details were genuinely super complicated (e.g., the famous thread about how to handle dispatch when there were both __add__ and __array_ufunc__ methods on the same object). Just saying "we'll use multimethods" doesn't tell you how + and np.add should interoperate.

Similarly, an array coercion protocol itself is trivial – "it's called __asduckarray__, it works like __array__ but can return a duckarray", there, done! The hard part is stuff like: okay, but which functions invoke this – array, asarray, implicit coercion in ufuncs? under which circumstances, and what are the consequences for compatibility and deployability? what does asfortranarray do, do fortran arrays even make sense for duck arrays? well, probably not, but compatibility means we need to do something; what does existing code expect? etc. etc. How does uarray help us solve these problems?

I don't know what a dtype protocol is. I don't think we want to support dispatching over dtype objects, at least in any of the senses I'm thinking of. But that could mean a lot of things so maybe I'm missing something.


Nathaniel J. Smith -- https://vorpus.org