On Thu, Apr 25, 2019 at 10:10 AM Stephan Hoyer <shoyer@gmail.com> wrote:
On Wed, Apr 24, 2019 at 9:56 PM Nathaniel Smith <njs@pobox.com> wrote:
When you say "numpy array specific" and "__numpy_(nd)array_implementation__", that sounds to me like you're trying to say "just step 3, skipping steps 1 and 2"? Step 3 is the one that operates on ndarrays...
My thinking was that if we implement NumPy functions with duck typing (e.g., `np.stack()` in terms of `.shape` + `np.concatenate()`), then step (3) could in some sense be the generic "array implementation", not only for NumPy arrays.
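Concretely, a rough sketch of what a duck-typed np.stack could look like (simplified from NumPy's actual implementation, and assuming a hypothetical asduckarray() coercion function):

import numpy as np

def asduckarray(a):
    # Hypothetical coercion: leave duck arrays alone, coerce everything
    # else with np.asarray(). Not a real NumPy function.
    return a.__duck_array__() if hasattr(a, '__duck_array__') else np.asarray(a)

def duck_stack(arrays, axis=0):
    # np.stack written against the duck interface: only .shape, indexing
    # with np.newaxis, and np.concatenate are required of the inputs.
    # (Simplified: assumes a non-negative axis.)
    arrays = [asduckarray(a) for a in arrays]
    if len({a.shape for a in arrays}) != 1:
        raise ValueError('all input arrays must have the same shape')
    # add a new length-1 axis to each array, then join along it
    sl = (slice(None),) * axis + (np.newaxis,)
    return np.concatenate([a[sl] for a in arrays], axis=axis)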
Okay right, so roughly speaking there are two different types of functions that support __array_function__:

* "Core" numpy functions that typically do implicit coercion and then iterate over raw memory
* "Derived" functions, the kind of thing that could just as well be implemented in another library or end-user code, and often are... but since these ones happen to be in the numpy package namespace, they support __array_function__.

There are probably some weird cases that don't fall neatly into either category, but I think the distinction is at least useful for organizing our thoughts.
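To make the "derived" category concrete, here's roughly what np.atleast_1d does, written as ordinary end-user Python; nothing in it touches raw ndarray memory, whereas np.concatenate bottoms out in C and does its coercion there:

import numpy as np

def my_atleast_1d(*arys):
    # A "derived" function: pure Python on top of other NumPy calls.
    # Any third-party library could have written this.
    res = []
    for ary in arys:
        ary = np.asanyarray(ary)
        res.append(ary.reshape(1) if ary.ndim == 0 else ary)
    return res[0] if len(res) == 1 else res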
When we have some kind of __asduckarray__ coercion, then that will complicate things too, because presumably we'll do something like:

1. __array_function__ dispatch
2. __asduckarray__ coercion
3. __array_function__ dispatch again
4. ndarray coercion
5. [either "the implementation", or __array_function__ dispatch again, depending on how you want to think about it]
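Schematically, that pipeline might look something like this (all of the helper names here are made up for illustration; try_array_function, asduckarray, and the ndarray_impl argument are not real NumPy APIs):

import numpy as np

def try_array_function(func, args, kwargs):
    # Crude stand-in for NEP 18 dispatch: collect the distinct argument
    # types that define __array_function__ (skipping plain ndarrays for
    # simplicity), then offer each implementer the call in turn.
    seen_types, implementers = [], []
    for arg in args:
        if (not isinstance(arg, np.ndarray)
                and hasattr(arg, '__array_function__')
                and type(arg) not in seen_types):
            seen_types.append(type(arg))
            implementers.append(arg)
    for impl in implementers:
        result = impl.__array_function__(func, tuple(seen_types), args, kwargs)
        if result is not NotImplemented:
            return result
    return NotImplemented

def asduckarray(a):
    # Hypothetical coercion, as sketched earlier in the thread.
    return a.__duck_array__() if hasattr(a, '__duck_array__') else np.asarray(a)

def call_with_duck_coercion(func, ndarray_impl, args, kwargs):
    # The five steps above, in order; `ndarray_impl` is the coercive
    # ndarray-only implementation of the public function `func`.
    result = try_array_function(func, args, kwargs)      # 1. dispatch
    if result is not NotImplemented:
        return result
    args = tuple(asduckarray(a) for a in args)           # 2. duck coercion
    result = try_array_function(func, args, kwargs)      # 3. dispatch again
    if result is not NotImplemented:
        return result
    args = tuple(np.asarray(a) for a in args)            # 4. ndarray coercion
    return ndarray_impl(*args, **kwargs)                 # 5. the implementation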
I was thinking of something a little simpler: do __asduckarray__ coercion rather than numpy.ndarray coercion inside the implementation of NumPy functions. Making use of NumPy's implementations would then be a matter of calling the NumPy implementation, minus the ndarray coercion, from inside __array_function__.
e.g.,
class MyArray:
    def __duck_array__(self):
        return self

    def __array_function__(self, func, types, args, kwargs):
        ...
        if func in {np.stack, np.atleast_1d, ...}:
            # use NumPy's "duck typing" implementations for these functions
            return func.__duck_array_implementation__(*args, **kwargs)
        elif func == np.concatenate:
            # write my own version of np.concatenate
            ...
This would let you make use of duck typing in a controlled way if you use __array_function__. np.stack.__duck_array_implementation__ would look exactly like np.stack, except np.asanyarray() would be replaced by np.asduckarray().
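In terms of wiring, one way to expose that variant would be for NumPy's dispatch decorator to hang it off the public function as an attribute. A minimal sketch (hypothetical; NumPy's actual array_function_dispatch decorator doesn't do this today):

def attach_duck_implementation(duck_impl):
    # Hypothetical decorator: expose the asduckarray()-based variant of
    # a NumPy function as func.__duck_array_implementation__, so that
    # __array_function__ methods can opt into reusing it.
    def decorator(public_func):
        public_func.__duck_array_implementation__ = duck_impl
        return public_func
    return decorator

With something like this, np.stack proper would keep coercing with np.asanyarray(), while np.stack.__duck_array_implementation__ would be the asduckarray() variant.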
The reason we need both __duck_array_implementation__ and __numpy_array_implementation__/__skipping_array_function__ is that there are also use cases where you *don't* want to worry about how np.stack is implemented under the hood (i.e., in terms of np.concatenate), and want to go straight to the coercive numpy.ndarray implementation. This lets you avoid both the complexity and the overhead associated with further dispatch checks.
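Continuing the example, the two attributes would serve as two different fast paths inside __array_function__ (again, both attribute names are placeholders from this thread, not settled API):

import numpy as np

class MyOtherArray:
    def __array_function__(self, func, types, args, kwargs):
        if func is np.stack:
            # Reuse NumPy's duck-typed np.stack; it will call
            # np.concatenate, which dispatches back to this class once.
            return func.__duck_array_implementation__(*args, **kwargs)
        if func is np.mean:
            # Skip straight to the coercive ndarray implementation:
            # everything (including self) is coerced with np.asarray(),
            # and no further __array_function__ checks run.
            return func.__numpy_array_implementation__(*args, **kwargs)
        return NotImplemented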
I don't think we want repeated dispatching with __array_function__. That seems like a recipe for slow performance and confusion.
I don't understand this part, but it makes me worry that instead of designing something that fits together based on some underlying logical framework, you're hoping to just keep throwing more and more hooks at things, and hoping that if 3rd party libraries have enough hooks they'll be able to somehow monkeypatch things into working most of the time, if you don't look too hard :-/. I hope that's wrong.

Stepping back a bit: My objection to the phrase "numpy implementation" has been that "implementation" is one of those words like "low level", whose meaning completely changes depending on which part of the system you happen to be thinking about when you say it.

I think I see what you're getting at now, though: you've been working on adding __array_function__ dispatch, and from the perspective of a wrapper function implementing __array_function__ dispatch, there's a clear distinction between the caller, the dispatch, and the fallback "implementation" that it delegates to if no __array_function__ methods were found. The wrapper treats the fallback function like a black box.

That's an internally consistent approach, and if you want __array_function__ to work on "derived" functions like np.stack... well, they're just arbitrary Python functions, so you *have* to treat the fallback like a black box, and __array_function__ dispatch as a cleanly decoupled step. And if that's the model for __array_function__, then it makes perfect sense to talk about skipping the __array_function__ dispatch step. I think the word "implementation" is too vague, but the idea makes sense.

The thing I didn't realize until these last few posts, though, is that if this is the model for __array_function__, then it means you *have* to treat the fallback as a black box. Which means that __array_function__ cannot be integrated into numpy's coercion rules, which are inside the black box. And duck arrays need to be integrated into numpy's coercion rules, because you have to be able to coerce to a duck array before calling whatever special methods it has. So therefore... duck arrays cannot use __array_function__? That seems like an unfortunate conclusion, but I don't see any way around it.

Like, for a concrete example: if obj1 has an __asduckarray__ method, and that returns obj2 with __array_ufunc__, then I would absolutely expect np.sin(obj1) to end up calling obj2.__array_ufunc__. But if __array_function__ is a decoupled step applicable to arbitrary functions, then np.sin(obj1) can't call obj2.__array_function__.

Alternatively, we could make __array_function__ part of numpy's standard coercion/dispatch sequence, but then it doesn't make much sense for np.stack to do __array_function__ dispatch.

I guess this is just another manifestation of the trade-off we accepted when we decided to implement __array_function__ instead of finer-grained, semantically-integrated hooks like __array_concatenate__, and I shouldn't expect __array_function__ to be useful for duck arrays?

I don't have a conclusion, but I'd like to know what you think about the above :-).

-n

--
Nathaniel J. Smith -- https://vorpus.org