On Mon, Aug 5, 2019 at 2:48 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:

Having __array__ give a TypeError is fine for libraries that want to prevent unintentional coercion with, e.g., `np.asarray(my_ducktype)`. However that leaves the obvious question of what the right way is to do this intentionally. Would be good to recommend something, for example a `numpy()` or `to_numpy()` method. Also, the NEP should make it clearer that this is not the obviously correct thing to do, it only makes sense in cases where coercion is very expensive, like CuPy and Sparse. For Dask for example, coercion to a numpy array is perfectly reasonable.

I agree, we need another solution for explicit array conversion, either from duck arrays to NumPy arrays or between duck arrays.

As has come-up on GitHub [1], think this should probably be another protocol, to allow for third-party conversions like sparse <-> dask that in principle could be implemented by either library.

To get discussion start, here's one possible proposal for what the NumPy API(s) could look like:

np.coerce(sparse_array) # by default, coerce to np.ndarray

np.coerce(sparse_array, dask.array.Array) # coerces to dask

np.coerce_like(sparse_array, dask_array) # coerce like the second array type

np.coerce_arrays(list_of_arrays) # coerce to first type that can handle everything

The protocol itself should probably either use __array_function__ (e.g., for np.coerce_like, if all the dispatched on arguments are arrays) or a custom protocol in the same style that allows for implementations on either the array being converted or the type of the result [2].

[1] https://github.com/numpy/numpy/issues/13831

[2] https://github.com/numpy/numpy/pull/14170#issuecomment-517004293

The NEP currently does not say who this is meant for. Would you expect libraries like SciPy to adopt it for example?

The NEP also (understandably) punts on the question of when something is a valid duck array. If you want this to be widely used, that will need an answer or at least some rough guidance though. For example, we would expect a duck array to have a mean() method, but probably not a ptp() method. A library author who wants to use np.duckarray() needs to know, because she can't test with all existing and future duck array implementations.

I think this is covered in NEP-22 already. As discussed there, I don't think NumPy is in a good position to pronounce decisive APIs at this time. I would welcome efforts to try, but I don't think that's essential for now.

An alternative to introducing np.duckarray() would be to just modify np.asarray(). Of course this has backwards compatibility impact, but if you're going to be raising a TypeError from __array__ then that impact is there anyway. Note: I don't think this is necessarily a better idea, because it may lead to less clear errors, but it's worth putting in the alternatives section at least.

Cheers,
Ralf

Would be great to get some comments on that.

[1] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0030-duck-array-protocol.rst
[2] https://github.com/numpy/numpy/pull/14170
[3] https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html

Best,
Peter
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion