[Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

Juan Nunez-Iglesias jni at fastmail.com
Tue Apr 21 03:06:12 EDT 2020

Hello NumPy-ers!

The __array__ method is a great little tool to allow interoperability with NumPy. Briefly, calling `np.array()` or `np.asarray()` on an object with an `__array__` method, one can get a NumPy representation of that object, which may or may not involve data copying (this is up to the object’s implementation of `__array__`). Some references:

https://numpy.org/devdocs/user/basics.dispatch.html <https://numpy.org/devdocs/user/basics.dispatch.html>

https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#numpy.class.__array__ <https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#numpy.class.__array__>
https://numpy.org/devdocs/reference/generated/numpy.array.html <https://numpy.org/devdocs/reference/generated/numpy.array.html>
https://numpy.org/devdocs/reference/generated/numpy.asarray.html <https://numpy.org/devdocs/reference/generated/numpy.asarray.html>

(I couldn’t find an authoritative guide on good and bad practices with `__array__`, btw.)

For people writing e.g. visualisation libraries, this is a wonderful thing, because if we know how to visualise NumPy arrays, we can suddenly visualise anything with an `__array__` method. As an example, napari, while not being aware of dask, can visualise large dask arrays out of the box, which allows us to view 100GB out-of-core datasets easily.

However, in many cases, instantiating a NumPy array is an expensive operation, for example copying an array from GPU to CPU memory, or involves substantial loss of information. Some library authors are reluctant to allow implicit execution of such an operation, such as PyOpenCL [1], PyTorch [2], or even scipy.sparse.

My proposal is to add an optional argument to `__array__` that would signal to the downstream library that we *really* want a NumPy array and are willing to wait for it. In the PyTorch issue I proposed `force=True`, and they are somewhat receptive of this, but, reading more about the existing NumPy APIs, I think `copy=True` would be a nice alternative:

- np.array already has a copy= keyword argument. Under this proposal, it would attempt to pass it to the downstream library, and, if that failed, it would try again without it and run its own copy.
- np.asarray could get a new copy= keyword argument that would match np.array’s.
- It would neatly express the idea that the array is going to e.g. get passed around between devices.

Or, we could just go with `force=`.

One bit of expressivity we would miss is “copy if necessary, but otherwise don’t bother”, but there are workarounds to this.

What do people think? I would be happy to write a PR and/or NEP for this if there is general consensus that this would be useful.




[1]: https://github.com/inducer/pyopencl/pull/301
[2]: https://github.com/pytorch/pytorch/issues/36560
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20200421/d73f8146/attachment-0001.html>

More information about the NumPy-Discussion mailing list