[Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface
Charles R Harris
charlesr.harris at gmail.com
Tue Apr 21 17:55:44 EDT 2020
On Tue, Apr 21, 2020 at 1:07 AM Juan Nunez-Iglesias <jni at fastmail.com> wrote:
> Hello NumPy-ers!
> The __array__ method is a great little tool to allow interoperability with
> NumPy. Briefly, calling `np.array()` or `np.asarray()` on an object with an
> `__array__` method, one can get a NumPy representation of that object,
> which may or may not involve data copying (this is up to the object’s
> implementation of `__array__`). Some references:
> (I couldn’t find an authoritative guide on good and bad practices with
> `__array__`, btw.)
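
As a minimal illustration of the protocol described above: the `Celsius` class below is hypothetical and exists only for this sketch. Its `__array__` signature accepts `dtype` and `copy` as optional arguments so that it tolerates whichever of them the installed NumPy version chooses to pass; this sketch ignores `copy` and always builds a fresh array.

```python
import numpy as np


class Celsius:
    """Hypothetical container that exposes its data through __array__."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # Build the NumPy representation on demand. Whether this copies
        # data is entirely up to the implementing object; here a new
        # array is always created from the underlying list.
        return np.array(self._values, dtype=dtype)


readings = Celsius([20.5, 21.0, 19.5])

# np.asarray finds __array__ on the object and uses its return value.
arr = np.asarray(readings)
```

A consumer that only understands NumPy arrays can therefore accept `readings` without knowing anything about `Celsius`.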
> For people writing e.g. visualisation libraries, this is a wonderful
> thing, because if we know how to visualise NumPy arrays, we can suddenly
> visualise anything with an `__array__` method. As an example, napari, while
> not being aware of dask, can visualise large dask arrays out of the box,
> which allows us to view 100GB out-of-core datasets easily.
> However, in many cases, instantiating a NumPy array is an expensive
> operation, for example copying an array from GPU to CPU memory, or involves
> substantial loss of information. Some library authors, such as those of
> PyOpenCL, PyTorch, or even scipy.sparse, are reluctant to allow implicit
> execution of such an operation.
> My proposal is to add an optional argument to `__array__` that would
> signal to the downstream library that we *really* want a NumPy array and
> are willing to wait for it. In the PyTorch issue I proposed `force=True`,
> and they are somewhat receptive to this, but, reading more about the
> existing NumPy APIs, I think `copy=True` would be a nice alternative:
> - np.array already has a copy= keyword argument. Under this proposal, it
> would attempt to pass it to the downstream library, and, if that failed, it
> would try again without it and run its own copy.
> - np.asarray could get a new copy= keyword argument that would match
> - It would neatly express the idea that the array is going to e.g. get
> passed around between devices.
> Or, we could just go with `force=`.
> One bit of expressivity we would miss is “copy if necessary, but otherwise
> don’t bother”, but there are workarounds to this.
> What do people think? I would be happy to write a PR and/or NEP for this
> if there is general consensus that this would be useful.
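
The fallback behaviour proposed above for `np.array` (offer the keyword to the object, and if that fails, retry without it and copy locally) could be sketched roughly as follows. This is only an illustration under stated assumptions: `to_array`, `Legacy`, and `CopyAware` are hypothetical names invented for the sketch, and nothing here reflects how NumPy itself implements or would implement the lookup.

```python
import numpy as np


def to_array(obj, copy=True):
    """Hypothetical helper mimicking the proposed fallback logic."""
    try:
        # First, offer the keyword to the object's own __array__.
        return np.asarray(obj.__array__(copy=copy))
    except TypeError:
        # Assumed meaning: the object does not accept the keyword.
        # (A real implementation would need to distinguish this from
        # a TypeError raised inside __array__ for other reasons.)
        result = obj.__array__()
        # Run our own copy only when one was requested.
        return np.array(result, copy=True) if copy else np.asarray(result)


class Legacy:
    """Hypothetical object whose __array__ predates the keyword."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self):
        return self._data  # hands out its buffer without copying


class CopyAware:
    """Hypothetical object that honours the copy keyword itself."""

    def __init__(self, data):
        self._data = np.asarray(data)
        self.seen = None

    def __array__(self, copy=None):
        self.seen = copy  # record what the caller asked for
        return self._data.copy() if copy else self._data


legacy = Legacy([1, 2, 3])
aware = CopyAware([1, 2, 3])

copied_legacy = to_array(legacy, copy=True)   # fallback path, local copy
copied_aware = to_array(aware, copy=True)     # object performs the copy
shared_legacy = to_array(legacy, copy=False)  # fallback path, no copy
```

In this sketch the caller gets an independent array in both `copy=True` cases, whether or not the object understands the keyword.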
This sounds like the sort of thing that is use case driven. If enough
projects want to use it, then I have no objections to adding the keyword.
OTOH, we need to be careful about adding too many interoperability tricks
as they complicate the code and make it hard for folks to determine the
best solution. Interoperability is a hot topic and we need to be careful
not to leave behind too many experiments in the NumPy code. Do you
have any other ideas of how to achieve the same effect?