On Wed, 2020-04-29 at 05:26 -0500, Juan Nunez-Iglesias wrote:
> Hi everyone, and thank you Ralf for carrying the flag in my absence. =D
>
> Sebastian, the *primary* motivation behind avoiding `detach()` in PyTorch is listed in the original post of the PyTorch issue:
>
> > People not very familiar with `requires_grad` and cpu/gpu Tensors might go back and forth with numpy. For example doing pytorch -> numpy -> pytorch and backward on the last Tensor. This will backward without issue but not all the way to the first part of the code and won’t raise any error.
>
> The PyTorch team are concerned that they will be overwhelmed with help requests if `np.array()` silently succeeds on a tensor with gradients. I definitely get that.
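To make the quoted failure mode concrete, here is a minimal sketch. The values and the doubling step are arbitrary, and `requires_grad_()` stands in for the trainable parameters a real model would introduce downstream:

```python
import torch

t = torch.tensor([1.0, 2.0], requires_grad=True)
a = t.detach().numpy()    # torch -> numpy: detach() severs the autograd graph

u = torch.from_numpy(a)   # numpy -> torch: a brand-new leaf tensor
u.requires_grad_()        # gradients are only tracked from this point on
loss = (u * 2).sum()
loss.backward()           # "will backward without issue"...

print(u.grad)             # tensor([2., 2.])
print(t.grad)             # None: the gradient never reached the original tensor
```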
Sorry for playing advocatus diaboli... I guess it is simply that, before the end, it would be nice to have a short list of projects:

* Napari, matplotlib on the "user" side
* PyTorch, ...? on the "provider" side

and maybe what their expectations of `force=True` are, to make sure they roughly align.

The best definition for when to use `force=True` at this time seems to be "end-point" users (such as visualization, or maybe writing to disk?). I still think performance can be just as valid an issue there. For example, it may be better to convert to a NumPy array earlier in the computation. Or someone could be surprised that saving their GPU array to an HDF5 file is by far the slowest part of the computation.

I have the feeling the definition we actually want is: "There is definitely no way to do this computation faster or better than by converting it to a NumPy array." Whereas currently the main reason to reject a conversion seems to be: "Wait, are you sure there is not a much better way than using NumPy arrays? Be careful!"

And while that distinction is clear for PyTorch + visualization, I am not quite sure yet that it is clear for other combinations of `force=True` users and array providers. Maybe CuPy does not want h5py to use `force=True`, because CuPy has its own super-fast "stream-to-file" functionality... but it does want napari to use it.

- Sebastian
> Avoiding `.cpu()` is more straightforwardly about avoiding implicit expensive computation.
> > [...] while others do not choose to teach about it. There seems very little or even no "promise" attached to either `force=True` or `force=False`.
> NumPy can set a precedent through policy. The *only* reason client libraries would implement `__array__` is to play well with NumPy, so if NumPy documents that `force=True` should *always* succeed, we can expect client libraries to follow suit. At least the PyTorch devs have indicated that they would be open to this.
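As a concrete sketch of what honouring such a policy might look like on the provider side; note that the `force=` keyword to `__array__` is the proposal under discussion, not an existing NumPy API, and `GradientTensor` is an invented stand-in:

```python
import numpy as np

class GradientTensor:
    """Toy stand-in for a PyTorch-style tensor that tracks gradients."""

    def __init__(self, data, requires_grad=False):
        self._data = np.asarray(data, dtype=float)
        self.requires_grad = requires_grad

    def __array__(self, dtype=None, *, force=False):
        # Proposed semantics: force=True must *always* succeed, while
        # force=False may refuse a conversion that silently loses information.
        if self.requires_grad and not force:
            raise TypeError(
                "refusing implicit conversion of a gradient-tracking tensor; "
                "pass force=True to convert anyway"
            )
        return self._data.astype(dtype) if dtype is not None else self._data
```

With that contract documented, an end-point consumer could call `arr.__array__(force=True)` (or a future `np.asarray(arr, force=True)`) and rely on always getting an ndarray back.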
> > E.g. Napari wants to use it, but do the array-providers want Napari to use it?
>
> As Ralf pointed out, the PyTorch devs have already agreed to it.
>
> From the napari perspective, we'd be OK with leaving the decision on warnings to client libraries. We may or may not suppress them depending on user requests. ;) But the point is to have a way of saying "give me a NumPy array DAMMIT" without having to know about all the possible array libraries, which are numerous and getting numerouser.
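To make that pain point concrete: without such a spelling, every end-point library ends up carrying provider-specific conversion code along these lines (a sketch; the per-library method names are the real ones, but the duck-typed dispatch is only illustrative):

```python
import numpy as np

def to_numpy(arr):
    """What every end-point library reinvents today: one case per provider."""
    if hasattr(arr, "detach"):   # PyTorch: leave the autograd graph...
        arr = arr.detach()
    if hasattr(arr, "cpu"):      # ...and copy device -> host
        arr = arr.cpu()
    if hasattr(arr, "get"):      # CuPy: explicit device -> host copy
        arr = arr.get()
    if hasattr(arr, "todense"):  # sparse: explicit densification
        arr = arr.todense()
    if hasattr(arr, "compute"):  # Dask: run the deferred computation
        arr = arr.compute()
    return np.asarray(arr)
```

A blessed `force=True` would collapse all of this into a single call.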
> Ralf, you said you don't want warnings, even for sparse arrays? That was an area of concern for you on the PyTorch side of the discussion.
>
> > And if the conversion still gives warnings for some array-objects, have we actually gained much?
> Yes.
>
> Hameer,
> > I would advocate for a `force=` kwarg. Personally I don't think it's explicit enough, but it's probably as explicit as can be, given NumPy's API.
> Yeah, I agree that `force` is kind of vague, which is why I was looking for things like `allow_copy`. But it is hard to be general enough here: sparse requires an expensive instantiation, CuPy requires copying from GPU to CPU, Dask requires arbitrary computation, xarray requires information loss... I'm inclined to agree with Ralf that `force=` is the only generic-enough term, but I'm happy to entertain other options!
> Juan.