[Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

Sebastian Berg sebastian at sipsolutions.net
Tue Apr 28 10:58:01 EDT 2020


On Tue, 2020-04-28 at 11:51 +0200, Ralf Gommers wrote:
<snip>
> > So arguably, there is no type-safety concern due to `.detach()`.
> 
> I'm not sure what the question is here; no one mentioned type-safety.
> The PyTorch maintainers have already said they're fine with adding a
> force keyword.

But type-safety is the reason to distinguish between:

* np.asarray(tensor)
* np.asarray(tensor, force=True)

Similar to:

* operator.index(obj)
* int(obj)   # convert less type-safe (strings, floats)!
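The standard library already draws this line for integers: `operator.index` only accepts objects that implement `__index__` (true integers), while `int` will also truncate floats and parse strings. A small illustration (the helper `try_convert` is only there for display):

```python
import operator


def try_convert(func, value):
    """Return func(value), or the name of the exception if the conversion is refused."""
    try:
        return func(value)
    except TypeError as exc:
        return type(exc).__name__


values = (3, 3.7, "3")

# Strict, type-safe conversion: only objects implementing __index__ are accepted.
strict = [try_convert(operator.index, v) for v in values]

# Permissive conversion: floats are truncated and strings are parsed.
lenient = [try_convert(int, v) for v in values]

print(strict)   # [3, 'TypeError', 'TypeError']
print(lenient)  # [3, 3, 3]
```

In the analogy, `np.asarray(tensor)` corresponds to `operator.index(obj)` and `np.asarray(tensor, force=True)` to `int(obj)`.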

I actually mentioned 3 reasons in my email:

1. Teach and Inform users (about the next two mainly)
2. Type-safety
3. Expensive conversion 

And only type-safety is related to `.detach()`; I brought it up to point
out that there may not be a clear story about the usage in that case.
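To make the distinction concrete, here is a minimal pure-Python sketch of how a `force=` keyword could separate the strict from the permissive conversion. `ToyTensor`, its `requires_grad` flag, its `detach()` method and the module-level `asarray()` are all hypothetical stand-ins (a plain list plays the role of the NumPy array); this is not the actual behaviour of PyTorch or NumPy.

```python
class ToyTensor:
    """Hypothetical tensor-like object; a Python list stands in for the data buffer."""

    def __init__(self, data, requires_grad=False):
        self.data = list(data)
        self.requires_grad = requires_grad

    def detach(self):
        # Return a new tensor that no longer tracks gradients.
        return ToyTensor(self.data, requires_grad=False)

    def __array__(self, force=False):
        # Strict path: refuse when the conversion would silently drop information
        # (here: the gradient tracking), unless the caller explicitly opts in.
        if self.requires_grad and not force:
            raise TypeError(
                "tensor requires grad; call .detach() first or pass force=True"
            )
        return list(self.data)  # stand-in for building an ndarray


def asarray(obj, force=False):
    """Hypothetical np.asarray analogue that forwards `force` to __array__."""
    if hasattr(obj, "__array__"):
        return obj.__array__(force=force)
    return list(obj)


t = ToyTensor([1.0, 2.0], requires_grad=True)
print(asarray(t.detach()))     # [1.0, 2.0]
print(asarray(t, force=True))  # [1.0, 2.0]
try:
    asarray(t)
except TypeError as exc:
    print("refused:", exc)
```

The design point being sketched is only that the strict call fails loudly, while `force=True` (or an explicit `.detach()`) makes the lossy step visible at the call site.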

(continued below)

> 
<snip>
> > 
> > 
> > I am very much in favor of adding such things, but I still lack a
> > bit of clarity: whom would we be helping?
> > 
> 
> See Juan's first email. I personally am ambivalent on this proposal,
> but if Juan and the Napari devs really want it, that's good enough for
> me.

Of course I read it, twice, but it is only good enough for me if we
actually *solve the issue*, and for that I want to know which issue we
are solving :). It seems obvious, but I am not so sure...

That brings us to the other two reasons:

Teaching and Informing users:

If Napari uses `force=True` indiscriminately, it is not very clear to
the user whether or not the operation is expensive. The user can learn
that it is expensive when using `np.asarray(sparse_arr)` with other
libraries, but they are not notified that `napari.vis_func(sparse_arr)`
might kill their computer.

So the "Teaching" part can still partially work, but it no longer
informs the user well about whether or not a function will blow up memory.
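A back-of-the-envelope calculation shows why a hidden densification can be so costly. The sketch below assumes a COO-style layout that stores one float64 value plus one int64 index per dimension for each non-zero (an assumption about the storage format, not a statement about any particular library), and compares it with a dense float64 array of the same shape. The shape and non-zero count are made up for illustration.

```python
import math


def coo_nbytes(shape, nnz, value_bytes=8, index_bytes=8):
    # One value plus one index per dimension for every stored non-zero.
    return nnz * (value_bytes + index_bytes * len(shape))


def dense_nbytes(shape, value_bytes=8):
    # Every element is materialised, zeros included.
    return math.prod(shape) * value_bytes


shape = (100_000, 100_000)  # hypothetical example shape
nnz = 1_000_000             # hypothetical number of non-zeros

sparse_size = coo_nbytes(shape, nnz)
dense_size = dense_nbytes(shape)

print(f"sparse: {sparse_size / 1e6:.0f} MB")  # sparse: 24 MB
print(f"dense:  {dense_size / 1e9:.0f} GB")   # dense:  80 GB
```

Under these assumptions, a user calling `napari.vis_func(sparse_arr)` would get no hint of a factor of several thousand in memory unless the conversion itself warns.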

Expensive Conversion:

If the main reason is expensive conversions, however, then as a
library I would probably just use it for half my API, since copying
from GPU to CPU will still be much faster than my own function.


Generally:

I want to help Napari, but it seems like there may be more to this, and
it may be good to finish these thoughts before making a call.

E.g. Napari wants to use it, but do the array-providers want Napari to
use it?

For sparse, Hameer just mentioned that he would still want big warnings
both during the operation and in the `np.asarray` documentation.
If we put such big warnings there, we should have an idea of whom we
want to ignore that warning (Napari yes, sklearn sometimes, ...?).

   -> Is "whatever feels right to the library" good enough?

And if the conversion still gives warnings for some array-objects, have
we actually gained much?

  -> Maybe we do, end-users may be happy to ignore those warnings...
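One way to picture the warning question: a hypothetical sparse-like object that emits a `RuntimeWarning` whenever it is densified through the forced path. A library can silence that warning with the standard `warnings.catch_warnings()` context manager, while an end-user calling the conversion directly still sees it. `ToySparse`, `asarray()` and `library_plot()` are illustrative names only, not the API of pydata/sparse, NumPy or Napari.

```python
import warnings


class ToySparse:
    """Hypothetical sparse 1-D array: a dict mapping index -> non-zero value."""

    def __init__(self, size, nonzeros):
        self.size = size
        self.nonzeros = dict(nonzeros)

    def __array__(self, force=False):
        if not force:
            raise TypeError("refusing implicit densification; pass force=True")
        # The forced path still warns, because the result may be huge.
        warnings.warn(f"densifying {self.size} elements", RuntimeWarning, stacklevel=2)
        return [self.nonzeros.get(i, 0.0) for i in range(self.size)]


def asarray(obj, force=False):
    """Hypothetical np.asarray analogue forwarding `force`."""
    return obj.__array__(force=force)


def library_plot(obj):
    """Hypothetical library function that decided the warning is not for its users."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)
        dense = asarray(obj, force=True)
    return len(dense)  # stand-in for actually drawing something


arr = ToySparse(5, {1: 2.0, 3: 4.0})

# End-user conversion: the warning is visible.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    dense = asarray(arr, force=True)
print(dense, [w.category.__name__ for w in caught])
# [0.0, 2.0, 0.0, 4.0, 0.0] ['RuntimeWarning']

# Library conversion: the same cost is paid, but the warning is swallowed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    n = library_plot(arr)
print(n, len(caught))  # 5 0
```

The memory cost is identical in both cases; the only difference is who gets to decide that the warning can be ignored.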


The one clear use-case for `force=True` is the end-user. It is just like
`int(obj)`: no library uses it, but end-users can use it very nicely.
I am happy to help the end-user in this case, but if that is the target
audience, we may want to _discourage_ Napari from using `force=True` and
encourage sparse not to put any RuntimeWarnings on it!

- Sebastian


> Cheers,
> Ralf
> 
> 
> 
> > If end-users will actually use `np.asarray(..., force=True)` over
> > special methods, then great! But I am currently not sure the
> > type-safety argument is all that big of a point. And the performance
> > or memory-blowup argument remains true even for visualization
> > libraries (where the array is purely input and never output as such).
> > 
> > 
> > But yes, "never copy" is a somewhat different extension to
> > `__array__` and `np.asarray`. It guarantees high speed and in-place
> > behaviour, which is useful in different settings.
> > 
> > - Sebastian
> > 
> > 
> > > > Cheers,
> > > > Ralf
> > > > 
> > > > 
> > > > > I think the discussion stalled on the precise spelling of the
> > > > > third option.
> > > > > 
> > > > > `__array__` was not discussed there, but it seems like adding the
> > > > > `copy` argument to `__array__` would be a perfectly reasonable
> > > > > extension.
> > > > > 
> > > > > Eric
> > > > > 
> > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <jni at fastmail.com> wrote:
> > > > > 
> > > > > > Hi everyone,
> > > > > > 
> > > > > > > One bit of expressivity we would miss is “copy if necessary, but
> > > > > > > otherwise don’t bother”, but there are workarounds to this.
> > > > > > > 
> > > > > > 
> > > > > > After a side discussion with Stéfan van der Walt, we came up with
> > > > > > `allow_copy=True`, which would express to the downstream library
> > > > > > that we don’t mind waiting, but that zero-copy would also be ok.
> > > > > > 
> > > > > > This sounds like the sort of thing that is use case driven. If
> > > > > > enough projects want to use it, then I have no objections to
> > > > > > adding the keyword. OTOH, we need to be careful about adding too
> > > > > > many interoperability tricks, as they complicate the code and make
> > > > > > it hard for folks to determine the best solution. Interoperability
> > > > > > is a hot topic and we need to be careful not to leave behind too
> > > > > > many experiments in the NumPy code. Do you have any other ideas of
> > > > > > how to achieve the same effect?
> > > > > > 
> > > > > > 
> > > > > > Personally, I don’t have any other ideas, but would be happy to
> > > > > > hear some!
> > > > > > 
> > > > > > My view regarding API/experiment creep is that `__array__` is the
> > > > > > oldest and most basic of all the interop tricks and that this can
> > > > > > be safely maintained for future generations. Currently it only
> > > > > > takes `dtype=` as a keyword argument, so it is a very lean API. I
> > > > > > think this particular use case is very natural, and I’ve already
> > > > > > encountered this reluctance to copy implicitly on two occasions,
> > > > > > so I expect it is reasonably common.
> > > > > > 
> > > > > > Regarding difficulty in determining the best solution, I would be
> > > > > > happy to contribute to the dispatch basics guide together with the
> > > > > > new kwarg. I agree that the protocols are getting quite numerous
> > > > > > and I couldn’t find a single place that gathers all the best
> > > > > > practices together. But, to reiterate my point: `__array__` is the
> > > > > > simplest of these and I think this keyword is pretty safe to add.
> > > > > > 
> > > > > > For ease of discussion, here are the API options discussed so
> > > > > > far, as well as a few extra that I don’t like but might trigger
> > > > > > other ideas:
> > > > > > 
> > > > > > np.asarray(my_duck_array, allow_copy=True)  # default is False, or None -> leave it to the duck array to decide
> > > > > > np.asarray(my_duck_array, copy=True)  # always copies, but, if supported by the duck array, defers to it for the copy
> > > > > > np.asarray(my_duck_array, copy='allow')  # could take values 'allow', 'force', 'no', True(='force'), False(='no')
> > > > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True)  # separate concepts, but unclear what force_copy=True, allow_copy=False means!
> > > > > > np.asarray(my_duck_array, force=True)
> > > > > > 
> > > > > > Juan.
> > > > > > _______________________________________________
> > > > > > NumPy-Discussion mailing list
> > > > > > NumPy-Discussion at python.org
> > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > > > 