[Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

Sebastian Berg sebastian at sipsolutions.net
Tue Apr 28 12:49:35 EDT 2020


On Tue, 2020-04-28 at 09:58 -0500, Sebastian Berg wrote:
> On Tue, 2020-04-28 at 11:51 +0200, Ralf Gommers wrote:
> <snip>
> > > So arguably, there is no type-safety concern due to `.detach()`.
> > 
> > I'm not sure what the question is here; no one mentioned type-safety.
> > The PyTorch maintainers have already said they're fine with adding a
> > force keyword.
> 
> But type-safety is the reason to distinguish between:
> 
> * np.asarray(tensor)
> * np.asarray(tensor, force=True)
> 
> Similar to:
> 
> * operator.index(obj)
> * int(obj)   # convert less type-safe (strings, floats)!
> 
> I actually mentioned 3 reasons in my email:
> 
> 1. Teach and Inform users (about the next two mainly)
> 2. Type-safety
> 3. Expensive conversion 
> 
> And only type-safety is related to `.detach()`; I mentioned that there
> may not be a clear story about the usage in that case.
> 

(Sorry, something got broken here.)

The question is what PyTorch's reasons are to feel `np.asarray(tensor)`
should not work generally.
I for one thought it was type-safety with regard to `.detach()`. And
then I was surprised to realize that type-safety might not be a great
reason to reject an implicit `.detach()` within `np.asarray(tensor)`.
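
The `operator.index` versus `int` analogy from the quoted text can be
made concrete with a minimal sketch using only the standard library. The
helper name `safe_index` is hypothetical and exists only for
illustration:

```python
import operator


def safe_index(obj):
    """Return obj as an int only if it is integer-like (type-safe path)."""
    try:
        return operator.index(obj)
    except TypeError:
        return None  # not integer-like: refuse rather than coerce


# The strict conversion accepts true integers only.
strict_results = [safe_index(3), safe_index(3.7), safe_index("3")]

# The permissive conversion also coerces floats and strings.
forced_results = [int(3), int(3.7), int("3")]

print(strict_results)  # [3, None, None]
print(forced_results)  # [3, 3, 3]
```

In this picture, `np.asarray(tensor)` plays the role of `operator.index`
and `np.asarray(tensor, force=True)` plays the role of `int`.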


In any case, all the long talk is simply that I first want to be clear
on what the concerns are that make libraries reject `np.asarray(tensor)`.
And then, I want to be clear that adding `force=True` will actually
solve those concerns.
And I was surprised myself that this became very much unclear to me.

Again, one reason it is not clear to me is that half the ecosystem
could potentially just always use `force=True`.  So there must be
some "good usage" and some "bad usage" and I would like to know what
that is.
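
To make the question concrete, here is a toy sketch of what the proposed
keyword might look like from the array provider's side. Everything in it
is an assumption for illustration: `ToyTensor`, `to_host`, and the
`force` parameter are hypothetical, `force=` is only a proposal in this
thread and not an existing `np.asarray` argument, and plain lists stand
in for ndarrays so the sketch runs without NumPy or PyTorch:

```python
class ToyTensor:
    """Hypothetical array-like that may refuse implicit conversion."""

    def __init__(self, values, requires_grad=False):
        self.values = list(values)
        self.requires_grad = requires_grad

    def __array__(self, dtype=None, force=False):
        # Hypothetical signature: `force` is the keyword under discussion.
        if self.requires_grad and not force:
            raise TypeError(
                "implicit conversion would drop gradient tracking; "
                "pass force=True to convert anyway"
            )
        # A real implementation would return an ndarray; a list stands in.
        return list(self.values)


def to_host(obj, force=False):
    """Hypothetical stand-in for np.asarray(obj, force=...)."""
    return obj.__array__(force=force)


tracked = ToyTensor([1.0, 2.0], requires_grad=True)

try:
    to_host(tracked)  # strict path: the provider refuses
    strict_ok = True
except TypeError:
    strict_ok = False

forced = to_host(tracked, force=True)  # forced path: converted anyway
print(strict_ok, forced)  # False [1.0, 2.0]
```

A library that always calls the forced path never sees the refusal,
which is why the distinction between "good" and "bad" usage matters.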

- Sebastian


> (continued below)
> 
> <snip>
> > > 
> > > I am very much in favor of adding such things, but I still lack a
> > > bit
> > > of clarity as to whom we would be helping?
> > > 
> > 
> > See Juan's first email. I personally am ambivalent on this
> > proposal,
> > but if
> > Juan and the Napari devs really want it, that's good enough for me.
> 
> Of course I read it, twice, but it is only good enough for me if we
> actually *solve the issue*, and for that I want to know which issue we
> are solving :). It seems obvious, but I am not so sure...
> 
> That brings us to the other two reasons:
> 
> Teaching and Informing users:
> 
> If Napari uses `force=True` indiscriminately, it is not very clear to
> the user whether or not the operation is expensive.  I.e. the user can
> learn that it is when using `np.asarray(sparse_arr)` with other
> libraries. But they are not notified that `napari.vis_func(sparse_arr)`
> might kill their computer.
> 
> So the "Teaching" part can still partially work, but it does not inform
> the user well anymore on whether or not a function will blow up memory.
> 
> Expensive Conversion:
> 
> If the main reason is expensive conversions, however, then, as a
> library, I would probably just use it for half my API, since copying
> from GPU to CPU will still be much faster than my own function.
> 
> 
> Generally:
> 
> I want to help Napari, but it seems like there may be more to this, and
> it may be good to finish these thoughts before making a call.
> 
> E.g. Napari wants to use it, but do the array-providers want Napari to
> use it?
> 
> For sparse, Hameer just mentioned that he would still want big warnings
> both during the operation and in the `np.asarray` documentation.
> If we put such big warnings there, we should have an idea of who we
> want to ignore that warning. (Napari yes, sklearn sometimes, ...?)
> 
>    -> Is "whatever the library feels right" good enough?
> 
> And if the conversion still gives warnings for some array-objects, have
> we actually gained much?
> 
>   -> Maybe we have; end-users may be happy to ignore those warnings...
> 
> 
> The one clear use-case for `force=True` is the end-user. Just like no
> library uses `int(obj)`, but end-users can use it very nicely.
> I am happy to help the end-user in this case, but if that is the target
> audience we may want to _discourage_ Napari from using `force=True` and
> encourage sparse not to put any RuntimeWarnings on it!
> 
> - Sebastian
> 
> 
> > Cheers,
> > Ralf
> > 
> > 
> > 
> > > If end-users will actually use `np.asarray(..., force=True)` over
> > > special methods, then great! But I am currently not sure the
> > > type-safety argument is all that big of a point.  And the performance
> > > or memory-blowup argument remains true even for visualization
> > > libraries (where the array is purely input and never output as such).
> > > 
> > > 
> > > But yes, "never copy" is a somewhat different extension to
> > > `__array__` and `np.asarray`. It guarantees high speed and in-place
> > > behaviour which is useful for different settings.
> > > 
> > > - Sebastian
> > > 
> > > 
> > > > > Cheers,
> > > > > Ralf
> > > > > 
> > > > > 
> > > > > > > I think the discussion stalled on the precise spelling of the
> > > > > > > third option.
> > > > > > > 
> > > > > > > `__array__` was not discussed there, but it seems like adding
> > > > > > > the `copy` argument to `__array__` would be a perfectly
> > > > > > > reasonable extension.
> > > > > > 
> > > > > > Eric
> > > > > > 
> > > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > > > > jni at fastmail.com>
> > > > > > wrote:
> > > > > > 
> > > > > > > Hi everyone,
> > > > > > > 
> > > > > > > > One bit of expressivity we would miss is “copy if necessary,
> > > > > > > > but otherwise don’t bother”, but there are workarounds to
> > > > > > > > this.
> > > > > > > 
> > > > > > > > After a side discussion with Stéfan van der Walt, we came up with
> > > > > > > > `allow_copy=True`, which would express to the downstream library that
> > > > > > > > we don’t mind waiting, but that zero-copy would also be ok.
> > > > > > > > 
> > > > > > > > This sounds like the sort of thing that is use case driven. If enough
> > > > > > > > projects want to use it, then I have no objections to adding the
> > > > > > > > keyword. OTOH, we need to be careful about adding too many
> > > > > > > > interoperability tricks as they complicate the code and make it hard
> > > > > > > > for folks to determine the best solution. Interoperability is a hot
> > > > > > > > topic and we need to be careful not to leave behind too many
> > > > > > > > experiments in the NumPy code.  Do you have any other ideas of how to
> > > > > > > > achieve the same effect?
> > > > > > > 
> > > > > > > 
> > > > > > > > Personally, I don’t have any other ideas, but would be happy to hear
> > > > > > > > some!
> > > > > > > > 
> > > > > > > > My view regarding API/experiment creep is that `__array__` is the
> > > > > > > > oldest and most basic of all the interop tricks and that this can be
> > > > > > > > safely maintained for future generations. Currently it only takes
> > > > > > > > `dtype=` as a keyword argument, so it is a very lean API. I think this
> > > > > > > > particular use case is very natural and I’ve encountered the
> > > > > > > > reluctance to implicitly copy twice, so I expect it is reasonably
> > > > > > > > common.
> > > > > > > > 
> > > > > > > > Regarding difficulty in determining the best solution, I would be
> > > > > > > > happy to contribute to the dispatch basics guide together with the new
> > > > > > > > kwarg. I agree that the protocols are getting quite numerous and I
> > > > > > > > couldn’t find a single place that gathers all the best practices
> > > > > > > > together. But, to reiterate my point: `__array__` is the simplest of
> > > > > > > > these and I think this keyword is pretty safe to add.
> > > > > > > 
> > > > > > > > For ease of discussion, here are the API options discussed so far, as
> > > > > > > > well as a few extra that I don’t like but might trigger other ideas:
> > > > > > > > 
> > > > > > > > np.asarray(my_duck_array, allow_copy=True)
> > > > > > > >     # default is False, or None -> leave it to the duck array to decide
> > > > > > > > np.asarray(my_duck_array, copy=True)
> > > > > > > >     # always copies, but, if supported by the duck array, defers to it
> > > > > > > >     # for the copy
> > > > > > > > np.asarray(my_duck_array, copy='allow')
> > > > > > > >     # could take values 'allow', 'force', 'no', True(='force'),
> > > > > > > >     # False(='no')
> > > > > > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True)
> > > > > > > >     # separate concepts, but unclear what force_copy=True,
> > > > > > > >     # allow_copy=False means!
> > > > > > > > np.asarray(my_duck_array, force=True)
> > > > > > > 
> > > > > > > Juan.
> > > > > > > _______________________________________________
> > > > > > > NumPy-Discussion mailing list
> > > > > > > NumPy-Discussion at python.org
> > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > > > > 
> > > > > > _______________________________________________
> > > > > > NumPy-Discussion mailing list
> > > > > > NumPy-Discussion at python.org
> > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > > > 
> > > > > _______________________________________________
> > > > > NumPy-Discussion mailing list
> > > > > NumPy-Discussion at python.org
> > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > > 
> > > > 
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > 
> > 
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion



