[Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

Ralf Gommers ralf.gommers at gmail.com
Wed Apr 29 05:11:13 EDT 2020


On Tue, Apr 28, 2020 at 5:03 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Tue, 2020-04-28 at 11:51 +0200, Ralf Gommers wrote:
> <snip>
> > > So arguably, there is no type-safety concern due to `.detach()`.
> >
> > I'm not sure what the question is here; no one mentioned type-safety.
> > The PyTorch maintainers have already said they're fine with adding a
> > force keyword.
>
> But type-safety is the reason to distinguish between:
>
> * np.asarray(tensor)
> * np.asarray(tensor, force=True)
>

No, it's not. The rationale given by library authors is expensive conversion
/ memory copies / side effects. `np.asarray(x)` is used all over the place,
and can/will continue to be used by library authors. `force=True` is for
cases where things like expensive conversion don't matter, like
visualization: if you need a picture of an array then it helps, while the
downside of writing inefficient/unreliable numerical code isn't present.
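
To make the distinction concrete, here is a minimal sketch of how an array
provider could opt in. Note that `force=` does not exist in NumPy; it is the
keyword being proposed. `SparseVector` and the `asarray` helper below are
hypothetical names used only for illustration, with `asarray` standing in for
a force-aware `np.asarray`.

```python
import numpy as np


class SparseVector:
    """Hypothetical duck array that stores only its non-zero entries."""

    def __init__(self, length, nonzeros):
        self.length = length
        self.nonzeros = dict(nonzeros)  # index -> value

    def __array__(self, dtype=None, force=False):
        # `force` is the keyword under discussion, not an existing NumPy API.
        if not force:
            raise TypeError(
                "implicit densification refused; pass force=True to allow it"
            )
        dense = np.zeros(self.length, dtype=float if dtype is None else dtype)
        for index, value in self.nonzeros.items():
            dense[index] = value
        return dense


def asarray(obj, force=False):
    """Hypothetical stand-in for a force-aware np.asarray."""
    if isinstance(obj, np.ndarray):
        return obj
    if hasattr(obj, "__array__"):
        # Assumes the provider understands `force`; a real implementation
        # would need a fallback for providers that do not.
        if force:
            return obj.__array__(force=True)
        return obj.__array__()
    return np.asarray(obj)


vec = SparseVector(5, {1: 2.0, 3: 4.0})

try:
    asarray(vec)  # numerical library code: the conversion is refused
    refused = False
except TypeError:
    refused = True

dense = asarray(vec, force=True)  # visualization code: the cost is accepted
```

In this sketch the provider, not the caller's library, decides what an
unforced conversion means; the caller only states that it accepts the cost.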


> Similar to:
>
> * operator.index(obj)
> * int(obj)   # convert less type-safe (strings, floats)!
>
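
For reference, the difference between the two standard-library calls in that
analogy can be sketched as follows. `strict` is a hypothetical helper name;
everything else is standard Python.

```python
import operator


def strict(obj):
    """Return operator.index(obj), or None when obj is not integer-like."""
    try:
        return operator.index(obj)
    except TypeError:
        return None


# operator.index only accepts objects that implement __index__,
# so floats and strings are rejected.
strict_results = [strict(7), strict(7.9), strict("7")]

# int() converts far more liberally: it truncates floats and parses strings.
loose_results = [int(7), int(7.9), int("7")]
```
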
> I actually mentioned 3 reasons in my email:
>
> 1. Teach and Inform users (about the next two mainly)
> 2. Type-safety
> 3. Expensive conversion
>
> And only type-safety is related to `.detach()`, mentioning that there
> may not be a clear story about the usage in that case.
>
> (continued below)
>
> >
> <snip>
> > >
> > >
> > > > I am very much in favor of adding such things, but I still lack a
> > > > bit of clarity as to whom we would be helping?
> > >
> >
> > See Juan's first email. I personally am ambivalent on this proposal,
> > but if Juan and the Napari devs really want it, that's good enough for
> > me.
>
> Of course I read it, twice, but it is only good enough for me if we
> actually *solve the issue*, and for that I want to know which issue we
> are solving :), it seems obvious, but I am not so sure...
>
> That brings us to the other two reasons:
>
> Teaching and Informing users:
>
> If Napari uses `force=True` indiscriminately, it is not very clear to
> the user whether or not the operation is expensive.  I.e. the
> user can learn that it is when using `np.asarray(sparse_arr)` with other
> libraries. But they are not notified that `napari.vis_func(sparse_arr)`
> might kill their computer.
>
> So the "Teaching" part can still partially work, but it does not inform
> the user well anymore on whether or not a function will blow-up memory.
>
> Expensive Conversion:
>
> If the main reason is expensive conversions, however, then, as a
> library, I would probably just use it for half my API, since copying
> from GPU to CPU will still be much faster than my own function.
>
>
> Generally:
>
> I want to help Napari, but it seems like there may be more to this, and
> it may be good to finish these thoughts before making a call.
>
> E.g. Napari wants to use it, but do the array-providers want Napari to
> use it?
>
> For sparse, Hameer just mentioned that he still would want big warnings
> both during the operation and in the `np.asarray` documentation.
> If we put such big warnings there, we should have an idea of who we
> want to ignore that warning. (Napari yes, sklearn sometimes, ...?)
>

There clearly should not be warnings. And sklearn is irrelevant; it cannot
use `force=True`.

Ralf



>    -> Is "whatever the library feels right" good enough?
>
> And if the conversion still gives warnings for some array-objects, have
> we actually gained much?
>
>   -> Maybe we do, end-users may be happy to ignore those warnings...
>
>
> The one clear use-case for `force=True` is the end-user. Just like no
> library uses `int(obj)`, but end-users can use it very nicely.
> I am happy to help the end-user in this case, but if that is the target
> audience we may want to _discourage_ Napari from using `force=True` and
> encourage sparse not to put any RuntimeWarnings on it!
>
> - Sebastian
>
>
> > Cheers,
> > Ralf
> >
> >
> >
> > > > If end-users will actually use `np.asarray(..., force=True)` over
> > > > special methods, then great! But I am currently not sure the
> > > > type-safety argument is all that big of a point.  And the
> > > > performance or memory-blowup argument remains true even for
> > > > visualization libraries (where the array is purely input and never
> > > > output as such).
> > >
> > >
> > > > But yes, "never copy" is a somewhat different extension to
> > > > `__array__` and `np.asarray`. It guarantees high speed and in-place
> > > > behaviour, which is useful for different settings.
> > >
> > > - Sebastian
> > >
> > >
> > > > > Cheers,
> > > > > Ralf
> > > > >
> > > > >
> > > > > > > I think the discussion stalled on the precise spelling of the
> > > > > > > third option.
> > > > > > >
> > > > > > > `__array__` was not discussed there, but it seems like adding
> > > > > > > the `copy` argument to `__array__` would be a perfectly
> > > > > > > reasonable extension.
> > > > > > Eric
> > > > > >
> > > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > > > > jni at fastmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > > > One bit of expressivity we would miss is “copy if necessary,
> > > > > > > > > but otherwise don’t bother”, but there are workarounds to
> > > > > > > > > this.
> > > > > > > > >
> > > > > > >
> > > > > > > > After a side discussion with Stéfan van der Walt, we came up
> > > > > > > > with `allow_copy=True`, which would express to the downstream
> > > > > > > > library that we don’t mind waiting, but that zero-copy would
> > > > > > > > also be ok.
> > > > > > >
> > > > > > > > This sounds like the sort of thing that is use case driven. If
> > > > > > > > enough projects want to use it, then I have no objections to
> > > > > > > > adding the keyword. OTOH, we need to be careful about adding
> > > > > > > > too many interoperability tricks, as they complicate the code
> > > > > > > > and make it hard for folks to determine the best solution.
> > > > > > > > Interoperability is a hot topic and we need to be careful not
> > > > > > > > to leave behind too many experiments in the NumPy code.  Do
> > > > > > > > you have any other ideas of how to achieve the same effect?
> > > > > > >
> > > > > > >
> > > > > > > > Personally, I don’t have any other ideas, but would be happy to
> > > > > > > > hear some!
> > > > > > >
> > > > > > > > My view regarding API/experiment creep is that `__array__` is
> > > > > > > > the oldest and most basic of all the interop tricks and that
> > > > > > > > this can be safely maintained for future generations.
> > > > > > > > Currently it only takes `dtype=` as a keyword argument, so it
> > > > > > > > is a very lean API. I think this particular use case is very
> > > > > > > > natural and I’ve encountered the reluctance to implicitly copy
> > > > > > > > twice, so I expect it is reasonably common.
> > > > > > >
> > > > > > > > Regarding difficulty in determining the best solution, I would
> > > > > > > > be happy to contribute to the dispatch basics guide together
> > > > > > > > with the new kwarg. I agree that the protocols are getting
> > > > > > > > quite numerous and I couldn’t find a single place that gathers
> > > > > > > > all the best practices together. But, to reiterate my point:
> > > > > > > > `__array__` is the simplest of these and I think this keyword
> > > > > > > > is pretty safe to add.
> > > > > > >
> > > > > > > > For ease of discussion, here are the API options discussed so
> > > > > > > > far, as well as a few extra that I don’t like but might
> > > > > > > > trigger other ideas:
> > > > > > >
> > > > > > > > np.asarray(my_duck_array, allow_copy=True)
> > > > > > > >     # default is False, or None -> leave it to the duck array to decide
> > > > > > > > np.asarray(my_duck_array, copy=True)
> > > > > > > >     # always copies, but, if supported by the duck array, defers to it for the copy
> > > > > > > > np.asarray(my_duck_array, copy='allow')
> > > > > > > >     # could take values 'allow', 'force', 'no', True(='force'), False(='no')
> > > > > > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True)
> > > > > > > >     # separate concepts, but unclear what force_copy=True, allow_copy=False means!
> > > > > > > > np.asarray(my_duck_array, force=True)
> > > > > > >
> > > > > > > Juan.
> > > > > > > _______________________________________________
> > > > > > > NumPy-Discussion mailing list
> > > > > > > NumPy-Discussion at python.org
> > > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > > > >
> > > > > > _______________________________________________
> > > > > > NumPy-Discussion mailing list
> > > > > > NumPy-Discussion at python.org
> > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > > >
> > > > > _______________________________________________
> > > > > NumPy-Discussion mailing list
> > > > > NumPy-Discussion at python.org
> > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > >
> > > >
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>