[Numpy-discussion] Proposal: add `force=` or `copy=` kwarg to `__array__` interface

Ralf Gommers ralf.gommers at gmail.com
Tue Apr 28 05:51:13 EDT 2020


On Mon, Apr 27, 2020 at 12:10 AM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Sat, 2020-04-25 at 10:52 -0700, Stephan Hoyer wrote:
> > On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers <ralf.gommers at gmail.com
> > >
> > wrote:
> >
> > >
> > > On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser <
> > > wieser.eric+numpy at gmail.com>
> > > wrote:
> > >
> > > > Perhaps worth mentioning that we've discussed this sort of API
> > > > before, in
> > > > https://github.com/numpy/numpy/pull/11897.
> > > >
> > > > Under that proposal, the api would be something like:
> > > >
> > > > * `copy=True` - always copy, like it is today
> > > > * `copy=False` - copy if needed, like it is today
> > > > * `copy=np.never_copy` - never copy, throw an exception if not
> > > > possible
> > > >
> > >
> > > There's a couple of issues I see with using `copy` for __array__:
> > > - copy is already weird (False doesn't mean no), and a [bool,
> > > some_obj_or_str] keyword isn't making that better
> > > - the behavior we're talking about can do more than copying, e.g.
> > > for
> > > PyTorch it would modify the autograd graph by adding detach(), and
> > > for
> > > sparse it's not just "make a copy" (which implies doubling memory
> > > use) but
> > > it densifies which can massively blow up the memory.
> > > - I'm -1 on adding things to the main namespace (never_copy) for
> > > something
> > > that can be handled differently (like a string, or a new keyword)
> > >
> > > tl;dr a new `force` keyword would be better
> > >
> >
> > I agree, “copy” is not a good description of this desired coercion
> > behavior.
> >
> > A new keyword argument like “force” would be much clearer.
> >
>
> That seems fine and practical. But, in the end it seems to me that the
> `force=` keyword, just means that some projects want to teach their
> users that:
>
> 1. `np.asarray()` can be expensive (and may always copy)
> 2. `np.asarray()` always loses type properties
>
> while others do not choose to teach about it.  There seems very little
> or even no "promise" attached to either `force=True` or `force=False`.
>
>
> In the end, the question is whether sparse will actually want to
> implement `force=True` if the main reason we add is for library use.
>

That's for PyData Sparse and scipy.sparse devs to answer. Maybe Hameer can
answer for the former here. For SciPy that should be decided on the
scipy-dev list, but my opinion would be: yes to adding __array__ that
raises TypeError by default, and converts with `force=True`.


> There is no difference between a visualization library or numpy. In
> both cases the users memory will blow up just the same.
>
> As for PyTorch, is `.detach()` even a good reason?  Maybe I am missing
> things, but:
>
> >>> torch.ones(10, requires_grad=True) + np.arange(10)
> # RuntimeError: Can't call numpy() on Variable that requires grad. Use
> var.detach().numpy() instead.
>
> So arguably, there is no type-safety concern due to `.detach()`.


I'm not sure what the question is here; no one mentioned type-safety. The
PyTorch maintainers have already said they're fine with adding a force
keyword.

There
> is an (obvious) general loss of type information that always occurs
> with an `np.asarray` call.
> But I do not see that creating any openings for bugs here, due to the
> wisdom of not allowing the above operation.
> In fact, it actually seems much worse for for xarray, or pandas. They
> do support the above operation and will potentially mess up if the
> arange was previously an xarray with a matching index, but different
> order.
>
>
> I am very much in favor of adding such things, but I still lack a bit
> of clarity as to whom we would be helping?
>

See Juan's first email. I personally am ambivalent on this proposal, but if
Juan and the Napari devs really want it, that's good enough for me.

Cheers,
Ralf



> If end-users will actually use `np.asarray(..., force=True)` over
> special methods, then great! But I am currently not sure the type-
> safety argument is all that big of a point.  And the performance or
> memory-blowup argument remains true even for visualization libraries
> (where the array is purely input and never output as such).
>
>
> But yes, "never copy" is a somewhat different extension to `__array__`
> and `np.asarray`. It guarantees high speed and in-place behaviour which
> is useful for different settings.
>
> - Sebastian
>
>
> >
> > > Cheers,
> > > Ralf
> > >
> > >
> > > > I think the discussion stalled on the precise spelling of the
> > > > third
> > > > option.
> > > >
> > > > `__array__` was not discussed there, but it seems like adding the
> > > > `copy`
> > > > argument to `__array__` would be a perfectly reasonable
> > > > extension.
> > > >
> > > > Eric
> > > >
> > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > > jni at fastmail.com>
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > One bit of expressivity we would miss is “copy if necessary,
> > > > > but
> > > > > > otherwise don’t bother”, but there are workarounds to this.
> > > > > >
> > > > >
> > > > > After a side discussion with Stéfan van der Walt, we came up
> > > > > with
> > > > > `allow_copy=True`, which would express to the downstream
> > > > > library that we
> > > > > don’t mind waiting, but that zero-copy would also be ok.
> > > > >
> > > > > This sounds like the sort of thing that is use case driven. If
> > > > > enough
> > > > > projects want to use it, then I have no objections to adding
> > > > > the keyword.
> > > > > OTOH, we need to be careful about adding too many
> > > > > interoperability tricks
> > > > > as they complicate the code and makes it hard for folks to
> > > > > determine the
> > > > > best solution. Interoperability is a hot topic and we need to
> > > > > be careful
> > > > > not put too leave behind too many experiments in the NumPy
> > > > > code.  Do you
> > > > > have any other ideas of how to achieve the same effect?
> > > > >
> > > > >
> > > > > Personally, I don’t have any other ideas, but would be happy to
> > > > > hear
> > > > > some!
> > > > >
> > > > > My view regarding API/experiment creep is that `__array__` is
> > > > > the oldest
> > > > > and most basic of all the interop tricks and that this can be
> > > > > safely
> > > > > maintained for future generations. Currently it only takes
> > > > > `dtype=` as a
> > > > > keyword argument, so it is a very lean API. I think this
> > > > > particular use
> > > > > case is very natural and I’ve encountered the reluctance to
> > > > > implicitly copy
> > > > > twice, so I expect it is reasonably common.
> > > > >
> > > > > Regarding difficulty in determining the best solution, I would
> > > > > be happy
> > > > > to contribute to the dispatch basics guide together with the
> > > > > new kwarg. I
> > > > > agree that the protocols are getting quite numerous and I
> > > > > couldn’t find a
> > > > > single place that gathers all the best practices together. But,
> > > > > to
> > > > > reiterate my point: `__array__` is the simplest of these and I
> > > > > think this
> > > > > keyword is pretty safe to add.
> > > > >
> > > > > For ease of discussion, here are the API options discussed so
> > > > > far, as
> > > > > well as a few extra that I don’t like but might trigger other
> > > > > ideas:
> > > > >
> > > > > np.asarray(my_duck_array, allow_copy=True)  # default is False,
> > > > > or None
> > > > > -> leave it to the duck array to decide
> > > > > np.asarray(my_duck_array, copy=True)  # always copies, but, if
> > > > > supported
> > > > > by the duck array, defers to it for the copy
> > > > > np.asarray(my_duck_array, copy=‘allow’)  # could take values
> > > > > ‘allow’,
> > > > > ‘force’, ’no’, True(=‘force’), False(=’no’)
> > > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True)  #
> > > > > separate
> > > > > concepts, but unclear what force_copy=True, allow_copy=False
> > > > > means!
> > > > > np.asarray(my_duck_array, force=True)
> > > > >
> > > > > Juan.
> > > > > _______________________________________________
> > > > > NumPy-Discussion mailing list
> > > > > NumPy-Discussion at python.org
> > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > >
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20200428/2bd5c753/attachment.html>


More information about the NumPy-Discussion mailing list