On Tue, Apr 28, 2020 at 5:03 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Tue, 2020-04-28 at 11:51 +0200, Ralf Gommers wrote:
<snip>
> > So arguably, there is no type-safety concern due to `.detach()`.
>
> I'm not sure what the question is here; no one mentioned type-safety.
> The
> PyTorch maintainers have already said they're fine with adding a
> force
> keyword.

But type-safety is the reason to distinguish between:

* np.asarray(tensor)
* np.asarray(tensor, force=True)

No, it's not; the rationale given by library authors is expensive conversion / memory copies / side effects. `np.asarray(x)` is used all over the place, and can/will continue to be used by library authors. `force=True` is for cases where things like expensive conversion don't matter, like visualization: if you need a picture of an array then it helps, while the downside of writing inefficient/unreliable numerical code isn't present.


Similar to:

* operator.index(obj)
* int(obj)   # convert less type-safe (strings, floats)!
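
A minimal sketch of this analogy, using only the standard library.
`operator.index` accepts only integer-like objects, while `int` also
converts floats and strings. `try_index` is a hypothetical helper added
purely for illustration.

```python
import operator


def try_index(obj):
    """Return operator.index(obj), or None if obj is not integer-like."""
    try:
        return operator.index(obj)
    except TypeError:
        return None


for value in (3, 3.7, "3"):
    # operator.index is the strict conversion; int is the permissive one.
    print(repr(value), "-> index:", try_index(value), "| int:", int(value))
```

The strict call rejects the float and the string, while `int` quietly
converts both.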

I actually mentioned 3 reasons in my email:

1. Teach and Inform users (about the next two mainly)
2. Type-safety
3. Expensive conversion

And only type-safety is related to `.detach()`; I mentioned that there
may not be a clear story about the usage in that case.

(continued below)

>
<snip>
> >
> >
> > I am very much in favor of adding such things, but I still lack a
> > bit
> > of clarity as to whom we would be helping?
> >
>
> See Juan's first email. I personally am ambivalent on this proposal,
> but if
> Juan and the Napari devs really want it, that's good enough for me.

Of course I read it, twice, but it is only good enough for me if we
actually *solve the issue*, and for that I want to know which issue we
are solving :), it seems obvious, but I am not so sure...

That brings us to the other two reasons:

Teaching and Informing users:

If Napari uses `force=True` indiscriminately, it is not very clear to
the user whether or not the operation is expensive.  I.e. the user can
learn that it is when using `np.asarray(sparse_arr)` with other
libraries. But they are not notified that `napari.vis_func(sparse_arr)`
might kill their computer.

So the "Teaching" part can still partially work, but it no longer
informs the user well about whether or not a function will blow up memory.
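
A toy sketch of this concern, under loud assumptions: `force` is only a
proposed keyword, so `asarray` below is a hypothetical pure-Python
stand-in for `np.asarray`, `SparseLike` is a made-up duck array, and
nested lists stand in for dense arrays. `vis_func` is likewise
hypothetical and is not Napari's API.

```python
class SparseLike:
    """Made-up duck array that stores only its non-zero entries."""

    def __init__(self, shape, entries):
        self.shape = shape        # (rows, cols)
        self.entries = entries    # {(row, col): value}

    def __array__(self, dtype=None, force=False):
        # Assumption: the proposed `force` flag is forwarded to __array__.
        if not force:
            raise RuntimeError(
                "refusing implicit densification; pass force=True"
            )
        rows, cols = self.shape
        return [[self.entries.get((r, c), 0) for c in range(cols)]
                for r in range(rows)]


def asarray(obj, force=False):
    """Hypothetical stand-in for the proposed np.asarray(obj, force=...)."""
    return obj.__array__(force=force)


def vis_func(data):
    """Hypothetical visualization entry point that always forces."""
    dense = asarray(data, force=True)  # the cost is hidden from the caller
    return len(dense), len(dense[0])


sparse_arr = SparseLike((2, 3), {(0, 1): 5})

try:
    asarray(sparse_arr)           # the user is told the conversion is costly
except RuntimeError as exc:
    print("asarray:", exc)

print("vis_func:", vis_func(sparse_arr))  # densifies without any notice
```

In this sketch the direct call informs the user, while the library call
that forces internally gives no signal at all.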

Expensive Conversion:

If the main reason is expensive conversions, however, then, as a
library I would probably just use it for half my API, since copying
from GPU to CPU will still be much faster than my own function.


Generally:

I want to help Napari, but it seems like there may be more to this, and
it may be good to finish these thoughts before making a call.

E.g. Napari wants to use it, but do the array-providers want Napari to
use it?

For sparse, Hameer just mentioned that he still would want big warnings
both during the operation and in the `np.asarray` documentation.
If we put such big warnings there, we should have an idea of who we
want to ignore that warning? (Napari yes, sklearn sometimes, ...?)

There clearly should not be warnings. And sklearn is irrelevant, it cannot use `force=True`.

Ralf



   -> Is "whatever the library feels right" good enough?

And if the conversion still gives warnings for some array-objects, have
we actually gained much?

  -> Maybe we do, end-users may be happy to ignore those warnings...


The one clear use-case for `force=True` is the end-user. Just like no
library uses `int(obj)`, but end-users can use it very nicely.
I am happy to help the end-user in this case, but if that is the target
audience we may want to _discourage_ Napari from using `force=True` and
encourage sparse not to put any RuntimeWarnings on it!

- Sebastian


> Cheers,
> Ralf
>
>
>
> > If end-users will actually use `np.asarray(..., force=True)` over
> > special methods, then great! But I am currently not sure the type-
> > safety argument is all that big of a point.  And the performance or
> > memory-blowup argument remains true even for visualization
> > libraries
> > (where the array is purely input and never output as such).
> >
> >
> > But yes, "never copy" is a somewhat different extension to
> > `__array__`
> > and `np.asarray`. It guarantees high speed and in-place behaviour
> > which
> > is useful for different settings.
> >
> > - Sebastian
> >
> >
> > > > Cheers,
> > > > Ralf
> > > >
> > > >
> > > > > I think the discussion stalled on the precise spelling of the
> > > > > third
> > > > > option.
> > > > >
> > > > > `__array__` was not discussed there, but it seems like adding
> > > > > the
> > > > > `copy`
> > > > > argument to `__array__` would be a perfectly reasonable
> > > > > extension.
> > > > >
> > > > > Eric
> > > > >
> > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias <
> > > > > jni@fastmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > One bit of expressivity we would miss is “copy if
> > > > > > necessary,
> > > > > > but
> > > > > > > otherwise don’t bother”, but there are workarounds to
> > > > > > > this.
> > > > > > >
> > > > > >
> > > > > > After a side discussion with Stéfan van der Walt, we came
> > > > > > up
> > > > > > with
> > > > > > `allow_copy=True`, which would express to the downstream
> > > > > > library that we
> > > > > > don’t mind waiting, but that zero-copy would also be ok.
> > > > > >
> > > > > > This sounds like the sort of thing that is use case driven.
> > > > > > If
> > > > > > enough
> > > > > > projects want to use it, then I have no objections to
> > > > > > adding
> > > > > > the keyword.
> > > > > > > OTOH, we need to be careful about adding too many
> > > > > > > interoperability tricks
> > > > > > > as they complicate the code and make it hard for folks to
> > > > > > > determine the
> > > > > > > best solution. Interoperability is a hot topic and we need
> > > > > > > to
> > > > > > > be careful
> > > > > > > not to leave behind too many experiments in the NumPy
> > > > > > > code.  Do you
> > > > > > > have any other ideas of how to achieve the same effect?
> > > > > >
> > > > > >
> > > > > > Personally, I don’t have any other ideas, but would be
> > > > > > happy to
> > > > > > hear
> > > > > > some!
> > > > > >
> > > > > > My view regarding API/experiment creep is that `__array__`
> > > > > > is
> > > > > > the oldest
> > > > > > and most basic of all the interop tricks and that this can
> > > > > > be
> > > > > > safely
> > > > > > maintained for future generations. Currently it only takes
> > > > > > `dtype=` as a
> > > > > > keyword argument, so it is a very lean API. I think this
> > > > > > particular use
> > > > > > case is very natural and I’ve encountered the reluctance to
> > > > > > implicitly copy
> > > > > > twice, so I expect it is reasonably common.
> > > > > >
> > > > > > Regarding difficulty in determining the best solution, I
> > > > > > would
> > > > > > be happy
> > > > > > to contribute to the dispatch basics guide together with
> > > > > > the
> > > > > > new kwarg. I
> > > > > > agree that the protocols are getting quite numerous and I
> > > > > > couldn’t find a
> > > > > > single place that gathers all the best practices together.
> > > > > > But,
> > > > > > to
> > > > > > reiterate my point: `__array__` is the simplest of these
> > > > > > and I
> > > > > > think this
> > > > > > keyword is pretty safe to add.
> > > > > >
> > > > > > For ease of discussion, here are the API options discussed
> > > > > > so
> > > > > > far, as
> > > > > > well as a few extra that I don’t like but might trigger
> > > > > > other
> > > > > > ideas:
> > > > > >
> > > > > > > np.asarray(my_duck_array, allow_copy=True)  # default is False, or None -> leave it to the duck array to decide
> > > > > > > np.asarray(my_duck_array, copy=True)  # always copies, but, if supported by the duck array, defers to it for the copy
> > > > > > > np.asarray(my_duck_array, copy='allow')  # could take values 'allow', 'force', 'no', True(='force'), False(='no')
> > > > > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True)  # separate concepts, but unclear what force_copy=True, allow_copy=False means!
> > > > > > > np.asarray(my_duck_array, force=True)
> > > > > >
> > > > > > Juan.
> > > > > > _______________________________________________
> > > > > > NumPy-Discussion mailing list
> > > > > > NumPy-Discussion@python.org
> > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > > >
> > > > > _______________________________________________
> > > > > NumPy-Discussion mailing list
> > > > > NumPy-Discussion@python.org
> > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > >
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion@python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > >
> > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion


_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion