[Numpy-discussion] Add guaranteed no-copy to array creation and reshape?
Eric Wieser
wieser.eric+numpy at gmail.com
Mon Jan 7 14:30:19 EST 2019
@Matthias:
> Most of the time I would not assign to arr.shape, but in some rare
> occasions I find it very useful.
> And one of those rare occasions is when you want guaranteed no-copy
> behavior.
Can you come up with any other example?
The only real argument you seem to have here is “my code uses arr.shape =
...” and I don’t want it to break. That’s a fair argument, but all it
really means is we should start emitting DeprecationWarning("Use arr =
arr.reshape(..., copy=np.never_copy) instead of arr.shape = ..."), and
consider having a long deprecation.
If necessary we could compromise on just putting a warning in the docs, and
not notifying the user at all.
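
For reference, a minimal sketch of the behavior under discussion, using only existing NumPy calls: reshape() silently falls back to a copy when no view is possible, while assigning to .shape raises instead. The variable names are illustrative, and the shape assignment is done on a throwaway view so the original array object is left untouched.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
t = a.T  # non-contiguous view of a, shape (3, 2)

# reshape() silently falls back to a copy when the data cannot be viewed
# with the requested shape.
flat = t.reshape(6)
copied = not np.shares_memory(t, flat)

# Assigning to .shape never copies: it raises when no view is possible.
v = t.view()
try:
    v.shape = (6,)
    raised = False
except AttributeError:
    raised = True

print(copied, raised)
```

The proposed reshape(..., copy=np.never_copy) spelling would give the second behavior without mutating any existing array object; np.never_copy is only a proposal in this thread.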
@Ralf
np.newaxis is not relevant here - it’s a simple alias for None, is just
there for code readability, and is much more widely applicable than
np.never_copy would be.
Is there any particular reason we chose to use None? If I were designing it
again, I’d consider a singleton object with a better __repr__.
@Nathaniel
> I guess another possibility to throw out there would be a second kwarg,
> require_view=False/True.
The downside of this approach is that array-likes will definitely need
updating to support this new behavior, whereas many may work out of the box
if we extend the copy argument (like, say, maskedarray). This also ties
into the __bool__ override - that will ensure that subclasses which don’t
have a trivial reshape crash.
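
A rough sketch of the __bool__ idea, assuming a sentinel object standing in for the proposed np.never_copy. Both _NeverCopyType and LegacyDuckArray are hypothetical names invented for this illustration; the toy reshape ignores its shape argument and only exercises the copy handling.

```python
class _NeverCopyType:
    """Hypothetical stand-in for the proposed np.never_copy sentinel."""

    def __repr__(self):
        return "never_copy"

    def __bool__(self):
        # Refuse truth-testing so that `if copy:` cannot silently misread it.
        raise ValueError("never_copy has no truth value; handle it explicitly")


never_copy = _NeverCopyType()


class LegacyDuckArray:
    """Hypothetical array-like written before never_copy existed."""

    def __init__(self, data):
        self.data = list(data)

    def reshape(self, shape, copy=True):
        # Toy implementation: `shape` is ignored. The truth test below is
        # where an unaware implementation trips over the sentinel.
        if copy:
            return LegacyDuckArray(self.data)
        return self


try:
    LegacyDuckArray([1, 2, 3]).reshape((3,), copy=never_copy)
    crashed = False
except ValueError:
    crashed = True

print(crashed)
```

An implementation that has not been updated then fails loudly instead of quietly returning a copy.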
@Sebastian:
> Unless we replace the string when dispatching, which seems strange at first
> sight.
I’m envisaging cases where we don’t have a dispatcher at all:
- Duck arrays implementing methods matching ndarray
- Something like my_custom_function(arr, copy=...) that forwards its
  copy argument to reshape
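
A sketch of the second case, with every name hypothetical (DuckArray, the "never" string spelling, and the body of my_custom_function are assumptions for illustration): the copy argument reaches the duck array's reshape exactly as the caller wrote it, because no dispatcher sits in between to translate it.

```python
class DuckArray:
    """Hypothetical duck array mirroring the ndarray.reshape signature."""

    def __init__(self, data):
        self.data = list(data)
        self.last_copy = None

    def reshape(self, shape, copy=True):
        # Record what actually arrived; a real implementation would act on it.
        self.last_copy = copy
        return self


def my_custom_function(arr, copy=True):
    # Forwards `copy` untouched: nothing here can rewrite a string value.
    return arr.reshape((-1,), copy=copy)


duck = DuckArray([1, 2, 3])
my_custom_function(duck, copy="never")
received = duck.last_copy

# A non-empty string is truthy, so an implementation that only does
# `if copy:` would treat "never" as a request to copy.
silently_truthy = bool("never")
print(received, silently_truthy)
```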
Eric
On Mon, 7 Jan 2019 at 11:05 Matthias Geier <matthias.geier at gmail.com> wrote:
> On Wed, Jan 2, 2019 at 2:24 PM Sebastian Berg wrote:
> >
> > On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
> > > Hi Sebastian.
> > >
> > > Thanks for the clarification.
> > >
> > <snip>
> > > > print(arr.shape) # also (5, 2)
> > > >
> > > > so the arr container (shape, dtype) is changed/mutated. I think we
> > > > expect that for content here, but not for the shape.
> > >
> > > Thanks for the clarification, I think I now understand your example.
> > >
> > > However, the behavior you are describing is just like the normal
> > > reference semantics of Python itself.
> > >
> > > If you have multiple identifiers bound to the same (mutable) object,
> > > you'll always have this "problem".
> > >
> > > I think every Python user should be aware of this behavior, but I
> > > don't think it is a reason to discourage assigning to arr.shape.
> >
> > Well, I doubt I will convince you.
>
> I think we actually have quite little disagreement.
>
> I agree with you on what should be done *most of the time*, but I
> wouldn't totally discourage mutating NumPy array shapes, because I
> think in the right circumstances it can be very useful.
>
> > But I want to point out that a numpy
> > array is:
> >
> > * underlying data
> > * shape/strides (pointing to the exact data)
> > * data type (interpret the data)
> >
> > Arrays are mutable, but this is only half true from my perspective.
> > Everyone using numpy should be aware of "views", i.e. that the content
> > of the underlying data can change.
>
> I agree, everyone should be aware of that.
>
> > However, if I have a read-only array, and pass it around, I would not
> > expect it to change. That is because while the underlying data is
> > mutated, how this data is accessed and interpreted is not.
> >
> > In other words, I see array objects as having two sides to them [0]:
> >
> > * Underlying data -> normally mutable and often mutated
> > * container: -> not mutated by almost all code
> >   * shape/strides
> >   * data type
>
> Exactly: "almost all code".
>
> Most of the time I would not assign to arr.shape, but in some rare
> occasions I find it very useful.
>
> And one of those rare occasions is when you want guaranteed no-copy
> behavior.
>
> There are also some (most likely significantly rarer) cases where I
> would modify arr.strides.
>
> > I realize that in some cases mutating the container metadata happens. But
> > I do believe it should be as minimal as possible. And frankly, probably
> > one could do away with it completely.
>
> I guess that's the only point where we disagree.
>
> I wouldn't completely discourage it and I would definitely not remove
> the functionality.
>
> > Another example for where it is bad would be a threaded environment. If
> > a python function temporarily changes the shape of an array to read
> > from it without creating a view first, this will break multi-threaded
> > access to that array.
>
> Sure, let's not use it while multi-threading then.
>
> I still think that's not at all a reason to remove the feature.
>
> There are some things that are problematic when multi-threading, but
> that's typically not reason enough to completely disallow them.
>
> cheers,
> Matthias
>
> >
> > - Sebastian
> >
> >
> > [0] I tried to find other examples for such a split. Maybe a
> > categorical/state object which is allowed to change value/state. But the
> > list of possible states cannot change.
> >
> >
> > > Coming back to the original suggestion of this thread:
> > > Since assigning to arr.shape makes sure no copy of the array data is
> > > made, I don't think it's necessary to add a new no-copy argument to
> > > reshape().
> > >
> > > But the bug you mentioned ("on error the `arr.shape = ...` code
> > > currently creates the copy temporarily") should probably be fixed at
> > > some point ...
> > >
> > > cheers,
> > > Matthias
> > >
> > > > - Sebastian
> > > >
> > > >
> > > > > > > There may be some corner cases, but a lot of the
> > > > > > > "then why is it allowed" questions are answered with: for
> > > > > > > historical reasons.
> > > > >
> > > > > OK, that's a good point.
> > > > >
> > > > > > > By the way, on error the `arr.shape = ...` code currently
> > > > > > > creates the copy temporarily.
> > > > >
> > > > > That's interesting and it should probably be fixed.
> > > > >
> > > > > But it is not reason enough for me not to use it.
> > > > > > I find it important that it doesn't make a copy in the success
> > > > > > case, I don't care very much for the error case.
> > > > >
> > > > > > Would you mind elaborating on the real reasons why I shouldn't
> > > > > > use it?
> > > > >
> > > > > cheers,
> > > > > Matthias
> > > > >
> > > > > > - Sebastian
> > > > > >
> > > > > >
> > > > > > > cheers,
> > > > > > > Matthias