[Numpy-discussion] Add guaranteed no-copy to array creation and reshape?
Matthias Geier
matthias.geier at gmail.com
Mon Jan 7 14:04:58 EST 2019
On Wed, Jan 2, 2019 at 2:24 PM Sebastian Berg wrote:
>
> On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
> > Hi Sebastian.
> >
> > Thanks for the clarification.
> >
> <snip>
> > > print(arr.shape) # also (5, 2)
> > >
> > > so the arr container (shape, dtype) is changed/mutated. I think we
> > > expect that for the content here, but not for the shape.
> >
> > Thanks for the clarification; I think I now understand your example.
> >
> > However, the behavior you are describing is just like the normal
> > reference semantics of Python itself.
> >
> > If you have multiple identifiers bound to the same (mutable) object,
> > you'll always have this "problem".
> >
> > I think every Python user should be aware of this behavior, but I
> > don't think it is a reason to discourage assigning to arr.shape.
>
> Well, I doubt I will convince you.
I think we actually have quite little disagreement.
I agree with you on what should be done *most of the time*, but I
wouldn't totally discourage mutating NumPy array shapes, because I
think in the right circumstances it can be very useful.
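To make the aliasing point concrete, here is a minimal sketch using only standard NumPy calls (the variable names are illustrative). Assigning to `.shape` mutates the single array object that every name refers to, whereas `reshape()` returns a new view and leaves the original container alone:

```python
import numpy as np

arr = np.arange(10).reshape(2, 5)
alias = arr              # a second name bound to the same ndarray object
alias.shape = (5, 2)     # in-place: mutates the shared container metadata
print(arr.shape)         # (5, 2): visible through every name

other = np.arange(10).reshape(2, 5)
keep = other
other = other.reshape(5, 2)   # rebinding: `other` now names a new view
print(keep.shape)             # (2, 5): the original object is untouched
```

Both objects in the second half still share one data buffer; only the shape metadata differs.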
> But I want to point out that a NumPy
> array is:
>
> * underlying data
> * shape/strides (pointing to the exact data)
> * data type (interpret the data)
>
> Arrays are mutable, but this is only half true from my perspective.
> Everyone using numpy should be aware of "views", i.e. that the content
> of the underlying data can change.
I agree, everyone should be aware of that.
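As a rough sketch of what this means in practice (standard NumPy calls only; names are illustrative): a view shares the underlying data buffer, so a write through one object shows up in the other, even when the other is a read-only view, while each object keeps its own shape.

```python
import numpy as np

base = np.zeros(6)
view = base.reshape(2, 3)   # a view: new shape/strides, same data buffer
view[0, 0] = 42.0           # writing through the view mutates the shared data...
print(base[0])              # 42.0
print(base.shape)           # (6,) ...but base's own shape is unchanged
print(view.base is base)    # True

ro = base.view()
ro.flags.writeable = False  # read-only view; the buffer is still writable via base
base[1] = 7.0
print(ro[1])                # 7.0: the content of a read-only view can still change
```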
> However, if I have a read-only array and pass it around, I would not
> expect it to change. That is because, while the underlying data may be
> mutated, how this data is accessed and interpreted is not.
>
> In other words, I see array objects as having two sides to them [0]:
>
> * Underlying data -> normally mutable and often mutated
> * Container -> not mutated by almost all code
>   * shape/strides
>   * data type
Exactly: "almost all code".
Most of the time I would not assign to arr.shape, but in some rare
occasions I find it very useful.
And one of those rare occasions is when you want guaranteed no-copy behavior.
There are also some (most likely significantly rarer) cases where I
would modify arr.strides.
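The no-copy guarantee of shape assignment can be sketched as follows (standard NumPy only; the exact error message may differ between NumPy versions, so it is not reproduced here). Assigning to `.shape` either succeeds without copying or raises `AttributeError`, whereas `reshape()` silently falls back to a copy:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # C-contiguous
a.shape = (4, 3)                  # succeeds: only metadata changes, never a copy

t = np.arange(12).reshape(3, 4).T   # transposed, non-contiguous view
refused = False
try:
    # Flattening t in C order needs a copy, so the assignment is refused.
    # (Per this thread, the failed attempt may build a temporary copy internally.)
    t.shape = (12,)
except AttributeError as exc:
    refused = True
    print("refused:", exc)

flat = t.reshape(12)              # reshape() falls back to a copy instead
print(np.shares_memory(flat, t))  # False
```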
> I realize that in some cases mutating the container metadata happens. But
> I do believe it should be kept to a minimum. And frankly, one could
> probably do away with it completely.
I guess that's the only point where we disagree.
I wouldn't completely discourage it and I would definitely not remove
the functionality.
> Another example of where it is bad would be a threaded environment. If
> a Python function temporarily changes the shape of an array to read
> from it without creating a view first, this will break multi-threaded
> access to that array.
Sure, let's not use it while multi-threading then.
I still think that's not at all a reason to remove the feature.
There are some things that are problematic when multi-threading, but
that's typically not reason enough to completely disallow them.
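One possible way to keep the no-copy guarantee without mutating an array that other code (or other threads) may hold is to assign the shape on a fresh view. `reshape_nocopy` is a hypothetical helper written for illustration, not a NumPy function:

```python
import numpy as np

def reshape_nocopy(arr: np.ndarray, shape) -> np.ndarray:
    """Return a view of `arr` with `shape`, raising instead of copying.

    Hypothetical helper: the shape is assigned on a fresh view, so the
    caller's array (possibly shared with other threads) is never mutated.
    """
    view = arr.view()   # new container object, same data buffer
    view.shape = shape  # raises AttributeError if a copy would be required
    return view

shared = np.arange(8)
v = reshape_nocopy(shared, (2, 4))
print(v.shape, shared.shape)        # (2, 4) (8,)
print(np.shares_memory(v, shared))  # True
```

The caller's array is never touched, so the temporary-shape-change problem does not arise; the cost is one extra view object, which only carries metadata.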
cheers,
Matthias
>
> - Sebastian
>
>
> [0] I tried to find other examples of such a split. Maybe a
> categorical/state object, which is allowed to change its value/state,
> but whose list of possible states cannot change.
>
>
> > Coming back to the original suggestion of this thread:
> > Since assigning to arr.shape makes sure no copy of the array data is
> > made, I don't think it's necessary to add a new no-copy argument to
> > reshape().
> >
> > But the bug you mentioned ("on error the `arr.shape = ...` code
> > currently creates the copy temporarily") should probably be fixed at
> > some point ...
> >
> > cheers,
> > Matthias
> >
> > > - Sebastian
> > >
> > >
> > > > > There may be some corner cases, but a lot of the
> > > > > "then why is it allowed?" questions are answered with: for
> > > > > historical reasons.
> > > >
> > > > OK, that's a good point.
> > > >
> > > > > By the way, on error the `arr.shape = ...` code currently
> > > > > creates the copy temporarily.
> > > >
> > > > That's interesting and it should probably be fixed.
> > > >
> > > > But it is not reason enough for me not to use it.
> > > > I find it important that it doesn't make a copy in the success
> > > > case; I don't care very much about the error case.
> > > >
> > > > Would you mind elaborating on the real reasons why I shouldn't
> > > > use it?
> > > >
> > > > cheers,
> > > > Matthias
> > > >
> > > > > - Sebastian
> > > > >
> > > > >
> > > > > > cheers,
> > > > > > Matthias