[Numpy-discussion] Add guaranteed no-copy to array creation and reshape?

Matthias Geier matthias.geier at gmail.com
Mon Jan 7 14:04:58 EST 2019


On Wed, Jan 2, 2019 at 2:24 PM Sebastian Berg wrote:
>
> On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
> > Hi Sebastian.
> >
> > Thanks for the clarification.
> >
> <snip>
> > > print(arr.shape)  # also (5, 2)
> > >
> > > so the arr container (shape, dtype) is changed/mutated. I think we
> > > expect that for content here, but not for the shape.
> >
> > Thanks for the clarification, I think I now understand your example.
> >
> > However, the behavior you are describing is just like the normal
> > reference semantics of Python itself.
> >
> > If you have multiple identifiers bound to the same (mutable) object,
> > you'll always have this "problem".
> >
> > I think every Python user should be aware of this behavior, but I
> > don't think it is reason to discourage assigning to arr.shape.
>
> Well, I doubt I will convince you.

I think we actually have very little disagreement.

I agree with you on what should be done *most of the time*, but I
wouldn't totally discourage mutating NumPy array shapes, because I
think in the right circumstances it can be very useful.

> But I want to point out that a numpy
> array is:
>
>   * underlying data
>   * shape/strides (pointing to the exact data)
>   * data type (interpret the data)
>
> Arrays are mutable, but this is only half true from my perspective.
> Everyone using numpy should be aware of "views", i.e. that the content
> of the underlying data can change.

I agree, everyone should be aware of that.
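
To make the "views" point concrete, here is a minimal sketch (the variable
names are only illustrative). It shows that two array objects can share the
same underlying data while each keeps its own shape:

```python
import numpy as np

base = np.arange(6)
view = base.reshape(2, 3)      # contiguous input, so reshape returns a view

view[0, 0] = 99                # write through the view
print(base[0])                 # the change is visible through the base array
print(np.shares_memory(base, view))
print(base.shape, view.shape)  # each array object keeps its own shape
```

Only the data is shared here; the shape of `base` is unaffected by
anything done to `view`.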

> However, if I have a read-only array, and pass it around, I would not
> expect it to change. That is because while the underlying data is
> mutated, how this data is accessed and interpreted is not.
>
> In other words, I see array objects as having two sides to them [0]:
>
>   * Underlying data   -> normally mutable and often mutated
>   * container:        -> not mutated by almost all code
>       * shape/strides
>       * data type

Exactly: "almost all code".

Most of the time I would not assign to arr.shape, but on some rare
occasions I find it very useful.

And one of those rare occasions is when you want guaranteed no-copy behavior.
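
A small sketch of what I mean, assuming a transposed (non-contiguous) array
as the test case. reshape() silently falls back to a copy, while assigning
to the shape raises an error instead:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
t = a.T                        # transposed view, not C-contiguous

flat = t.reshape(6)            # needs a copy, and reshape makes one silently
copied = not np.shares_memory(flat, t)

try:
    t.shape = (6,)             # cannot be done without copying the data
    raised = False
except AttributeError:
    raised = True              # the assignment is refused, t is unchanged

c = np.arange(6)
c.shape = (2, 3)               # contiguous data: succeeds in place, no copy
```

In the failing case `t` keeps its original shape, so the error can be
handled without the array having been modified.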

There are also some (most likely significantly rarer) cases where I
would modify arr.strides.

> I realize that in some cases mutating the container metadata happens. But
> I do believe it should be as minimal as possible. And frankly, probably
> one could do away with it completely.

I guess that's the only point where we disagree.

I wouldn't completely discourage it and I would definitely not remove
the functionality.

> Another example for where it is bad would be a threaded environment. If
> a python function temporarily changes the shape of an array to read
> from it without creating a view first, this will break multi-threaded
> access to that array.

Sure, let's not use it while multi-threading then.

I still think that's not at all a reason to remove the feature.

There are some things that are problematic when multi-threading, but
that's typically not reason enough to completely disallow them.
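
For code that does have to share an array between threads, the "create a
view first" approach you mention can be combined with the shape assignment.
The following is only a sketch, and `reshaped_view` is a hypothetical
helper name, not something NumPy provides:

```python
import numpy as np

def reshaped_view(arr, shape):
    """Hypothetical helper: reshape without copying and without touching arr."""
    v = arr.view()     # new array object sharing arr's data
    v.shape = shape    # raises AttributeError if a copy would be needed
    return v

shared = np.arange(6)
local = reshaped_view(shared, (2, 3))
```

The shape of `shared` is never modified, so other code holding a reference
to it is not affected, and the no-copy guarantee is kept.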

cheers,
Matthias

>
> - Sebastian
>
>
> [0] I tried to find other examples for such a split. Maybe a
> categorical/state object which is allowed to change value/state. But the
> list of possible states cannot change.
>
>
> > Coming back to the original suggestion of this thread:
> > Since assigning to arr.shape makes sure no copy of the array data is
> > made, I don't think it's necessary to add a new no-copy argument to
> > reshape().
> >
> > But the bug you mentioned ("on error the `arr.shape = ...` code
> > currently creates the copy temporarily") should probably be fixed at
> > some point ...
> >
> > cheers,
> > Matthias
> >
> > > - Sebastian
> > >
> > >
> > > > > There may be some corner cases, but a lot of the
> > > > > "then why is it allowed" questions are answered with: for
> > > > > historical reasons.
> > > >
> > > > OK, that's a good point.
> > > >
> > > > > By the way, on error the `arr.shape = ...` code currently
> > > > > creates the copy temporarily.
> > > >
> > > > That's interesting and it should probably be fixed.
> > > >
> > > > But it is not reason enough for me not to use it.
> > > > I find it important that it doesn't make a copy in the success
> > > > case, I don't care very much for the error case.
> > > >
> > > > Would you mind elaborating on the real reasons why I shouldn't
> > > > use it?
> > > >
> > > > cheers,
> > > > Matthias
> > > >
> > > > > - Sebastian
> > > > >
> > > > >
> > > > > > cheers,
> > > > > > Matthias

