[Numpy-discussion] Add guaranteed no-copy to array creation and reshape?

Sebastian Berg sebastian at sipsolutions.net
Mon Jan 7 14:52:10 EST 2019


On Mon, 2019-01-07 at 20:04 +0100, Matthias Geier wrote:
> On Wed, Jan 2, 2019 at 2:24 PM Sebastian Berg wrote:
> > On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
> > > Hi Sebastian.
> > > 
> > > Thanks for the clarification.
> > > 
> > <snip>
> > > > print(arr.shape)  # also (5, 2)
> > > > 
> > > > so the arr container (shape, dtype) is changed/mutated. I think we
> > > > expect that for content here, but not for the shape.
> > > 
> > > Thanks for the clarification, I think I now understand your
> > > example.
> > > 
> > > However, the behavior you are describing is just like the normal
> > > reference semantics of Python itself.
> > > 
> > > If you have multiple identifiers bound to the same (mutable) object,
> > > you'll always have this "problem".
> > > 
> > > I think every Python user should be aware of this behavior, but I
> > > don't think it is a reason to discourage assigning to arr.shape.
> > 
> > Well, I doubt I will convince you.
> 
> I think we actually have quite little disagreement.
> 


I am very sorry, that was badly phrased. What I meant is just that
saying that arrays are simply mutable objects does not seem all that
wrong to me. However, I would very much prefer to move away from that
for the container part (shape, strides, data type).
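
To make the data/container distinction concrete, here is a small sketch.
The names `alias` and `view` are chosen only for illustration. Writing to
the data is visible through every array sharing the buffer, which is what
most code expects. Assigning to `.shape` changes the array object itself,
so every name bound to that object sees the new shape, while a separate
view keeps its own.

```python
import numpy as np

arr = np.arange(10)
alias = arr            # second name for the same array object
view = arr.view()      # separate array object sharing the same data

alias.shape = (5, 2)   # mutates the container: arr is reshaped as well
print(arr.shape)       # same object as alias, so the new shape shows up here
print(view.shape)      # the view has its own shape and is unaffected

view[0] = 99           # mutates the data: visible through arr and alias
print(arr[0, 0])
```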

> I agree with you on what should be done *most of the time*, but I
> wouldn't totally discourage mutating NumPy array shapes, because I
> think in the right circumstances it can be very useful.
> 
> > But I want to point out that a numpy array is:
> > 
> >   * underlying data
> >   * shape/strides (pointing to the exact data)
> >   * data type (interpret the data)
> > 
> > Arrays are mutable, but this is only half true from my perspective.
<snip>
> > In other words, I see array objects as having two sides to them [0]:
> > 
> >   * Underlying data   -> normally mutable and often mutated
> >   * container:        -> not mutated by almost all code
> >       * shape/strides
> >       * data type
> 
> Exactly: "almost all code".
> 
> Most of the time I would not assign to arr.shape, but in some rare
> occasions I find it very useful.
> 
> And one of those rare occasions is when you want guaranteed no-copy
> behavior.


`arr.shape = ...` is somewhat easier on the eye when compared to
`arr = arr.reshape(..., ensure_view=True)` (or whatever we would end up
doing).

Other than that, do you have a technical reason for it (aside from
ensuring that no copy will occur)?
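
As a rough sketch of how the no-copy guarantee can be had without touching
the original array object: take a view first and assign the shape on the
view. The helper name `reshape_view` is made up for illustration, and
`ensure_view=True` above is only a placeholder from this discussion, not an
existing argument of `reshape`.

```python
import numpy as np


def reshape_view(arr, shape):
    """Return a reshaped view of `arr`; raise instead of copying.

    Hypothetical helper, not part of NumPy.
    """
    view = arr.view()    # new array object sharing arr's data
    view.shape = shape   # changes only the view; errors if a copy is needed
    return view


a = np.arange(10)
b = reshape_view(a, (5, 2))
print(a.shape, b.shape, np.shares_memory(a, b))

t = np.arange(6).reshape(2, 3).T   # non-contiguous, shape (3, 2)
copied = t.reshape(6)              # plain reshape silently copies here
print(np.shares_memory(t, copied))

try:
    reshape_view(t, (6,))
    raised = False
except AttributeError:             # NumPy refuses the in-place shape change
    raised = True
print(raised)
```

The caller's array keeps its shape in both cases; only the success or
failure of the no-copy reshape is reported.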

The thing is that I have seen the suggestion to use `arr.shape` as a
"best practice", and I really disagree that it is good practice.

> There are also some (most likely significantly rarer) cases where I
> would modify arr.strides.
> 

The same again: there should be no reason why you should have to do
this. In fact, this will cause hard crashes for object arrays, so even
technically this is not a "harmless" convenience. It is a bug to allow
it at all for object arrays that own their data:

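import numpy as np

arr = np.arange(100000, dtype=object)
arr.strides = (0,)  # in-place: every index now refers to the first element
del arr             # deallocating the array then crashes the process:
# free(): invalid pointer
# zsh: abort (core dumped)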

Note that using the alternatives is completely fine here (as long as
you take care that all elements point to valid objects).
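
One such alternative, assuming `numpy.lib.stride_tricks.as_strided` is the
kind of thing meant here, builds a new array object with the requested
strides and leaves the original untouched. The sketch below uses an integer
array to keep it simple; for object arrays the caveat about valid objects
still applies.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

arr = np.arange(5)
original_strides = arr.strides

# New array object with zero strides; arr itself is not modified.
# writeable=False guards against writing through the aliased elements.
repeated = as_strided(arr, shape=arr.shape, strides=(0,), writeable=False)

print(arr.strides == original_strides)
print(repeated.tolist())
print(np.shares_memory(arr, repeated))
```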


> > I realize that in some cases mutating the container metadata happens.
> > But I do believe it should be as minimal as possible. And frankly,
> > one could probably do away with it completely.
> 
> I guess that's the only point where we disagree.
> 
> I wouldn't completely discourage it and I would definitely not remove
> the functionality.
> 
> > Another example for where it is bad would be a threaded environment.
> > If a python function temporarily changes the shape of an array to read
> > from it without creating a view first, this will break multi-threaded
> > access to that array.
> 
> Sure, let's not use it while multi-threading then.
>
> I still think that's not at all a reason to remove the feature.
> 

While I wouldn't mind moving to deprecate it fully, that is not my
intention right now. I would discourage it in the documentation with a
pointer to a safe alternative. Maybe at some point we realize that
nobody really has any good reason to keep using it, then we may move
ahead slowly.
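
To illustrate the threading concern quoted above without actual threads,
here is a sketch in which an `observe` callback stands in for a concurrent
reader. Both function names and the callback are invented for this example.
The mutating variant exposes the temporary shape to anyone else holding the
array; the view variant does not.

```python
import numpy as np


def row_sums_mutating(arr, ncols, observe):
    """Temporarily reshape `arr` in place (the discouraged pattern)."""
    old_shape = arr.shape
    arr.shape = (arr.size // ncols, ncols)
    try:
        observe(arr.shape)   # what another holder of arr would see right now
        return arr.sum(axis=1)
    finally:
        arr.shape = old_shape


def row_sums_view(arr, ncols, observe):
    """Reshape a private view instead; `arr` is never modified."""
    view = arr.view()
    view.shape = (arr.size // ncols, ncols)
    observe(arr.shape)       # arr still has its original shape
    return view.sum(axis=1)


arr = np.arange(6)
seen_mutating, seen_view = [], []
sums_a = row_sums_mutating(arr, 3, seen_mutating.append)
sums_b = row_sums_view(arr, 3, seen_view.append)
print(seen_mutating, seen_view)
```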

- Sebastian


> There are some things that are problematic when multi-threading, but
> that's typically not reason enough to completely disallow them.
> 
> cheers,
> Matthias
> 
> > - Sebastian
> > 
> > 
> > [0] I tried to find other examples for such a split. Maybe a
> > categorical/state object which is allowed to change value/state, but
> > the list of possible states cannot change.
> > 
> > 
> > > Coming back to the original suggestion of this thread:
> > > Since assigning to arr.shape makes sure no copy of the array data is
> > > made, I don't think it's necessary to add a new no-copy argument to
> > > reshape().
> > > 
> > > But the bug you mentioned ("on error the `arr.shape = ...` code
> > > currently creates the copy temporarily") should probably be fixed at
> > > some point ...
> > > 
> > > cheers,
> > > Matthias
> > > 
> > > > - Sebastian
> > > > 
> > > > 
> > > > > > > There may be some corner cases, but a lot of the
> > > > > > > "then why is it allowed" questions are answered with: for
> > > > > > > historical reasons.
> > > > > 
> > > > > OK, that's a good point.
> > > > > 
> > > > > > > By the way, on error the `arr.shape = ...` code currently
> > > > > > > creates the copy temporarily.
> > > > > 
> > > > > That's interesting and it should probably be fixed.
> > > > > 
> > > > > But it is not reason enough for me not to use it.
> > > > > > I find it important that it doesn't make a copy in the success
> > > > > > case, I don't care very much for the error case.
> > > > > 
> > > > > > Would you mind elaborating on the real reasons why I shouldn't
> > > > > > use it?
> > > > > 
> > > > > cheers,
> > > > > Matthias
> > > > > 
> > > > > > - Sebastian
> > > > > > 
> > > > > > 
> > > > > > > cheers,
> > > > > > > Matthias