[Numpy-discussion] Add guaranteed no-copy to array creation and reshape?

Sebastian Berg sebastian at sipsolutions.net
Wed Jan 2 08:22:15 EST 2019


On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
> Hi Sebastian.
> 
> Thanks for the clarification.
> 
<snip>
> > print(arr.shape)  # also (5, 2)
> > 
> > so the arr container (shape, dtype) is changed/mutated. I think we
> > expect that for content here, but not for the shape.
> 
> Thanks for the clarification, I think I now understand your example.
> 
> However, the behavior you are describing is just like the normal
> reference semantics of Python itself.
> 
> If you have multiple identifiers bound to the same (mutable) object,
> you'll always have this "problem".
> 
> I think every Python user should be aware of this behavior, but I
> don't think it is a reason to discourage assigning to arr.shape.

Well, I doubt I will convince you. But I want to point out that a NumPy
array consists of:

  * underlying data
  * shape/strides (pointing to the exact data)
  * data type (interpret the data)
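
A minimal sketch of these three parts, assuming a contiguous int64 array
(all variable names below are illustrative, not from the thread). It shows
two array objects sharing one data buffer while each carries its own
shape/strides, and a dtype view reinterpreting the same bytes:

```python
import numpy as np

# Underlying data: ten 8-byte integers in one buffer.
base = np.arange(10, dtype=np.int64)

# A reshaped view: a new array object over the same buffer.
view = base.reshape(5, 2)

# The data is shared ...
shares_data = np.shares_memory(base, view)

# ... but shape and strides belong to each array object separately.
shapes = (base.shape, view.shape)
strides_differ = base.strides != view.strides

# A write through one object is visible through the other.
view[0, 0] = 99
first_element = base[0]

# The data type only decides how the bytes are interpreted.
as_bytes = base.view(np.uint8)
```

Here `base` keeps its shape `(10,)` even though `view` has shape `(5, 2)`;
only the content is shared.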

Arrays are mutable, but this is only half true from my perspective.
Everyone using numpy should be aware of "views", i.e. that the content
of the underlying data can change.

However, if I have a read-only array, and pass it around, I would not
expect it to change. That is because while the underlying data may be
mutated, how this data is accessed and interpreted is not.

In other words, I see array objects as having two sides to them [0]:

  * Underlying data   -> normally mutable and often mutated
  * container:        -> not mutated by almost any code
      * shape/strides
      * data type
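
To illustrate the split for the read-only case, here is a small sketch. The
function `consumer` is hypothetical and stands in for any code that assigns
to `.shape` on an array it was handed. It assumes that the writeable flag
guards only the data and not the shape, which is how NumPy behaves as far
as I know:

```python
import numpy as np

def consumer(a):
    # Hypothetical callee that reshapes its argument in place.
    a.shape = (5, 2)

arr = np.arange(10)
arr.flags.writeable = False   # protect the underlying data

consumer(arr)
shape_after = arr.shape       # the caller's container was changed

# The data side is still protected: writing raises ValueError.
try:
    arr[0, 0] = 1
    data_write_failed = False
except ValueError:
    data_write_failed = True
```

The caller marked the array read-only, yet its shape is now `(5, 2)`
because the container side is not covered by that flag.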

I realize that in some cases the container metadata does get mutated.
But I do believe this should be kept as minimal as possible. And
frankly, one could probably do away with it completely.

Another example of where it is bad would be a threaded environment. If
a Python function temporarily changes the shape of an array to read
from it without creating a view first, this will break multi-threaded
access to that array.
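
A sketch of the "create a view first" variant, assuming the goal is to keep
the no-copy guarantee of shape assignment without touching the shared
array object (`pair_sums` and the other names are made up for this
example):

```python
import numpy as np

def pair_sums(a):
    v = a.view()        # new array object over the same data
    v.shape = (5, 2)    # only the private view's shape is changed
    return v.sum(axis=1)

arr = np.arange(10)
sums = pair_sums(arr)
shape_unchanged = arr.shape   # still (10,), safe for other readers

# If the new shape would need a copy, the assignment raises instead.
transposed = np.arange(12).reshape(3, 4).T   # not C-contiguous
probe = transposed.view()
try:
    probe.shape = (12,)
    refused = False
except AttributeError:
    refused = True
```

Other code holding a reference to `arr` never sees an intermediate shape,
because only the temporary view is modified.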

- Sebastian


[0] I tried to find other examples of such a split. Maybe a
categorical/state object which is allowed to change its value/state,
while the list of possible states cannot change.


> Coming back to the original suggestion of this thread:
> Since assigning to arr.shape makes sure no copy of the array data is
> made, I don't think it's necessary to add a new no-copy argument to
> reshape().
> 
> But the bug you mentioned ("on error the `arr.shape = ...` code
> currently creates the copy temporarily") should probably be fixed at
> some point ...
> 
> cheers,
> Matthias
> 
> > - Sebastian
> > 
> > 
> > > > There may be some corner cases, but a lot of the
> > > > "then why is it allowed" questions are answered with: for
> > > > historical reasons.
> > > 
> > > OK, that's a good point.
> > > 
> > > > By the way, on error the `arr.shape = ...` code currently
> > > > creates the copy temporarily.
> > > 
> > > That's interesting and it should probably be fixed.
> > > 
> > > But it is not reason enough for me not to use it.
> > > I find it important that it doesn't make a copy in the success
> > > case; I don't care very much about the error case.
> > > 
> > > Would you mind elaborating on the real reasons why I shouldn't
> > > use it?
> > > 
> > > cheers,
> > > Matthias
> > > 
> > > > - Sebastian
> > > > 
> > > > 
> > > > > cheers,
> > > > > Matthias
> > > > > _______________________________________________
> > > > > NumPy-Discussion mailing list
> > > > > NumPy-Discussion at python.org
> > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > > 