[Numpy-discussion] Add guaranteed no-copy to array creation and reshape?

Matthias Geier matthias.geier at gmail.com
Wed Jan 2 05:27:48 EST 2019


Hi Sebastian.

Thanks for the clarification.

On Sun, Dec 30, 2018 at 5:25 PM Sebastian Berg wrote:
> On Sun, 2018-12-30 at 16:03 +0100, Matthias Geier wrote:
> > On Sat, Dec 29, 2018 at 6:00 PM Sebastian Berg wrote:
> > > On Sat, 2018-12-29 at 17:16 +0100, Matthias Geier wrote:
> > > > Hi Sebastian.
> > > >
> > > > I don't have an opinion (yet) about this matter, but I have a
> > > > question:
> > > >
> > > > On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote:
> > > >
> > > > [...]
> > > >
> > > > > new_arr = arr.reshape(new_shape)
> > > > > assert np.may_share_memory(arr, new_arr)
> > > > >
> > > > > # Which is sometimes -- but should not be -- written as:
> > > > > arr.shape = new_shape  # unnecessary container modification
> > > >
> > > > [...]
> > > >
> > > > Why is this discouraged?
> > > >
> > > > Why do you call this "unnecessary container modification"?
> > > >
> > > > I've used this idiom in the past for exactly those cases where I
> > > > wanted to make sure no copy is made.
> > > >
> > > > And if we are not supposed to assign to arr.shape, why is it allowed
> > > > in the first place?
> > >
> > > Well, this may be a matter of taste, but say you have an object that
> > > stores an array:
> > >
> > > class MyObject:
> > >     def __init__(self):
> > >         self.myarr = some_array
> > >
> > >
> > > Now, let's say I do:
> > >
> > > def some_func(arr):
> > >     # Do something with the array:
> > >     arr.shape = -1
> > >
> > > myobject = MyObject()
> > > some_func(myobject)
> > >
> > > then myobject will suddenly have the wrong shape stored. In most cases
> > > this is harmless, but I truly believe this is exactly why we have views
> > > and why they are so awesome.
> > > The content of arrays is mutable, but the array object itself should
> > > not be mutated normally.
> >
> > Thanks for the example! I don't understand its point, though.
> > Also, it's not working since MyObject doesn't have a .shape
> > attribute.
> >
>
> The example should have called `some_func(myobject.myarr)`. The thing is
> that if you have more references to the same array around, you change
> all their shapes. And if those other references are there for a reason,
> that is not what you want.
>
> That does not matter much in most cases, but it could change the shape
> of an array in a completely different place than intended. Creating a
> new view is cheap, so I think such things should be avoided.
>
> I admit, most code will effectively do:
> arr = input_arr[...]  # create a new view
> arr.shape = ...
>
> so that there is no danger. But conceptually, I do not think there
> should be a danger of magically changing the shape of a stored array in
> a different part of the code.
>
> Does that make some sense? Maybe shorter example:
>
> arr = np.arange(10)
> arr2 = arr
> arr2.shape = (5, 2)
>
> print(arr.shape)  # also (5, 2)
>
> so the arr container (shape, dtype) is changed/mutated. I think we expect
> that for content here, but not for the shape.

Thanks for the clarification, I think I now understand your example.

However, the behavior you are describing is just like the normal
reference semantics of Python itself.

If you have multiple identifiers bound to the same (mutable) object,
you'll always have this "problem".

I think every Python user should be aware of this behavior, but I
don't think it is a reason to discourage assigning to arr.shape.
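
To make the comparison concrete, here is a minimal sketch. The names
alias and view are made up for illustration. It shows the same effect
with a plain list, then with an array bound to two names, and finally
that a view created with reshape() keeps its own shape:

```python
import numpy as np

# Plain Python: two names bound to one mutable object.
a = [1, 2]
b = a
b.append(3)                  # visible through both names

# The same reference semantics with an array.
arr = np.arange(10)
alias = arr                  # same array object under a second name
view = arr.reshape(5, 2)     # separate array object on the same data

alias.shape = (2, 5)         # changes the one object behind arr and alias
view[0, 0] = 99              # content is shared, so arr sees this write

print(arr.shape, view.shape)
```

Only the array object that is assigned to changes its shape; the view
keeps its own shape while still sharing the data.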

Coming back to the original suggestion of this thread:
Since assigning to arr.shape makes sure no copy of the array data is
made, I don't think it's necessary to add a new no-copy argument to
reshape().
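
A minimal sketch of that idiom, combined with the "create a new view
first" pattern quoted above so that the caller's array object is left
alone. reshape_no_copy is a hypothetical helper name, not a NumPy
function, and I am assuming the failure shows up as an AttributeError,
which is what NumPy raises for an incompatible in-place shape change as
far as I know:

```python
import numpy as np

def reshape_no_copy(arr, new_shape):
    """Hypothetical helper: reshape without ever copying the data."""
    view = arr.view()        # new array object on the same data
    view.shape = new_shape   # fails instead of copying if impossible
    return view

a = np.arange(6).reshape(2, 3)

# Contiguous case: works, shares the data, and leaves a.shape untouched.
ok = reshape_no_copy(a, (3, 2))

# Non-contiguous case: reshape() silently copies ...
t = a.T
copied = t.reshape(6)

# ... while assigning to .shape refuses (assumed: AttributeError).
try:
    reshape_no_copy(t, (6,))
    raised = False
except AttributeError:
    raised = True

print(np.shares_memory(a, ok), np.shares_memory(t, copied), raised)
```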

But the bug you mentioned ("on error the `arr.shape = ...` code
currently creates the copy temporarily") should probably be fixed at
some point ...

cheers,
Matthias

>
> - Sebastian
>
>
> > > There may be some corner cases, but a lot of the
> > > "then why is it allowed" questions are answered with: for historical
> > > reasons.
> >
> > OK, that's a good point.
> >
> > > By the way, on error the `arr.shape = ...` code currently creates the
> > > copy temporarily.
> >
> > That's interesting and it should probably be fixed.
> >
> > But it is not reason enough for me not to use it.
> > I find it important that it doesn't make a copy in the success case,
> > I don't care very much about the error case.
> >
> > Would you mind elaborating on the real reasons why I shouldn't use
> > it?
> >
> > cheers,
> > Matthias
> >
> > > - Sebastian
> > >
> > >
> > > > cheers,
> > > > Matthias
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > >

