
On Mon, 2019-01-07 at 20:04 +0100, Matthias Geier wrote:
On Wed, Jan 2, 2019 at 2:24 PM Sebastian Berg wrote:
On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
Hi Sebastian.
Thanks for the clarification.
<snip>
print(arr.shape) # also (5, 2)
so the arr container (shape, dtype) is changed/mutated. I think we expect that for content here, but not for the shape.
Thanks for the clarification, I think I now understand your example.
However, the behavior you are describing is just like the normal reference semantics of Python itself.
If you have multiple identifiers bound to the same (mutable) object, you'll always have this "problem".
I think every Python user should be aware of this behavior, but I don't think it is a reason to discourage assigning to arr.shape.
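The aliasing behavior discussed above can be illustrated with a small sketch. The variable names (`arr`, `alias`, `view`) are chosen here for illustration and are not taken from the snipped example; the contrast with `reshape()` is added to show the non-mutating alternative.

```python
import numpy as np

arr = np.arange(10)
alias = arr                 # a second name bound to the same array object

alias.shape = (5, 2)        # mutates the shared object's shape in place
print(arr.shape)            # (5, 2): the change is visible through both names

view = arr.reshape(2, 5)    # a new array object sharing the same data
print(arr.shape)            # still (5, 2): reshape() leaves arr untouched
print(np.shares_memory(arr, view))  # True
```

With `reshape()`, only code that explicitly holds `view` sees the new shape; every other holder of `arr` keeps the shape it expects.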
Well, I doubt I will convince you.
I think we actually have quite little disagreement.
I am very sorry, that was very badly phrased. What I meant is just that saying that arrays simply are mutable objects does not seem all that wrong to me. However, I would very much prefer to move towards changing it for the container part.
I agree with you on what should be done *most of the time*, but I wouldn't totally discourage mutating NumPy array shapes, because I think in the right circumstances it can be very useful.
But I want to point out that a numpy array is:
* underlying data
* shape/strides (pointing to the exact data)
* data type (interpret the data)
Arrays are mutable, but this is only half true from my perspective. <snip> In other words, I see array objects as having two sides to them [0]:
* Underlying data -> normally mutable and often mutated
* container: -> not mutated by almost all code
    * shape/strides
    * data type
Exactly: "almost all code".
Most of the time I would not assign to arr.shape, but in some rare occasions I find it very useful.
And one of those rare occasions is when you want guaranteed no-copy behavior.
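A minimal sketch of the no-copy guarantee mentioned here, assuming a transposed (non-contiguous) array as the test case: `reshape()` falls back to a copy without saying so, while assigning to `.shape` raises an error instead.

```python
import numpy as np

a = np.arange(6).reshape(2, 3).T   # non-contiguous view, shape (3, 2)

flat = a.reshape(6)                # cannot be expressed as a view, so NumPy copies
copied = not np.shares_memory(a, flat)
print(copied)                      # True

try:
    a.shape = (6,)                 # in-place reshape is impossible without a copy
    raised = False
except AttributeError:
    raised = True                  # NumPy refuses instead of copying silently
print(raised)                      # True
print(a.shape)                     # (3, 2): unchanged after the failed assignment
```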
`arr.shape = ...` is somewhat easier on the eye when compared to `arr = arr.reshape(..., ensure_view=True)` (or whatever we would end up doing). Other than that, do you have a technical reason for it (aside from ensuring that no copy will occur)? The thing is that I have seen the suggestion to use `arr.shape` as a "best practice". And I really disagree that it is very good practice.
There are also some (most likely significantly rarer) cases where I would modify arr.strides.
The same again, there should be no reason why you should have to do this. In fact, this will cause hard crashes for object arrays, so even technically this is not a "harmless" convenience, it is a bug to allow for object arrays – which own their data – at all:

    arr = np.arange(100000, dtype=object)
    arr.strides = (0,)
    del arr

    free(): invalid pointer
    zsh: abort (core dumped)

Note that using the alternatives is completely fine here (as long as you take care all point to valid objects).
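The message does not name the alternatives. One that exists in NumPy is `numpy.lib.stride_tricks.as_strided`, which returns a new view with custom strides and leaves the original array's metadata alone. The sketch below assumes that is the kind of alternative meant; it uses an integer array and a read-only view to stay on the safe side.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

base = np.arange(5)

# New view whose single stride is 0, so every element aliases base[0].
# writeable=False guards against accidental writes through the aliased view.
repeated = as_strided(base, shape=(4,), strides=(0,), writeable=False)

print(repeated.tolist())                  # [0, 0, 0, 0]
print(base.strides == (base.itemsize,))   # True: base itself is not modified
print(np.shares_memory(base, repeated))   # True: no data was copied
```

As with any use of `as_strided`, the caller is responsible for making sure the chosen shape and strides stay within the memory owned by `base`.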
I realize that in some cases mutating the container metadata happens. But I do believe it should be as minimal as possible. And frankly, probably one could do away with it completely.
I guess that's the only point where we disagree.
I wouldn't completely discourage it and I would definitely not remove the functionality.
Another example for where it is bad would be a threaded environment. If a python function temporarily changes the shape of an array to read from it without creating a view first, this will break multi-threaded access to that array.
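The two patterns contrasted here might look as follows. Both function names are hypothetical and written only for illustration; no threads are started, the sketch just shows which function touches shared state.

```python
import numpy as np

def column_sums_mutating(arr):
    # Hypothetical anti-pattern: temporarily reshapes the caller's array.
    # Between the two assignments, any other thread holding `arr`
    # would observe the temporary shape.
    old_shape = arr.shape
    arr.shape = (arr.size // 2, 2)
    try:
        return arr.sum(axis=0)
    finally:
        arr.shape = old_shape      # restore the original shape

def column_sums_view(arr):
    # Safer pattern: reshape into a separate view object; `arr` itself
    # keeps its shape for every other user.
    view = arr.reshape(arr.size // 2, 2)
    return view.sum(axis=0)

shared = np.arange(10)
print(column_sums_mutating(shared).tolist())  # [20, 25]
print(column_sums_view(shared).tolist())      # [20, 25]
print(shared.shape)                           # (10,)
```

Both functions return the same result; only the second never exposes an intermediate shape to other code holding the array.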
Sure, let's not use it while multi-threading then.
I still think that's not at all a reason to remove the feature.
While I wouldn't mind moving to deprecate it fully, that is not my intention right now. I would discourage it in the documentation with a pointer to a safe alternative. Maybe at some point we realize that nobody really has any good reason to keep using it, then we may move ahead slowly.
- Sebastian
There are some things that are problematic when multi-threading, but that's typically not reason enough to completely disallow them.
cheers, Matthias
- Sebastian
[0] I tried to find other examples for such a split. Maybe a categorical/state object which is allowed change value/state. But the list of possible states cannot change.
Coming back to the original suggestion of this thread: Since assigning to arr.shape makes sure no copy of the array data is made, I don't think it's necessary to add a new no-copy argument to reshape().
But the bug you mentioned ("on error the `arr.shape = ...` code currently creates the copy temporarily") should probably be fixed at some point ...
cheers, Matthias
- Sebastian
There may be some corner cases, but a lot of the "then why is it allowed" questions are answered with: for history reasons.
OK, that's a good point.
By the way, on error the `arr.shape = ...` code currently creates the copy temporarily.
That's interesting and it should probably be fixed.
But it is not reason enough for me not to use it. I find it important that it doesn't make a copy in the success case, I don't care very much for the error case.
Would you mind elaborating on the real reasons why I shouldn't use it?
cheers, Matthias
- Sebastian
NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion