Add guaranteed no-copy to array creation and reshape?
Hi all, In https://github.com/numpy/numpy/pull/11897 I am looking into the addition of a `copy=np.never_copy` argument to: * np.array * arr.reshape/np.reshape * arr.astype Which would cause an error to be raised when numpy cannot guarantee that the returned array is a view of the input array. The motivation is to easier avoid accidental copies of large data, or ensure that in-place manipulation will be meaningful. The copy flag API would be: * `copy=True` forces a copy * `copy=False` allows numpy to copy if necessary * `copy=np.never_copy` will error if a copy would be necessary * (almost) all other input will be deprecated. Unfortunately using `copy="never"` is tricky, because currently `np.array(..., copy="never")` behaves exactly the same as `np.array(..., copy=bool("never"))`. So that the wrong result would be given on old numpy versions and it would be a behaviour change. Some things that are a not so nice maybe: * adding/using `np.never_copy` is not very nice * Scalars need a copy and so will not be allowed * For rare array-likes numpy may not be able to guarantee no-copy, although it could happen (but should not). The history is that a long while ago I considered adding a copy flag to `reshape` so that it is possible to do `copy=np.never_copy` (or similar) to ensure that no copy is made. In these, you may want something like an assertion: ``` new_arr = arr.reshape(new_shape) assert np.may_share_memory(arr, new_arr) # Which is sometimes -- but should not be -- written as: arr.shape = new_shape # unnecessary container modification # Or: view = np.array(arr, order="F") assert np.may_share_memory(arr, new_arr) ``` but is more readable and will not cause an intermediate copy on error. So what do you think? Other variants would be to not expose this for `np.array` and probably limit `copy="never"` to the reshape method. Or just to not do it at all. Or to also accept "never" for `reshape`, although I think I would prefer to keep it in sync and wait for a few years to consider that. Best, Sebastian
On Wed, Dec 26, 2018 at 3:29 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
Hi all,
In https://github.com/numpy/numpy/pull/11897 I am looking into the addition of a `copy=np.never_copy` argument to: * np.array * arr.reshape/np.reshape * arr.astype
Which would cause an error to be raised when numpy cannot guarantee that the returned array is a view of the input array. The motivation is to easier avoid accidental copies of large data, or ensure that in-place manipulation will be meaningful.
The copy flag API would be: * `copy=True` forces a copy * `copy=False` allows numpy to copy if necessary * `copy=np.never_copy` will error if a copy would be necessary * (almost) all other input will be deprecated.
Unfortunately using `copy="never"` is tricky, because currently `np.array(..., copy="never")` behaves exactly the same as `np.array(..., copy=bool("never"))`. So that the wrong result would be given on old numpy versions and it would be a behaviour change.
I think np.never_copy is really ugly. I'd much rather simply use 'never', and clearly document that if users start using this and they critically rely on it really being never, then they should ensure that their code is only used with numpy >= 1.17.0. Note also that this would not be a backwards compatibility break, because `copy` is now clearly documented as bool, and not bool_like or some such thing. So we do not need to worry about the very improbable case that users now are using `copy='never'`. If others think `copy='never'` isn't acceptable now, there are two other options: 1. add code first to catch `copy='never'` in 1.17.x and raise on it, then in a later numpy version introduce it. 2. just do nothing. I'd prefer that over `np.never_copy`. Cheers, Ralf
Some things that are a not so nice maybe: * adding/using `np.never_copy` is not very nice * Scalars need a copy and so will not be allowed * For rare array-likes numpy may not be able to guarantee no-copy, although it could happen (but should not).
The history is that a long while ago I considered adding a copy flag to `reshape` so that it is possible to do `copy=np.never_copy` (or similar) to ensure that no copy is made. In these, you may want something like an assertion:
``` new_arr = arr.reshape(new_shape) assert np.may_share_memory(arr, new_arr)
# Which is sometimes -- but should not be -- written as: arr.shape = new_shape # unnecessary container modification
# Or: view = np.array(arr, order="F") assert np.may_share_memory(arr, new_arr) ```
but is more readable and will not cause an intermediate copy on error.
So what do you think? Other variants would be to not expose this for `np.array` and probably limit `copy="never"` to the reshape method. Or just to not do it at all. Or to also accept "never" for `reshape`, although I think I would prefer to keep it in sync and wait for a few years to consider that.
Best,
Sebastian
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Wed, 2018-12-26 at 16:40 -0800, Ralf Gommers wrote:
On Wed, Dec 26, 2018 at 3:29 PM Sebastian Berg < sebastian@sipsolutions.net> wrote:
Hi all,
In https://github.com/numpy/numpy/pull/11897 I am looking into the addition of a `copy=np.never_copy` argument to: * np.array * arr.reshape/np.reshape * arr.astype
Which would cause an error to be raised when numpy cannot guarantee that the returned array is a view of the input array. The motivation is to easier avoid accidental copies of large data, or ensure that in-place manipulation will be meaningful.
The copy flag API would be: * `copy=True` forces a copy * `copy=False` allows numpy to copy if necessary * `copy=np.never_copy` will error if a copy would be necessary * (almost) all other input will be deprecated.
Unfortunately using `copy="never"` is tricky, because currently `np.array(..., copy="never")` behaves exactly the same as `np.array(..., copy=bool("never"))`. So that the wrong result would be given on old numpy versions and it would be a behaviour change.
I think np.never_copy is really ugly. I'd much rather simply use 'never', and clearly document that if users start using this and they critically rely on it really being never, then they should ensure that their code is only used with numpy >= 1.17.0.
Note also that this would not be a backwards compatibility break, because `copy` is now clearly documented as bool, and not bool_like or some such thing. So we do not need to worry about the very improbable case that users now are using `copy='never'`.
I agree that it is much nicer to pass "never" and that I am not too worried about people actually passing in strings. But, I am a bit worried of new code silently doing the exact opposite when run on an older version. Although, I guess we typically do not worry too much about such compatibilities and it is possible to make a note of it in the docs. - Sebastian
If others think `copy='never'` isn't acceptable now, there are two other options: 1. add code first to catch `copy='never'` in 1.17.x and raise on it, then in a later numpy version introduce it. 2. just do nothing. I'd prefer that over `np.never_copy`.
Cheers, Ralf
Some things that are a not so nice maybe: * adding/using `np.never_copy` is not very nice * Scalars need a copy and so will not be allowed * For rare array-likes numpy may not be able to guarantee no-copy, although it could happen (but should not).
The history is that a long while ago I considered adding a copy flag to `reshape` so that it is possible to do `copy=np.never_copy` (or similar) to ensure that no copy is made. In these, you may want something like an assertion:
``` new_arr = arr.reshape(new_shape) assert np.may_share_memory(arr, new_arr)
# Which is sometimes -- but should not be -- written as: arr.shape = new_shape # unnecessary container modification
# Or: view = np.array(arr, order="F") assert np.may_share_memory(arr, new_arr) ```
but is more readable and will not cause an intermediate copy on error.
So what do you think? Other variants would be to not expose this for `np.array` and probably limit `copy="never"` to the reshape method. Or just to not do it at all. Or to also accept "never" for `reshape`, although I think I would prefer to keep it in sync and wait for a few years to consider that.
Best,
Sebastian
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
I think np.never_copy is really ugly. I’d much rather simply use ‘never’, I actually think np.never_copy is the better choice of the two. Arguments for it: - It’s consistent with np.newaxis in spelling (modulo the _) - If mistyped, it can be caught easily by IDEs. - It’s typeable with mypy, unlike constant string literals which currently aren’t - If code written against new numpy is run on old numpy, it will error rather than doing the wrong thing - It won’t be possible to miss parts of our existing API that evaluate a copy argument in a boolean context - It won’t be possible for downstream projects to misinterpret by not checking for ‘never’ - an error will be raised instead. Arguments against it: - It’s more characters than "never" - The implementation of np.never_copy is a little verbose / ugly Eric On Thu, Dec 27, 2018, 1:41 AM Ralf Gommers <ralf.gommers@gmail.com wrote:
On Wed, Dec 26, 2018 at 3:29 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
Hi all,
In https://github.com/numpy/numpy/pull/11897 I am looking into the addition of a `copy=np.never_copy` argument to: * np.array * arr.reshape/np.reshape * arr.astype
Which would cause an error to be raised when numpy cannot guarantee that the returned array is a view of the input array. The motivation is to easier avoid accidental copies of large data, or ensure that in-place manipulation will be meaningful.
The copy flag API would be: * `copy=True` forces a copy * `copy=False` allows numpy to copy if necessary * `copy=np.never_copy` will error if a copy would be necessary * (almost) all other input will be deprecated.
Unfortunately using `copy="never"` is tricky, because currently `np.array(..., copy="never")` behaves exactly the same as `np.array(..., copy=bool("never"))`. So that the wrong result would be given on old numpy versions and it would be a behaviour change.
I think np.never_copy is really ugly. I'd much rather simply use 'never', and clearly document that if users start using this and they critically rely on it really being never, then they should ensure that their code is only used with numpy >= 1.17.0.
Note also that this would not be a backwards compatibility break, because `copy` is now clearly documented as bool, and not bool_like or some such thing. So we do not need to worry about the very improbable case that users now are using `copy='never'`.
If others think `copy='never'` isn't acceptable now, there are two other options: 1. add code first to catch `copy='never'` in 1.17.x and raise on it, then in a later numpy version introduce it. 2. just do nothing. I'd prefer that over `np.never_copy`.
Cheers, Ralf
Some things that are a not so nice maybe: * adding/using `np.never_copy` is not very nice * Scalars need a copy and so will not be allowed * For rare array-likes numpy may not be able to guarantee no-copy, although it could happen (but should not).
The history is that a long while ago I considered adding a copy flag to `reshape` so that it is possible to do `copy=np.never_copy` (or similar) to ensure that no copy is made. In these, you may want something like an assertion:
``` new_arr = arr.reshape(new_shape) assert np.may_share_memory(arr, new_arr)
# Which is sometimes -- but should not be -- written as: arr.shape = new_shape # unnecessary container modification
# Or: view = np.array(arr, order="F") assert np.may_share_memory(arr, new_arr) ```
but is more readable and will not cause an intermediate copy on error.
So what do you think? Other variants would be to not expose this for `np.array` and probably limit `copy="never"` to the reshape method. Or just to not do it at all. Or to also accept "never" for `reshape`, although I think I would prefer to keep it in sync and wait for a few years to consider that.
Best,
Sebastian
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Thu, 2018-12-27 at 14:50 +0100, Eric Wieser wrote:
I think np.never_copy is really ugly. I’d much rather simply use ‘never’,
I actually think np.never_copy is the better choice of the two. Arguments for it:
It’s consistent with np.newaxis in spelling (modulo the _) If mistyped, it can be caught easily by IDEs. It’s typeable with mypy, unlike constant string literals which currently aren’t If code written against new numpy is run on old numpy, it will error rather than doing the wrong thing
I am not sure I think that the above are too strong arguments, since it is not what is typically done for keywords (nobody suggests np.CLIP instead of "clip"). Unless you feel this is different because it is a mix of strings and bools here?
It won’t be possible to miss parts of our existing API that evaluate a copy argument in a boolean context It won’t be possible for downstream projects to misinterpret by not checking for ‘never’ - an error will be raised instead.
The last one is a good reason I think, since the dispatcher would pass on the string, so a downstream API user which are not picky about the input will interpret it wrong without the special argument. Unless we replace the string when dispatching, which seems strange on first sight. Overall, I am still a bit undecided right now. - Sebastian
Arguments against it:
It’s more characters than "never" The implementation of np.never_copy is a little verbose / ugly Eric
On Thu, Dec 27, 2018, 1:41 AM Ralf Gommers <ralf.gommers@gmail.com wrote:
On Wed, Dec 26, 2018 at 3:29 PM Sebastian Berg < sebastian@sipsolutions.net> wrote:
Hi all,
In https://github.com/numpy/numpy/pull/11897 I am looking into the addition of a `copy=np.never_copy` argument to: * np.array * arr.reshape/np.reshape * arr.astype
Which would cause an error to be raised when numpy cannot guarantee that the returned array is a view of the input array. The motivation is to easier avoid accidental copies of large data, or ensure that in-place manipulation will be meaningful.
The copy flag API would be: * `copy=True` forces a copy * `copy=False` allows numpy to copy if necessary * `copy=np.never_copy` will error if a copy would be necessary * (almost) all other input will be deprecated.
Unfortunately using `copy="never"` is tricky, because currently `np.array(..., copy="never")` behaves exactly the same as `np.array(..., copy=bool("never"))`. So that the wrong result would be given on old numpy versions and it would be a behaviour change.
I think np.never_copy is really ugly. I'd much rather simply use 'never', and clearly document that if users start using this and they critically rely on it really being never, then they should ensure that their code is only used with numpy >= 1.17.0.
Note also that this would not be a backwards compatibility break, because `copy` is now clearly documented as bool, and not bool_like or some such thing. So we do not need to worry about the very improbable case that users now are using `copy='never'`.
If others think `copy='never'` isn't acceptable now, there are two other options: 1. add code first to catch `copy='never'` in 1.17.x and raise on it, then in a later numpy version introduce it. 2. just do nothing. I'd prefer that over `np.never_copy`.
Cheers, Ralf
Some things that are a not so nice maybe: * adding/using `np.never_copy` is not very nice * Scalars need a copy and so will not be allowed * For rare array-likes numpy may not be able to guarantee no- copy, although it could happen (but should not).
The history is that a long while ago I considered adding a copy flag to `reshape` so that it is possible to do `copy=np.never_copy` (or similar) to ensure that no copy is made. In these, you may want something like an assertion:
``` new_arr = arr.reshape(new_shape) assert np.may_share_memory(arr, new_arr)
# Which is sometimes -- but should not be -- written as: arr.shape = new_shape # unnecessary container modification
# Or: view = np.array(arr, order="F") assert np.may_share_memory(arr, new_arr) ```
but is more readable and will not cause an intermediate copy on error.
So what do you think? Other variants would be to not expose this for `np.array` and probably limit `copy="never"` to the reshape method. Or just to not do it at all. Or to also accept "never" for `reshape`, although I think I would prefer to keep it in sync and wait for a few years to consider that.
Best,
Sebastian
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Fri, Dec 28, 2018, at 3:40 AM, Sebastian Berg wrote:
It’s consistent with np.newaxis in spelling (modulo the _) If mistyped, it can be caught easily by IDEs. It’s typeable with mypy, unlike constant string literals which currently aren’t If code written against new numpy is run on old numpy, it will error rather than doing the wrong thing
I am not sure I think that the above are too strong arguments, since it is not what is typically done for keywords (nobody suggests np.CLIP instead of "clip"). Unless you feel this is different because it is a mix of strings and bools here?
:+1: to Eric's list. I don't think it's different because of the mix, I think it's different because deprecating things is painful. But now that we have good typing in Python, I think of string literals as an anti-pattern going forward. But, for me, the biggest reason is the silent different behaviour in old vs new NumPy. As a package maintainer this is just a nightmare. I have a lot of sympathy for the ugliness argument, but I don't think `np.never_copy` (or e.g. `np.modes.never`) is that much worse than a string.
On Thu, Dec 27, 2018 at 3:27 PM Juan Nunez-Iglesias <jni.soma@gmail.com> wrote:
On Fri, Dec 28, 2018, at 3:40 AM, Sebastian Berg wrote:
It’s consistent with np.newaxis in spelling (modulo the _) If mistyped, it can be caught easily by IDEs. It’s typeable with mypy, unlike constant string literals which currently aren’t If code written against new numpy is run on old numpy, it will error rather than doing the wrong thing
I am not sure I think that the above are too strong arguments, since it is not what is typically done for keywords (nobody suggests np.CLIP instead of "clip"). Unless you feel this is different because it is a mix of strings and bools here?
:+1: to Eric's list. I don't think it's different because of the mix, I think it's different because deprecating things is painful.
Technically there's nothing we are deprecating. If anything, the one not super uncommon pattern will be that people use 0/1 instead of False/True, which works as expected now and will start raising in case we'd go with np.never_copy But now that we have good typing in Python, I think of string literals as
an anti-pattern going forward.
I don't quite get that, it'll still be the most Pythonic thing to do in many API design scenarios. Adding a new class instance with unusual behavior like raising on bool() is not even a pattern, it would just be an oddity.
But, for me, the biggest reason is the silent different behaviour in old vs new NumPy. As a package maintainer this is just a nightmare.
That is the main downside indeed in this particular case.
I have a lot of sympathy for the ugliness argument, but I don't think `np.never_copy` (or e.g. `np.modes.never`) is that much worse than a string.
The point is not that `.reshape(..., copy=np.never_copy)` is uglier (it is, but indeed just a little), but that we're adding a new object to the numpy namespace that's without precedent. If we'd do that regularly the API would really blow up. np.newaxis is not relevant here - it's a simple alias for None, is just there for code readability, and is much more widely applicable than np.never_copy would be. Cheers, Ralf
On Fri, Dec 28, 2018, at 10:58 AM, Ralf Gommers wrote:
Technically there's nothing we are deprecating. If anything, the one not super uncommon pattern will be that people use 0/1 instead of False/True, which works as expected now and will start raising in case we'd go with np.never_copy
Oh, no, here I meant, I'd consider `np.modes.CLIP` to be something worth advocating for if it wasn't for deprecation cycles and extremely widespread usage.
But now that we have good typing in Python, I think of string literals as an anti-pattern going forward.
I don't quite get that, it'll still be the most Pythonic thing to do in many API design scenarios. Adding a new class instance with unusual behavior like raising on bool() is not even a pattern, it would just be an oddity.
Oh, no, I wouldn't suggest that the class raise on boolification, I suggest that the typing module warn if something other than a bool or an np.types.CopyMode is provided. Then you can have a useful typed function signature.
I have a lot of sympathy for the ugliness argument, but I don't think `np.never_copy` (or e.g. `np.modes.never`) is that much worse than a string.
The point is not that `.reshape(..., copy=np.never_copy)` is uglier (it is, but indeed just a little), but that we're adding a new object to the numpy namespace that's without precedent. If we'd do that regularly the API would really blow up.
Sure, I totally agree here, which is why I suggested `np.modes.never` as an alternative. Namespaces are a honking great idea!
On Thu, Dec 27, 2018 at 3:27 PM Juan Nunez-Iglesias <jni.soma@gmail.com> wrote:
On Fri, Dec 28, 2018, at 3:40 AM, Sebastian Berg wrote:
It’s consistent with np.newaxis in spelling (modulo the _) If mistyped, it can be caught easily by IDEs. It’s typeable with mypy, unlike constant string literals which currently aren’t If code written against new numpy is run on old numpy, it will error rather than doing the wrong thing
I am not sure I think that the above are too strong arguments, since it is not what is typically done for keywords (nobody suggests np.CLIP instead of "clip"). Unless you feel this is different because it is a mix of strings and bools here?
:+1: to Eric's list. I don't think it's different because of the mix, I think it's different because deprecating things is painful. But now that we have good typing in Python, I think of string literals as an anti-pattern going forward.
I've certainly seen people argue that it's better to use proper enum's in this kind of case instead of strings. The 'enum' package is even included in the stdlib on all our supported versions now: https://docs.python.org/3/library/enum.html I guess another possibility to throw out there would be a second kwarg, require_view=False/True. -n -- Nathaniel J. Smith -- https://vorpus.org
On Thu, 2018-12-27 at 17:45 -0800, Nathaniel Smith wrote:
On Thu, Dec 27, 2018 at 3:27 PM Juan Nunez-Iglesias < jni.soma@gmail.com> wrote:
On Fri, Dec 28, 2018, at 3:40 AM, Sebastian Berg wrote:
It’s consistent with np.newaxis in spelling (modulo the _) If mistyped, it can be caught easily by IDEs. It’s typeable with mypy, unlike constant string literals which currently aren’t If code written against new numpy is run on old numpy, it will error rather than doing the wrong thing
I am not sure I think that the above are too strong arguments, since it is not what is typically done for keywords (nobody suggests np.CLIP instead of "clip"). Unless you feel this is different because it is a mix of strings and bools here?
:+1: to Eric's list. I don't think it's different because of the mix, I think it's different because deprecating things is painful. But now that we have good typing in Python, I think of string literals as an anti-pattern going forward.
I've certainly seen people argue that it's better to use proper enum's in this kind of case instead of strings. The 'enum' package is even included in the stdlib on all our supported versions now: https://docs.python.org/3/library/enum.html
I am sympathetic with that, but it is something we (or scipy, etc.) currently simply do not use, so I am not sure that I think it has much validity at this time. That is least unless we agree to aim to use this more generally in the future.
I guess another possibility to throw out there would be a second kwarg, require_view=False/True.
My first gut feeling was that it is clumsy at least for `reshape`, but one will always only use one of the two arguments at a time. The more I look at it, the better the suggestion seems. Plus it reduces the possible `copy=False` not meaning "never" confusion. I think with a bit more pondering, that will become my favorite solution. - Sebastian
-n
Hi, Was it ever brought up the possibility of a new array class (ndrefonly, ndview) that is strictly no copy? All operations on ndrefonly will return ndrefonly and if the operation cannot be completed without making a copy, it shall throw an error. On the implementation there are two choices if we use subclasses: - ndrefonly can be a subclass of ndarray. The pattern would be subclass limiting functionality of super, but ndrefonly is a ndarray. - ndarray as a subclass of ndarray. Subclass supplements functionality of super. : ndarray will not throw an error when a copy is necessary. However ndarray is not a ndarray. If we want to be wild they do not even need to be subclasses of each other, or maybe they shall both be subclasses of something more fundamental. - Yu On Fri, Dec 28, 2018 at 5:45 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Thu, 2018-12-27 at 17:45 -0800, Nathaniel Smith wrote:
On Thu, Dec 27, 2018 at 3:27 PM Juan Nunez-Iglesias < jni.soma@gmail.com> wrote:
On Fri, Dec 28, 2018, at 3:40 AM, Sebastian Berg wrote:
It’s consistent with np.newaxis in spelling (modulo the _) If mistyped, it can be caught easily by IDEs. It’s typeable with mypy, unlike constant string literals which currently aren’t If code written against new numpy is run on old numpy, it will error rather than doing the wrong thing
I am not sure I think that the above are too strong arguments, since it is not what is typically done for keywords (nobody suggests np.CLIP instead of "clip"). Unless you feel this is different because it is a mix of strings and bools here?
:+1: to Eric's list. I don't think it's different because of the mix, I think it's different because deprecating things is painful. But now that we have good typing in Python, I think of string literals as an anti-pattern going forward.
I've certainly seen people argue that it's better to use proper enum's in this kind of case instead of strings. The 'enum' package is even included in the stdlib on all our supported versions now: https://docs.python.org/3/library/enum.html
I am sympathetic with that, but it is something we (or scipy, etc.) currently simply do not use, so I am not sure that I think it has much validity at this time. That is least unless we agree to aim to use this more generally in the future.
I guess another possibility to throw out there would be a second kwarg, require_view=False/True.
My first gut feeling was that it is clumsy at least for `reshape`, but one will always only use one of the two arguments at a time. The more I look at it, the better the suggestion seems. Plus it reduces the possible `copy=False` not meaning "never" confusion.
I think with a bit more pondering, that will become my favorite solution.
- Sebastian
-n
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Mon, Jan 7, 2019, 14:22 Feng Yu <rainwoodman@gmail.com wrote:
Hi,
Was it ever brought up the possibility of a new array class (ndrefonly, ndview) that is strictly no copy?
All operations on ndrefonly will return ndrefonly and if the operation cannot be completed without making a copy, it shall throw an error.
On the implementation there are two choices if we use subclasses:
- ndrefonly can be a subclass of ndarray. The pattern would be subclass limiting functionality of super, but ndrefonly is a ndarray. - ndarray as a subclass of ndarray. Subclass supplements functionality of super. : ndarray will not throw an error when a copy is necessary. However ndarray is not a ndarray.
If we want to be wild they do not even need to be subclasses of each other, or maybe they shall both be subclasses of something more fundamental.
- Yu
I would prefer a flag for this. Someone can make an array read-only by setting `arr.flags.writable=False`. So along those lines, we could have a `arr.flags.copyable` flag that if set to `False` would result in an error of any operation tried to copy the data.
Hi Todd, I agree a flag is more suitable than classes. I would add another bonus of a flag than a function argument is to avoid massive contamination of function signatures for a global variation of behavior that affects many functions. Yu On Wed, Jan 9, 2019 at 11:34 PM Todd <toddrjen@gmail.com> wrote:
On Mon, Jan 7, 2019, 14:22 Feng Yu <rainwoodman@gmail.com wrote:
Hi,
Was it ever brought up the possibility of a new array class (ndrefonly, ndview) that is strictly no copy?
All operations on ndrefonly will return ndrefonly and if the operation cannot be completed without making a copy, it shall throw an error.
On the implementation there are two choices if we use subclasses:
- ndrefonly can be a subclass of ndarray. The pattern would be subclass limiting functionality of super, but ndrefonly is a ndarray. - ndarray as a subclass of ndarray. Subclass supplements functionality of super. : ndarray will not throw an error when a copy is necessary. However ndarray is not a ndarray.
If we want to be wild they do not even need to be subclasses of each other, or maybe they shall both be subclasses of something more fundamental.
- Yu
I would prefer a flag for this. Someone can make an array read-only by setting `arr.flags.writable=False`. So along those lines, we could have a `arr.flags.copyable` flag that if set to `False` would result in an error of any operation tried to copy the data.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
constants are easier to support for autocompletion than strings. My current env (emacs) will (usually) autocomplete the former, but not the latter. On Thu, Jan 10, 2019 at 2:21 PM Feng Yu <rainwoodman@gmail.com> wrote:
Hi Todd,
I agree a flag is more suitable than classes.
I would add another bonus of a flag than a function argument is to avoid massive contamination of function signatures for a global variation of behavior that affects many functions.
Yu
On Wed, Jan 9, 2019 at 11:34 PM Todd <toddrjen@gmail.com> wrote:
On Mon, Jan 7, 2019, 14:22 Feng Yu <rainwoodman@gmail.com wrote:
Hi,
Was it ever brought up the possibility of a new array class (ndrefonly, ndview) that is strictly no copy?
All operations on ndrefonly will return ndrefonly and if the operation cannot be completed without making a copy, it shall throw an error.
On the implementation there are two choices if we use subclasses:
- ndrefonly can be a subclass of ndarray. The pattern would be subclass limiting functionality of super, but ndrefonly is a ndarray. - ndarray as a subclass of ndarray. Subclass supplements functionality of super. : ndarray will not throw an error when a copy is necessary. However ndarray is not a ndarray.
If we want to be wild they do not even need to be subclasses of each other, or maybe they shall both be subclasses of something more fundamental.
- Yu
I would prefer a flag for this. Someone can make an array read-only by setting `arr.flags.writable=False`. So along those lines, we could have a `arr.flags.copyable` flag that if set to `False` would result in an error of any operation tried to copy the data.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Thu, Jan 10, 2019 at 11:21 AM Feng Yu <rainwoodman@gmail.com> wrote:
Hi Todd,
I agree a flag is more suitable than classes.
I would add another bonus of a flag than a function argument is to avoid massive contamination of function signatures for a global variation of behavior that affects many functions.
I like this suggestion. Copy behavior fits very nicely with existing flags (e.g. UPDATEIFCOPY, WRITEABLE) and avoids both namespace pollution and complication docstrings. Ralf
Yu
On Wed, Jan 9, 2019 at 11:34 PM Todd <toddrjen@gmail.com> wrote:
On Mon, Jan 7, 2019, 14:22 Feng Yu <rainwoodman@gmail.com wrote:
Hi,
Was it ever brought up the possibility of a new array class (ndrefonly, ndview) that is strictly no copy?
All operations on ndrefonly will return ndrefonly and if the operation cannot be completed without making a copy, it shall throw an error.
On the implementation there are two choices if we use subclasses:
- ndrefonly can be a subclass of ndarray. The pattern would be subclass limiting functionality of super, but ndrefonly is a ndarray. - ndarray as a subclass of ndarray. Subclass supplements functionality of super. : ndarray will not throw an error when a copy is necessary. However ndarray is not a ndarray.
If we want to be wild they do not even need to be subclasses of each other, or maybe they shall both be subclasses of something more fundamental.
- Yu
I would prefer a flag for this. Someone can make an array read-only by setting `arr.flags.writable=False`. So along those lines, we could have a `arr.flags.copyable` flag that if set to `False` would result in an error of any operation tried to copy the data.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
I don’t think a NEVERCOPY entry in arr.flags would make much sense. Is this really a sensible limitation to put on how data gets used? Isn’t it up to the algorithm to decide whether to copy its data, not the original owner of the data? It also leads to some tricky questions of precedence - would np.array(arr, copy=True) respect the flag or the argument? How about np.array(arr)? Is arr + 0 considered a copy? By keeping it as a value passed in via a copy= kwarg, we don’t need to answer any of those questions. Eric On Thu, 10 Jan 2019 at 20:28 Ralf Gommers ralf.gommers@gmail.com <http://mailto:ralf.gommers@gmail.com> wrote: On Thu, Jan 10, 2019 at 11:21 AM Feng Yu <rainwoodman@gmail.com> wrote:
Hi Todd,
I agree a flag is more suitable than classes.
I would add another bonus of a flag than a function argument is to avoid massive contamination of function signatures for a global variation of behavior that affects many functions.
I like this suggestion. Copy behavior fits very nicely with existing flags (e.g. UPDATEIFCOPY, WRITEABLE) and avoids both namespace pollution and complication docstrings.
Ralf
Yu
On Wed, Jan 9, 2019 at 11:34 PM Todd <toddrjen@gmail.com> wrote:
On Mon, Jan 7, 2019, 14:22 Feng Yu <rainwoodman@gmail.com wrote:
Hi,
Was it ever brought up the possibility of a new array class (ndrefonly, ndview) that is strictly no copy?
All operations on ndrefonly will return ndrefonly and if the operation cannot be completed without making a copy, it shall throw an error.
On the implementation there are two choices if we use subclasses:
- ndrefonly can be a subclass of ndarray. The pattern would be subclass limiting functionality of super, but ndrefonly is a ndarray. - ndarray as a subclass of ndarray. Subclass supplements functionality of super. : ndarray will not throw an error when a copy is necessary. However ndarray is not a ndarray.
If we want to be wild they do not even need to be subclasses of each other, or maybe they shall both be subclasses of something more fundamental.
- Yu
I would prefer a flag for this. Someone can make an array read-only by setting `arr.flags.writable=False`. So along those lines, we could have a `arr.flags.copyable` flag that if set to `False` would result in an error of any operation tried to copy the data.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Eric, I agree these are questions that shall be answered. I think issues you raised are toward functions always make a copy -- combining them with NEVERCOPY does sound sane. Your argument is that If the atom of the new behavior is per method, then there is no need to worry about those functions that does not sound sane when combined with NEVERCOPY. Here is a proposal that may address your concerns: 1. The name of the flag does not need to be NEVERCOPY -- especially it sounds insane combined with many methods. I think something like KEEPBASE may be easier to reason. The name KEEPBASE alwsy suggests it shall not be set on an object where base is None; this feature may simplify the matter. 2. Methods that creates copy shall always create a copy, regardless of a flag -- that's the definition of copy. (np.copy) KEEPBASE shall only affect methods that creates a new object object, which may reference the provided base or create a new base. 3. Functions that are not a ndarray method, shall ignore the flag, unless specified otherwise. (np.array) 4. arr + 0 when arr is KEEPBASE. It depends on if we think this triggered np.add(), or ndarray.__add__. I think ndarray.__add__ is a shortcut to call the ufunc np.add(). Thus the result of arr + 0 shall be the behavior on the ufunc np.add, which shall ignore the flag. Notice that in memory tight situations, users can already use inplace operations to avoid copies. This verbose but highly controlled syntax may be preferable than inferring an automated behavior that relies on KEEPBASE of arguments. 5. I think on the operation level, the difference between subclassing(flags) and function arguments would be: array.view(keepbase=True).reshape(-1) vs array.reshape(-1, keepbase=True) The gain is that no array method needs to be modified. The control can still be at the level of the algorithm. On Sat, Jan 12, 2019 at 11:22 PM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
I don’t think a NEVERCOPY entry in arr.flags would make much sense. Is this really a sensible limitation to put on how data gets used? Isn’t it up to the algorithm to decide whether to copy its data, not the original owner of the data?
It also leads to some tricky questions of precedence - would np.array(arr, copy=True) respect the flag or the argument? How about np.array(arr)? Is arr + 0 considered a copy? By keeping it as a value passed in via a copy= kwarg, we don’t need to answer any of those questions.
Eric
On Thu, 10 Jan 2019 at 20:28 Ralf Gommers ralf.gommers@gmail.com <http://mailto:ralf.gommers@gmail.com> wrote:
On Thu, Jan 10, 2019 at 11:21 AM Feng Yu <rainwoodman@gmail.com> wrote:
Hi Todd,
I agree a flag is more suitable than classes.
I would add another bonus of a flag than a function argument is to avoid massive contamination of function signatures for a global variation of behavior that affects many functions.
I like this suggestion. Copy behavior fits very nicely with existing flags (e.g. UPDATEIFCOPY, WRITEABLE) and avoids both namespace pollution and complication docstrings.
Ralf
Yu
On Wed, Jan 9, 2019 at 11:34 PM Todd <toddrjen@gmail.com> wrote:
On Mon, Jan 7, 2019, 14:22 Feng Yu <rainwoodman@gmail.com wrote:
Hi,
Was it ever brought up the possibility of a new array class (ndrefonly, ndview) that is strictly no copy?
All operations on ndrefonly will return ndrefonly and if the operation cannot be completed without making a copy, it shall throw an error.
On the implementation there are two choices if we use subclasses:
- ndrefonly can be a subclass of ndarray. The pattern would be subclass limiting functionality of super, but ndrefonly is a ndarray. - ndarray as a subclass of ndarray. Subclass supplements functionality of super. : ndarray will not throw an error when a copy is necessary. However ndarray is not a ndarray.
If we want to be wild they do not even need to be subclasses of each other, or maybe they shall both be subclasses of something more fundamental.
- Yu
I would prefer a flag for this. Someone can make an array read-only by setting `arr.flags.writable=False`. So along those lines, we could have a `arr.flags.copyable` flag that if set to `False` would result in an error of any operation tried to copy the data.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Hi Sebastian. I don't have an opinion (yet) about this matter, but I have a question: On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote: [...]
new_arr = arr.reshape(new_shape) assert np.may_share_memory(arr, new_arr)
# Which is sometimes -- but should not be -- written as: arr.shape = new_shape # unnecessary container modification
[...] Why is this discouraged? Why do you call this "unnecessary container modification"? I've used this idiom in the past for exactly those cases where I wanted to make sure no copy is made. And if we are not supposed to assign to arr.shape, why is it allowed in the first place? cheers, Matthias
On Sat, 2018-12-29 at 17:16 +0100, Matthias Geier wrote:
Hi Sebastian.
I don't have an opinion (yet) about this matter, but I have a question:
On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote:
[...]
new_arr = arr.reshape(new_shape) assert np.may_share_memory(arr, new_arr)
# Which is sometimes -- but should not be -- written as: arr.shape = new_shape # unnecessary container modification
[...]
Why is this discouraged?
Why do you call this "unnecessary container modification"?
I've used this idiom in the past for exactly those cases where I wanted to make sure no copy is made.
And if we are not supposed to assign to arr.shape, why is it allowed in the first place?
Well, this may be a matter of taste, but say you have an object that stores an array: class MyObject: def __init__(self): self.myarr = some_array Now, lets say I do: def some_func(arr): # Do something with the array: arr.shape = -1 myobject = MyObject() some_func(myobject) then myobject will suddenly have the wrong shape stored. In most cases this is harmless, but I truly believe this is exactly why we have views and why they are so awesome. The content of arrays is mutable, but the array object itself should not be muted normally. There may be some corner cases, but a lot of the "than why is it allowed" questions are answered with: for history reasons. By the way, on error the `arr.shape = ...` code currently creates the copy temporarily. - Sebastian
cheers, Matthias _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Sat, Dec 29, 2018 at 6:00 PM Sebastian Berg wrote:
On Sat, 2018-12-29 at 17:16 +0100, Matthias Geier wrote:
Hi Sebastian.
I don't have an opinion (yet) about this matter, but I have a question:
On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote:
[...]
new_arr = arr.reshape(new_shape) assert np.may_share_memory(arr, new_arr)
# Which is sometimes -- but should not be -- written as: arr.shape = new_shape # unnecessary container modification
[...]
Why is this discouraged?
Why do you call this "unnecessary container modification"?
I've used this idiom in the past for exactly those cases where I wanted to make sure no copy is made.
And if we are not supposed to assign to arr.shape, why is it allowed in the first place?
Well, this may be a matter of taste, but say you have an object that stores an array:
class MyObject: def __init__(self): self.myarr = some_array
Now, lets say I do:
def some_func(arr): # Do something with the array: arr.shape = -1
myobject = MyObject() some_func(myobject)
then myobject will suddenly have the wrong shape stored. In most cases this is harmless, but I truly believe this is exactly why we have views and why they are so awesome. The content of arrays is mutable, but the array object itself should not be muted normally.
Thanks for the example! I don't understand its point, though. Also, it's not working since MyObject doesn't have a .shape attribute.
There may be some corner cases, but a lot of the "than why is it allowed" questions are answered with: for history reasons.
OK, that's a good point.
By the way, on error the `arr.shape = ...` code currently creates the copy temporarily.
That's interesting and it should probably be fixed. But it is not reason enough for me not to use it. I find it important that is doesn't make a copy in the success case, I don't care very much for the error case. Would you mind elaborating on the real reasons why I shouldn't use it? cheers, Matthias
- Sebastian
cheers, Matthias _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Sun, 2018-12-30 at 16:03 +0100, Matthias Geier wrote:
On Sat, Dec 29, 2018 at 6:00 PM Sebastian Berg wrote:
On Sat, 2018-12-29 at 17:16 +0100, Matthias Geier wrote:
Hi Sebastian.
I don't have an opinion (yet) about this matter, but I have a question:
On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote:
[...]
new_arr = arr.reshape(new_shape) assert np.may_share_memory(arr, new_arr)
# Which is sometimes -- but should not be -- written as: arr.shape = new_shape # unnecessary container modification
[...]
Why is this discouraged?
Why do you call this "unnecessary container modification"?
I've used this idiom in the past for exactly those cases where I wanted to make sure no copy is made.
And if we are not supposed to assign to arr.shape, why is it allowed in the first place?
Well, this may be a matter of taste, but say you have an object that stores an array:
class MyObject: def __init__(self): self.myarr = some_array
Now, lets say I do:
def some_func(arr): # Do something with the array: arr.shape = -1
myobject = MyObject() some_func(myobject)
then myobject will suddenly have the wrong shape stored. In most cases this is harmless, but I truly believe this is exactly why we have views and why they are so awesome. The content of arrays is mutable, but the array object itself should not be muted normally.
Thanks for the example! I don't understand its point, though. Also, it's not working since MyObject doesn't have a .shape attribute.
The example should have called `some_func(myobject.arr)`. The thing is that if you have more references to the same array around, you change all their shapes. And if those other references are there for a reason, that is not what you want. That does not matter much in most cases, but it could change the shape of an array in a completely different place then intended. Creating a new view is cheap, so I think such things should be avoided. I admit, most code will effectively do: arr = input_arr[...] # create a new view arr.shape = ... so that there is no danger. But conceptually, I do not think there should be a danger of magically changing the shape of a stored array in a different part of the code. Does that make some sense? Maybe shorter example: arr = np.arange(10) arr2 = arr arr2.shape = (5, 2) print(arr.shape) # also (5, 2) so the arr container (shape, dtype) is changed/muted. I think we expect that for content here, but not for the shape. - Sebastian
There may be some corner cases, but a lot of the "than why is it allowed" questions are answered with: for history reasons.
OK, that's a good point.
By the way, on error the `arr.shape = ...` code currently creates the copy temporarily.
That's interesting and it should probably be fixed.
But it is not reason enough for me not to use it. I find it important that is doesn't make a copy in the success case, I don't care very much for the error case.
Would you mind elaborating on the real reasons why I shouldn't use it?
cheers, Matthias
- Sebastian
cheers, Matthias _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Hi Sebastian. Thanks for the clarification. On Sun, Dec 30, 2018 at 5:25 PM Sebastian Berg wrote:
On Sun, 2018-12-30 at 16:03 +0100, Matthias Geier wrote:
On Sat, Dec 29, 2018 at 6:00 PM Sebastian Berg wrote:
On Sat, 2018-12-29 at 17:16 +0100, Matthias Geier wrote:
Hi Sebastian.
I don't have an opinion (yet) about this matter, but I have a question:
On Thu, Dec 27, 2018 at 12:30 AM Sebastian Berg wrote:
[...]
new_arr = arr.reshape(new_shape) assert np.may_share_memory(arr, new_arr)
# Which is sometimes -- but should not be -- written as: arr.shape = new_shape # unnecessary container modification
[...]
Why is this discouraged?
Why do you call this "unnecessary container modification"?
I've used this idiom in the past for exactly those cases where I wanted to make sure no copy is made.
And if we are not supposed to assign to arr.shape, why is it allowed in the first place?
Well, this may be a matter of taste, but say you have an object that stores an array:
class MyObject: def __init__(self): self.myarr = some_array
Now, lets say I do:
def some_func(arr): # Do something with the array: arr.shape = -1
myobject = MyObject() some_func(myobject)
then myobject will suddenly have the wrong shape stored. In most cases this is harmless, but I truly believe this is exactly why we have views and why they are so awesome. The content of arrays is mutable, but the array object itself should not be muted normally.
Thanks for the example! I don't understand its point, though. Also, it's not working since MyObject doesn't have a .shape attribute.
The example should have called `some_func(myobject.arr)`. The thing is that if you have more references to the same array around, you change all their shapes. And if those other references are there for a reason, that is not what you want.
That does not matter much in most cases, but it could change the shape of an array in a completely different place then intended. Creating a new view is cheap, so I think such things should be avoided.
I admit, most code will effectively do: arr = input_arr[...] # create a new view arr.shape = ...
so that there is no danger. But conceptually, I do not think there should be a danger of magically changing the shape of a stored array in a different part of the code.
Does that make some sense? Maybe shorter example:
arr = np.arange(10) arr2 = arr arr2.shape = (5, 2)
print(arr.shape) # also (5, 2)
so the arr container (shape, dtype) is changed/muted. I think we expect that for content here, but not for the shape.
Thanks for the clarification, I think I now understand your example. However, the behavior you are describing is just like the normal reference semantics of Python itself. If you have multiple identifiers bound to the same (mutable) object, you'll always have this "problem". I think every Python user should be aware of this behavior, but I don't think it is reason to discourage assigning to arr.shape. Coming back to the original suggestion of this thread: Since assigning to arr.shape makes sure no copy of the array data is made, I don't think it's necessary to add a new no-copy argument to reshape(). But the bug you mentioned ("on error the `arr.shape = ...` code currently creates the copy temporarily") should probably be fixed at some point ... cheers, Matthias
- Sebastian
There may be some corner cases, but a lot of the "than why is it allowed" questions are answered with: for history reasons.
OK, that's a good point.
By the way, on error the `arr.shape = ...` code currently creates the copy temporarily.
That's interesting and it should probably be fixed.
But it is not reason enough for me not to use it. I find it important that is doesn't make a copy in the success case, I don't care very much for the error case.
Would you mind elaborating on the real reasons why I shouldn't use it?
cheers, Matthias
- Sebastian
cheers, Matthias _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
Hi Sebastian.
Thanks for the clarification.
<snip>
print(arr.shape) # also (5, 2)
so the arr container (shape, dtype) is changed/muted. I think we expect that for content here, but not for the shape.
Thanks for the clarification, I think I now understand your example.
However, the behavior you are describing is just like the normal reference semantics of Python itself.
If you have multiple identifiers bound to the same (mutable) object, you'll always have this "problem".
I think every Python user should be aware of this behavior, but I don't think it is reason to discourage assigning to arr.shape.
Well, I doubt I will convince you. But want to point out that a numpy array is: * underlying data * shape/strides (pointing to the exact data) * data type (interpret the data) Arrays are mutable, but this is only half true from my perspective. Everyone using numpy should be aware of "views", i.e. that the content of the underlying data can change. However, if I have a read-only array, and pass it around, I would not expect it to change. That is because while the underlying data is muted, how this data is accessed and interpreted is not. In other words, I see array objects as having two sides to them [0]: * Underlying data -> normally mutable and often muted * container: -> not muted by almost all code * shape/strides * data type I realize that in some cases muting the container metadata happens. But I do believe it should be as minimal as possible. And frankly, probably one could do away with it completely. Another example for where it is bad would be a threaded environment. If a python function temporarily changes the shape of an array to read from it without creating a view first, this will break multi-threaded access to that array. - Sebastian [0] I tried to find other examples for such a split. Maybe a categorical/state object which is allowed change value/state. But the list of possible states cannot change.
Coming back to the original suggestion of this thread: Since assigning to arr.shape makes sure no copy of the array data is made, I don't think it's necessary to add a new no-copy argument to reshape().
But the bug you mentioned ("on error the `arr.shape = ...` code currently creates the copy temporarily") should probably be fixed at some point ...
cheers, Matthias
- Sebastian
There may be some corner cases, but a lot of the "than why is it allowed" questions are answered with: for history reasons.
OK, that's a good point.
By the way, on error the `arr.shape = ...` code currently creates the copy temporarily.
That's interesting and it should probably be fixed.
But it is not reason enough for me not to use it. I find it important that is doesn't make a copy in the success case, I don't care very much for the error case.
Would you mind elaborating on the real reasons why I shouldn't use it?
cheers, Matthias
- Sebastian
cheers, Matthias _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Wed, Jan 2, 2019 at 2:24 PM Sebastian Berg wrote:
On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
Hi Sebastian.
Thanks for the clarification.
<snip>
print(arr.shape) # also (5, 2)
so the arr container (shape, dtype) is changed/muted. I think we expect that for content here, but not for the shape.
Thanks for the clarification, I think I now understand your example.
However, the behavior you are describing is just like the normal reference semantics of Python itself.
If you have multiple identifiers bound to the same (mutable) object, you'll always have this "problem".
I think every Python user should be aware of this behavior, but I don't think it is reason to discourage assigning to arr.shape.
Well, I doubt I will convince you.
I think we actually have quite little disagreement. I agree with you on what should be done *most of the time*, but I wouldn't totally discourage mutating NumPy array shapes, because I think in the right circumstances it can be very useful.
But want to point out that a numpy array is:
* underlying data * shape/strides (pointing to the exact data) * data type (interpret the data)
Arrays are mutable, but this is only half true from my perspective. Everyone using numpy should be aware of "views", i.e. that the content of the underlying data can change.
I agree, everyone should be aware of that.
However, if I have a read-only array, and pass it around, I would not expect it to change. That is because while the underlying data is muted, how this data is accessed and interpreted is not.
In other words, I see array objects as having two sides to them [0]:
* Underlying data -> normally mutable and often muted * container: -> not muted by almost all code * shape/strides * data type
Exactly: "almost all code". Most of the time I would not assign to arr.shape, but in some rare occasions I find it very useful. And one of those rare occasions is when you want guaranteed no-copy behavior. There are also some (most likely significantly rarer) cases where I would modify arr.strides.
I realize that in some cases muting the container metadata happens. But I do believe it should be as minimal as possible. And frankly, probably one could do away with it completely.
I guess that's the only point where we disagree. I wouldn't completely discourage it and I would definitely not remove the functionality.
Another example for where it is bad would be a threaded environment. If a python function temporarily changes the shape of an array to read from it without creating a view first, this will break multi-threaded access to that array.
Sure, let's not use it while multi-threading then. I still think that's not at all a reason to remove the feature. There are some things that are problematic when multi-threading, but that's typically not reason enough to completely disallow them. cheers, Matthias
- Sebastian
[0] I tried to find other examples for such a split. Maybe a categorical/state object which is allowed change value/state. But the list of possible states cannot change.
Coming back to the original suggestion of this thread: Since assigning to arr.shape makes sure no copy of the array data is made, I don't think it's necessary to add a new no-copy argument to reshape().
But the bug you mentioned ("on error the `arr.shape = ...` code currently creates the copy temporarily") should probably be fixed at some point ...
cheers, Matthias
- Sebastian
There may be some corner cases, but a lot of the "than why is it allowed" questions are answered with: for history reasons.
OK, that's a good point.
By the way, on error the `arr.shape = ...` code currently creates the copy temporarily.
That's interesting and it should probably be fixed.
But it is not reason enough for me not to use it. I find it important that is doesn't make a copy in the success case, I don't care very much for the error case.
Would you mind elaborating on the real reasons why I shouldn't use it?
cheers, Matthias
- Sebastian
cheers, Matthias
@Matthias: Most of the time I would not assign to arr.shape, but in some rare occasions I find it very useful. And one of those rare occasions is when you want guaranteed no-copy behavior. Can you come up with any other example? The only real argument you seem to have here is “my code uses arr.shape = ...“ and I don’t want it to break. That’s a fair argument, but all it really means is we should start emitting DeprecationWarning("Use arr = arr.reshape(..., copy=np.never_copy) instead of arr.shape = ..."), and consider having a long deprecation. If necessary we could compromise on just putting a warning in the docs, and not notifying the user at all. @Ralf np.newaxis is not relevant here - it’s a simple alias for None, is just there for code readability, and is much more widely applicable than np.never_copy would be. Is there any particular reason we chose to use None? If I were designing it again, I’d consider a singleton object with a better __repr__ @Nathaniel I guess another possibility to throw out there would be a second kwarg, require_view=False/True. The downside of this approach is that array-likes will definitely need updating to support this new behavior, whereas many may work out of the box if we extend the copy argument (like, say, maskedarray). This also ties into the __bool__ override - that will ensure that subclasses which don’t have a trivial reshape crash. @Sebastian: Unless we replace the string when dispatching, which seems strange on first sight. I’m envisaging cases where we don’t have a dispatcher at all: - Duck arrays implementing methods matching ndarray - Something like my_custom_function(arr, copy=...) that forwards its copy argument to reshape Eric On Mon, 7 Jan 2019 at 11:05 Matthias Geier <matthias.geier@gmail.com> wrote:
On Wed, Jan 2, 2019 at 2:24 PM Sebastian Berg wrote:
On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
Hi Sebastian.
Thanks for the clarification.
<snip>
print(arr.shape) # also (5, 2)
so the arr container (shape, dtype) is changed/muted. I think we expect that for content here, but not for the shape.
Thanks for the clarification, I think I now understand your example.
However, the behavior you are describing is just like the normal reference semantics of Python itself.
If you have multiple identifiers bound to the same (mutable) object, you'll always have this "problem".
I think every Python user should be aware of this behavior, but I don't think it is reason to discourage assigning to arr.shape.
Well, I doubt I will convince you.
I think we actually have quite little disagreement.
I agree with you on what should be done *most of the time*, but I wouldn't totally discourage mutating NumPy array shapes, because I think in the right circumstances it can be very useful.
But want to point out that a numpy array is:
* underlying data * shape/strides (pointing to the exact data) * data type (interpret the data)
Arrays are mutable, but this is only half true from my perspective. Everyone using numpy should be aware of "views", i.e. that the content of the underlying data can change.
I agree, everyone should be aware of that.
However, if I have a read-only array, and pass it around, I would not expect it to change. That is because while the underlying data is muted, how this data is accessed and interpreted is not.
In other words, I see array objects as having two sides to them [0]:
* Underlying data -> normally mutable and often muted * container: -> not muted by almost all code * shape/strides * data type
Exactly: "almost all code".
Most of the time I would not assign to arr.shape, but in some rare occasions I find it very useful.
And one of those rare occasions is when you want guaranteed no-copy behavior.
There are also some (most likely significantly rarer) cases where I would modify arr.strides.
I realize that in some cases muting the container metadata happens. But I do believe it should be as minimal as possible. And frankly, probably one could do away with it completely.
I guess that's the only point where we disagree.
I wouldn't completely discourage it and I would definitely not remove the functionality.
Another example for where it is bad would be a threaded environment. If a python function temporarily changes the shape of an array to read from it without creating a view first, this will break multi-threaded access to that array.
Sure, let's not use it while multi-threading then.
I still think that's not at all a reason to remove the feature.
There are some things that are problematic when multi-threading, but that's typically not reason enough to completely disallow them.
cheers, Matthias
- Sebastian
[0] I tried to find other examples for such a split. Maybe a categorical/state object which is allowed change value/state. But the list of possible states cannot change.
Coming back to the original suggestion of this thread: Since assigning to arr.shape makes sure no copy of the array data is made, I don't think it's necessary to add a new no-copy argument to reshape().
But the bug you mentioned ("on error the `arr.shape = ...` code currently creates the copy temporarily") should probably be fixed at some point ...
cheers, Matthias
- Sebastian
There may be some corner cases, but a lot of the "than why is it allowed" questions are answered with: for history reasons.
OK, that's a good point.
By the way, on error the `arr.shape = ...` code currently creates the copy temporarily.
That's interesting and it should probably be fixed.
But it is not reason enough for me not to use it. I find it important that is doesn't make a copy in the success case, I don't care very much for the error case.
Would you mind elaborating on the real reasons why I shouldn't use it?
cheers, Matthias
- Sebastian
> cheers, > Matthias
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Mon, Jan 7, 2019 at 11:30 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
@Ralf
np.newaxis is not relevant here - it’s a simple alias for None, is just there for code readability, and is much more widely applicable than np.never_copy would be.
Is there any particular reason we chose to use None? If I were designing it again, I’d consider a singleton object with a better __repr__
It stems from Numeric: https://mail.python.org/pipermail/python-list/2009-September/552203.html. Note that the Python builtin slice also uses None, but that's probably due to Numeric using it first. Agree that a singleton with a nice repr could be a better choice than None. The more important part of my comment was "widely applicable" though. Slicing is a lot more important than some keyword. And design-wise, filling the numpy namespace with singletons for keyword to other things in that same namespace just makes no sense to me. Cheers, Ralf
Slicing is a lot more important than some keyword. And design-wise, filling the numpy namespace with singletons for keyword to other things in that same namespace just makes no sense to me. At least from the perspective of discoverability, you could argue that string constants form a namespace of their won, and so growing the “string” namespace is not inherently better than growing any other. The main flaw in that comparison is that picking np.never_copy to be a singleton forever prevents us reusing that name to become a function. Perhaps the solution is to use np.NEVER_COPY instead - that’s never going to clash with a function name we want to add in future, and using upper attributes as arguments in that way is pretty typical for python ( subprocess.PIPE, socket.SOCK_STREAM, etc…) You could fairly argue that this approach is outdated in the face of enum.Enum - in which case we could go for the more heavy-handed np.CopyMode.NEVER, which still has a unique enough case for name clashes with functions never to be an issue. Eric On Wed, 9 Jan 2019 at 22:25 Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Mon, Jan 7, 2019 at 11:30 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
@Ralf
np.newaxis is not relevant here - it’s a simple alias for None, is just there for code readability, and is much more widely applicable than np.never_copy would be.
Is there any particular reason we chose to use None? If I were designing it again, I’d consider a singleton object with a better __repr__
It stems from Numeric: https://mail.python.org/pipermail/python-list/2009-September/552203.html. Note that the Python builtin slice also uses None, but that's probably due to Numeric using it first.
Agree that a singleton with a nice repr could be a better choice than None. The more important part of my comment was "widely applicable" though. Slicing is a lot more important than some keyword. And design-wise, filling the numpy namespace with singletons for keyword to other things in that same namespace just makes no sense to me.
Cheers, Ralf
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Thu, Jan 10, 2019, 01:55 Eric Wieser <wieser.eric+numpy@gmail.com wrote:
Slicing is a lot more important than some keyword. And design-wise, filling the numpy namespace with singletons for keyword to other things in that same namespace just makes no sense to me.
At least from the perspective of discoverability, you could argue that string constants form a namespace of their won, and so growing the “string” namespace is not inherently better than growing any other. The main flaw in that comparison is that picking np.never_copy to be a singleton forever prevents us reusing that name to become a function.
Perhaps the solution is to use np.NEVER_COPY instead - that’s never going to clash with a function name we want to add in future, and using upper attributes as arguments in that way is pretty typical for python ( subprocess.PIPE, socket.SOCK_STREAM, etc…)
What about a namespace as someone mentioned earlier. Perhaps `np.flags.NO_COPY` You could fairly argue that this approach is outdated in the face of
enum.Enum - in which case we could go for the more heavy-handed np.CopyMode.NEVER, which still has a unique enough case for name clashes with functions never to be an issue.
Eric
Would all three conditions be supported this way or only `NEVER`?
On Wed, Jan 9, 2019 at 10:55 PM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Slicing is a lot more important than some keyword. And design-wise, filling the numpy namespace with singletons for keyword to other things in that same namespace just makes no sense to me.
At least from the perspective of discoverability, you could argue that string constants form a namespace of their won, and so growing the “string” namespace is not inherently better than growing any other.
I don't really think those are valid arguments. Strings are not a namespace, and if you want to make the analogy then it's at best a namespace within the functions/methods that the strings apply to. Discoverability: what is valid input for a keyword is discovered pretty much exclusively by reading the docstring or html doc page of the function, and not the docstring of some random object in the numpy namespace. Ralf
On Mon, 2019-01-07 at 20:04 +0100, Matthias Geier wrote:
On Wed, Jan 2, 2019 at 2:24 PM Sebastian Berg wrote:
On Wed, 2019-01-02 at 11:27 +0100, Matthias Geier wrote:
Hi Sebastian.
Thanks for the clarification.
<snip>
print(arr.shape) # also (5, 2)
so the arr container (shape, dtype) is changed/muted. I think we expect that for content here, but not for the shape.
Thanks for the clarification, I think I now understand your example.
However, the behavior you are describing is just like the normal reference semantics of Python itself.
If you have multiple identifiers bound to the same (mutable) object, you'll always have this "problem".
I think every Python user should be aware of this behavior, but I don't think it is reason to discourage assigning to arr.shape.
Well, I doubt I will convince you.
I think we actually have quite little disagreement.
I am very sorry, that was very badly phrased. What I meant is just that saying that arrays simply are mutable objects does not seem all that wrong to me. However, I would very much prefer to move towards changing it for the container part.
I agree with you on what should be done *most of the time*, but I wouldn't totally discourage mutating NumPy array shapes, because I think in the right circumstances it can be very useful.
But want to point out that a numpy array is:
* underlying data * shape/strides (pointing to the exact data) * data type (interpret the data)
Arrays are mutable, but this is only half true from my perspective. <snip> In other words, I see array objects as having two sides to them [0]:
* Underlying data -> normally mutable and often muted * container: -> not muted by almost all code * shape/strides * data type
Exactly: "almost all code".
Most of the time I would not assign to arr.shape, but in some rare occasions I find it very useful.
And one of those rare occasions is when you want guaranteed no-copy behavior.
`arr.shape = ...` is somewhat easier on the eye when compared to `arr = arr.reshape(..., ensure_view=True)` (or whatever we would end up doing). Other than that, do you have a technical reason for it (aside from ensuring that no copy will occur)? The thing is that I have seen the suggestion to use `arr.shape` as a "best practices". And I really disagree that it is very good practice.
There are also some (most likely significantly rarer) cases where I would modify arr.strides.
The same again, there should be no reason why you should have to do this. In fact, this will cause hard crashes for object arrays, so even technically this is not a "harmless" convenience, it is a bug to allow for object arrays – which own their data – at all: arr = np.arange(100000, dtype=object) arr.strides = (0,) del arr free(): invalid pointer zsh: abort (core dumped) Note that using the alternatives is completely fine here (as long as you take care all point to valid objects).
I realize that in some cases muting the container metadata happens. But I do believe it should be as minimal as possible. And frankly, probably one could do away with it completely.
I guess that's the only point where we disagree.
I wouldn't completely discourage it and I would definitely not remove the functionality.
Another example for where it is bad would be a threaded environment. If a python function temporarily changes the shape of an array to read from it without creating a view first, this will break multi- threaded access to that array.
Sure, let's not use it while multi-threading then.
I still think that's not at all a reason to remove the feature.
While I wouldn't mind mind moving to deprecate it fully, that is not my intention right now. I would discourage it in the documentation with a pointer to a safe alternative. Maybe at some point we realize that nobody really has any good reason to keep using it, then we may move ahead slowly. - Sebastian
There are some things that are problematic when multi-threading, but that's typically not reason enough to completely disallow them.
cheers, Matthias
- Sebastian
[0] I tried to find other examples for such a split. Maybe a categorical/state object which is allowed change value/state. But the list of possible states cannot change.
Coming back to the original suggestion of this thread: Since assigning to arr.shape makes sure no copy of the array data is made, I don't think it's necessary to add a new no-copy argument to reshape().
But the bug you mentioned ("on error the `arr.shape = ...` code currently creates the copy temporarily") should probably be fixed at some point ...
cheers, Matthias
- Sebastian
There may be some corner cases, but a lot of the "than why is it allowed" questions are answered with: for history reasons.
OK, that's a good point.
By the way, on error the `arr.shape = ...` code currently creates the copy temporarily.
That's interesting and it should probably be fixed.
But it is not reason enough for me not to use it. I find it important that is doesn't make a copy in the success case, I don't care very much for the error case.
Would you mind elaborating on the real reasons why I shouldn't use it?
cheers, Matthias
- Sebastian
> cheers, > Matthias
NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Wed, Dec 26, 2018, 18:29 Sebastian Berg <sebastian@sipsolutions.net wrote:
Hi all,
In https://github.com/numpy/numpy/pull/11897 I am looking into the addition of a `copy=np.never_copy` argument to: * np.array * arr.reshape/np.reshape * arr.astype
Which would cause an error to be raised when numpy cannot guarantee that the returned array is a view of the input array. The motivation is to easier avoid accidental copies of large data, or ensure that in-place manipulation will be meaningful.
The copy flag API would be: * `copy=True` forces a copy * `copy=False` allows numpy to copy if necessary * `copy=np.never_copy` will error if a copy would be necessary * (almost) all other input will be deprecated.
Unfortunately using `copy="never"` is tricky, because currently `np.array(..., copy="never")` behaves exactly the same as `np.array(..., copy=bool("never"))`. So that the wrong result would be given on old numpy versions and it would be a behaviour change.
Some things that are a not so nice maybe: * adding/using `np.never_copy` is not very nice * Scalars need a copy and so will not be allowed * For rare array-likes numpy may not be able to guarantee no-copy, although it could happen (but should not).
The history is that a long while ago I considered adding a copy flag to `reshape` so that it is possible to do `copy=np.never_copy` (or similar) to ensure that no copy is made. In these, you may want something like an assertion:
``` new_arr = arr.reshape(new_shape) assert np.may_share_memory(arr, new_arr)
# Which is sometimes -- but should not be -- written as: arr.shape = new_shape # unnecessary container modification
# Or: view = np.array(arr, order="F") assert np.may_share_memory(arr, new_arr) ```
but is more readable and will not cause an intermediate copy on error.
So what do you think? Other variants would be to not expose this for `np.array` and probably limit `copy="never"` to the reshape method. Or just to not do it at all. Or to also accept "never" for `reshape`, although I think I would prefer to keep it in sync and wait for a few years to consider that.
Best,
Sebastian
Could this approach be used to deprecate `ravel` and let us just use `flatten`?
On Fri, 2019-01-11 at 13:57 +1100, Juan Nunez-Iglesias wrote:
On 10 Jan 2019, at 6:35 pm, Todd <toddrjen@gmail.com> wrote:
Could this approach be used to deprecate `ravel` and let us just use `flatten`?
Could we not? `.ravel()` is everywhere and it matches `ravel_multi_index` and `unravel_index`.
True, I suppose either of those functions could get such an argument as well. Just side note `arr.ravel()` is also not quite equivalent to `arr.reshape(-1)`, since it actually provides some additional assurances on the result contiguity. - Sebastian
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
participants (9)
-
Eric Wieser
-
Feng Yu
-
Juan Nunez-Iglesias
-
Matthias Geier
-
Nathaniel Smith
-
Neal Becker
-
Ralf Gommers
-
Sebastian Berg
-
Todd