Make as_strided result writeonly

Hi all, I have just opened a PR, to make as_strided writeonly (as default). The reasoning for this change is that an `as_strided` array often have self overlapping memory. However, writing to an array where multiple elements have the identical memory address can be confusing, and the results are typically unpredictable. Considering the danger, the proposal is to add a `readonly=True`. A poweruser (who that function is designed for anyway), could thus still get a writeable array. For the moment, writing to the result would raise a FutureWarning with `readonly="warn"`. Do you agree with this, or would it be a major inconvenience? - Sebastian

On Sat, Jan 23, 2016 at 1:25 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Hi all,
I have just opened a PR, to make as_strided writeonly (as default). The
I think you meant readonly :-)
reasoning for this change is that an `as_strided` array often have self overlapping memory. However, writing to an array where multiple elements have the identical memory address can be confusing, and the results are typically unpredictable.
Considering the danger, the proposal is to add a `readonly=True`. A poweruser (who that function is designed for anyway), could thus still get a writeable array.
For the moment, writing to the result would raise a FutureWarning with `readonly="warn"`.
This should just be a deprecation warning, right? (Because switching an array from writeable->readonly might cause previously correct code to error out, but not to silently start returning different results.)
Do you agree with this, or would it be a major inconvenience?
AFAIK the only use cases for as_strided involve self-overlap (for non-self-overlap you can generally use reshape / indexing / etc. and it's much simpler). So +1 from me. -n -- Nathaniel J. Smith -- https://vorpus.org

I've used as_strided before to create an "endless" output array when I didn't care about the result of an operation, just the side effect. See eg here <https://github.com/scikit-image/scikit-image/blob/master/skimage/future/grap...>. So I would certainly like option to remain to get a writeable array. In general, I'm sceptical about whether the benefits outweigh the costs. On Sun, Jan 24, 2016 at 9:20 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Jan 23, 2016 at 1:25 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Hi all,
I have just opened a PR, to make as_strided writeonly (as default). The
I think you meant readonly :-)
reasoning for this change is that an `as_strided` array often have self overlapping memory. However, writing to an array where multiple elements have the identical memory address can be confusing, and the results are typically unpredictable.
Considering the danger, the proposal is to add a `readonly=True`. A poweruser (who that function is designed for anyway), could thus still get a writeable array.
For the moment, writing to the result would raise a FutureWarning with `readonly="warn"`.
This should just be a deprecation warning, right? (Because switching an array from writeable->readonly might cause previously correct code to error out, but not to silently start returning different results.)
Do you agree with this, or would it be a major inconvenience?
AFAIK the only use cases for as_strided involve self-overlap (for non-self-overlap you can generally use reshape / indexing / etc. and it's much simpler). So +1 from me.
-n
-- Nathaniel J. Smith -- https://vorpus.org _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

On So, 2016-01-24 at 13:00 +1100, Juan Nunez-Iglesias wrote:
I've used as_strided before to create an "endless" output array when I didn't care about the result of an operation, just the side effect. See eg here. So I would certainly like option to remain to get a writeable array. In general, I'm sceptical about whether the benefits outweigh the costs.
Yeah, that is a real use case. I am not planning to remove the option, but it would be as a `readonly` keyword argument, which means you would need to make the code depend on the numpy version if you require a writable array [1]. This actually somewhat defeats the purpose of all of this, but `np.ndarray` can do this dummy thing for you I think, so you could get around that, but.... The purpose is that if you actually would use an as_strided array in your operation, the result is unpredictable (not just complicated). And while as_strided is IMO designed to be used by people who know what they are doing, I have a feeling it is being used quite a lot in general. We did a similar thing for the new `broadcast_to`, though I think there we decided to skip the readonly until complains happen. Actually there is one more thing I might do. And that is issue a UserWarning when new array quite likely points to invalid memory. - Sebastian [1] as_strided does not currently support arr.flags.writable = True for its result array.
On Sun, Jan 24, 2016 at 9:20 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Jan 23, 2016 at 1:25 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Hi all,
I have just opened a PR, to make as_strided writeonly (as
default). The
I think you meant readonly :-)
reasoning for this change is that an `as_strided` array often have self overlapping memory. However, writing to an array where multiple elements have the identical memory address can be confusing, and the results are typically unpredictable.
Considering the danger, the proposal is to add a `readonly=True`. A poweruser (who that function is designed for anyway), could thus still get a writeable array.
For the moment, writing to the result would raise a FutureWarning with `readonly="warn"`.
This should just be a deprecation warning, right? (Because switching an array from writeable->readonly might cause previously correct code to error out, but not to silently start returning different results.)
Do you agree with this, or would it be a major inconvenience?
AFAIK the only use cases for as_strided involve self-overlap (for non-self-overlap you can generally use reshape / indexing / etc. and it's much simpler). So +1 from me.
-n
-- Nathaniel J. Smith -- https://vorpus.org _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

Yeah, that is a real use case. I am not planning to remove the option, but it would be as a `readonly` keyword argument, which means you would need to make the code depend on the numpy version if you require a writable array [1].
[1] as_strided does not currently support arr.flags.writable = True for its result array.
Can you explain this in more detail? I'm writing to the result without problem. Anyway, it only occurred to me after your response that the deprecation path would be quite disruptive to downstream libraries, including scikit-image. My feeling is that unless someone has been bitten by this (?), the benefit does not outweigh the cost of deprecation. Perhaps something to push to 2.0? On Sun, Jan 24, 2016 at 8:17 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
On So, 2016-01-24 at 13:00 +1100, Juan Nunez-Iglesias wrote:
I've used as_strided before to create an "endless" output array when I didn't care about the result of an operation, just the side effect. See eg here. So I would certainly like option to remain to get a writeable array. In general, I'm sceptical about whether the benefits outweigh the costs.
Yeah, that is a real use case. I am not planning to remove the option, but it would be as a `readonly` keyword argument, which means you would need to make the code depend on the numpy version if you require a writable array [1]. This actually somewhat defeats the purpose of all of this, but `np.ndarray` can do this dummy thing for you I think, so you could get around that, but....
The purpose is that if you actually would use an as_strided array in your operation, the result is unpredictable (not just complicated). And while as_strided is IMO designed to be used by people who know what they are doing, I have a feeling it is being used quite a lot in general.
We did a similar thing for the new `broadcast_to`, though I think there we decided to skip the readonly until complains happen.
Actually there is one more thing I might do. And that is issue a UserWarning when new array quite likely points to invalid memory.
- Sebastian
[1] as_strided does not currently support arr.flags.writable = True for its result array.
On Sun, Jan 24, 2016 at 9:20 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Jan 23, 2016 at 1:25 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Hi all,
I have just opened a PR, to make as_strided writeonly (as
default). The
I think you meant readonly :-)
reasoning for this change is that an `as_strided` array often have self overlapping memory. However, writing to an array where multiple elements have the identical memory address can be confusing, and the results are typically unpredictable.
Considering the danger, the proposal is to add a `readonly=True`. A poweruser (who that function is designed for anyway), could thus still get a writeable array.
For the moment, writing to the result would raise a FutureWarning with `readonly="warn"`.
This should just be a deprecation warning, right? (Because switching an array from writeable->readonly might cause previously correct code to error out, but not to silently start returning different results.)
Do you agree with this, or would it be a major inconvenience?
AFAIK the only use cases for as_strided involve self-overlap (for non-self-overlap you can generally use reshape / indexing / etc. and it's much simpler). So +1 from me.
-n
-- Nathaniel J. Smith -- https://vorpus.org _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

On Mon Jan 25 01:46:55 2016 GMT+0100, Juan Nunez-Iglesias wrote:
Yeah, that is a real use case. I am not planning to remove the option, but it would be as a `readonly` keyword argument, which means you would need to make the code depend on the numpy version if you require a writable array [1].
[1] as_strided does not currently support arr.flags.writable = True for its result array.
Can you explain this in more detail? I'm writing to the result without problem. Anyway, it only occurred to me after your response that the deprecation path would be quite disruptive to downstream libraries, including scikit-image. My feeling is that unless someone has been bitten by this (?), the benefit does not outweigh the cost of deprecation. Perhaps something to push to 2.0?
Setting the writeable flag to true, when it is set false before does not work with the way as_strided works right now with a DummyArray. I guess it is because its base is somehow not an ndarray. It should be possible to try/except creating it with np.ndarray, then it would be possible in your usecase. I do not have experience of beeing badly bitten, myself. Would you think it is fine if setting the flag to true would work in your case?
On Sun, Jan 24, 2016 at 8:17 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
On So, 2016-01-24 at 13:00 +1100, Juan Nunez-Iglesias wrote:
I've used as_strided before to create an "endless" output array when I didn't care about the result of an operation, just the side effect. See eg here. So I would certainly like option to remain to get a writeable array. In general, I'm sceptical about whether the benefits outweigh the costs.
Yeah, that is a real use case. I am not planning to remove the option, but it would be as a `readonly` keyword argument, which means you would need to make the code depend on the numpy version if you require a writable array [1]. This actually somewhat defeats the purpose of all of this, but `np.ndarray` can do this dummy thing for you I think, so you could get around that, but....
The purpose is that if you actually would use an as_strided array in your operation, the result is unpredictable (not just complicated). And while as_strided is IMO designed to be used by people who know what they are doing, I have a feeling it is being used quite a lot in general.
We did a similar thing for the new `broadcast_to`, though I think there we decided to skip the readonly until complains happen.
Actually there is one more thing I might do. And that is issue a UserWarning when new array quite likely points to invalid memory.
- Sebastian
[1] as_strided does not currently support arr.flags.writable = True for its result array.
On Sun, Jan 24, 2016 at 9:20 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Jan 23, 2016 at 1:25 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Hi all,
I have just opened a PR, to make as_strided writeonly (as
default). The
I think you meant readonly :-)
reasoning for this change is that an `as_strided` array often have self overlapping memory. However, writing to an array where multiple elements have the identical memory address can be confusing, and the results are typically unpredictable.
Considering the danger, the proposal is to add a `readonly=True`. A poweruser (who that function is designed for anyway), could thus still get a writeable array.
For the moment, writing to the result would raise a FutureWarning with `readonly="warn"`.
This should just be a deprecation warning, right? (Because switching an array from writeable->readonly might cause previously correct code to error out, but not to silently start returning different results.)
Do you agree with this, or would it be a major inconvenience?
AFAIK the only use cases for as_strided involve self-overlap (for non-self-overlap you can generally use reshape / indexing / etc. and it's much simpler). So +1 from me.
-n
-- Nathaniel J. Smith -- https://vorpus.org _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

On Sun, Jan 24, 2016 at 4:46 PM, Juan Nunez-Iglesias <jni.soma@gmail.com> wrote:
Yeah, that is a real use case. I am not planning to remove the option, but it would be as a `readonly` keyword argument, which means you would need to make the code depend on the numpy version if you require a writable array [1].
[1] as_strided does not currently support arr.flags.writable = True for its result array.
Can you explain this in more detail? I'm writing to the result without problem.
I think what Sebastian's saying is: - right now it returns an array with the writeable flag set to True - it would make more sense semantically (in the general "try to nudge people towards not shooting themselves in the foot" sense) if by default it returned an array with the writeable flag set to False - of course it should still be possible to get an array with writeable=True by explicitly requesting it (e.g. with as_strided(..., writeable=True)). - The transition period then would be that it continues to default to writeable=True, but if you don't specify writeable=True, and then you write to the array, it works but it issues a DeprecationWarning telling you that you ought to update your code to explicitly request writeable=True at some point over the next ~year. - Often, if you have a readonly ndarray object, you can actually switch it to being writeable by executing `a.flags.writeable = True`. This is a bit hacky, and there are various safety conditions that have to hold, but often it works. Except it turns out that if as_strided is modified to return a readonly array, then some details of those safety checks mean that this trick *doesn't* work -- if it was created readonly, then it will stay that way. This doesn't really matter, though -- it just means that you have to use something like a writeable=True kwarg to ensure that it gets created writeable in the first place, instead of creating it readonly and then fixing it up after the fact. But it does rule this out as a possible alternative API. Oh well, it'd be pretty klunky anyway. -n -- Nathaniel J. Smith -- https://vorpus.org

On 23/01/16 22:25, Sebastian Berg wrote:
Do you agree with this, or would it be a major inconvenience?
I think any user of as_strided should be considered a power user. This is an inherently dangerous function, that can easily segfault the process. Anyone who uses as_strided should be assumed to have taken all precautions. -1 Sturla

On Mo, 2016-01-25 at 16:11 +0100, Sturla Molden wrote:
On 23/01/16 22:25, Sebastian Berg wrote:
Do you agree with this, or would it be a major inconvenience?
I think any user of as_strided should be considered a power user. This is an inherently dangerous function, that can easily segfault the process. Anyone who uses as_strided should be assumed to have taken all precautions.
I am ready to accept this notion (and I guess just put repeated warnings in the doc string). However, two things about it, first my impression is that for a lot of "not really power users" this function sometimes seems like a big hammer to solve all their problems. Second, I think even power users have sometimes wrong ideas about numpy's ufuncs and memory overlap. This starts with operations such as `arr[1:] += arr[:-1]` (see [1]) (for which there is at least a start on fixing it), and only gets worse with self-overlapping arrays. That said, I guess I could agree with you in the regard that there are so many *other* awful ways to use as_strided, that maybe it really is just so bad, that improving one thing doesn't actually help anyway ;). I was actually considering adding a UserWarning when it is likely that invalid memory is being addressed. I still dislike that it returns something writable *as default* though. Unless you write a single element, writing to such an array will be in many cases unpredictable, no matter how power user you are. - Sebastian [1] WARNING: This is a dangerous example, we ideally want it to be identical to arr[1:] += arr[:-1].copy() always: In [7]: arr = np.arange(10).reshape(5, 2) In [8]: arr[1:] += arr[:-1] In [9]: arr Out[9]: array([[ 0, 1], [ 2, 4], [ 6, 9], [12, 16], [20, 25]]) In [10]: arr = np.arange(10)[::-1].copy()[::-1].reshape(5, 2) In [11]: arr[1:] += arr[:-1] # happens to be "correct" In [12]: arr Out[12]: array([[ 0, 1], [ 2, 4], [ 6, 8], [10, 12], [14, 16]])
-1
Sturla
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

On 25/01/16 18:06, Sebastian Berg wrote:
That said, I guess I could agree with you in the regard that there are so many *other* awful ways to use as_strided, that maybe it really is just so bad, that improving one thing doesn't actually help anyway ;).
That is roughly my position on this, yes. :) Sturla

I agree that it's not ideal that the return value of as_strided is writable. However, to be clear, this *would* break the API, which should not happen between minor releases when using semantic versioning. Even with a deprecation cycle, for libraries such as scikit-image that want to maintain broad compatibility with multiple numpy versions, we would then have to have some code to detect which version of numpy we're dealing with, and do something different depending on the version. That's a big development cost for something that has not been shown to cause any problems. And btw, although some people might use as_strided that aren't super-amazing level 42 programmers, I would say that by that stage they are probably comfortable enough to troubleshoot the shitstorm that's about to fall on them. =P On Tue, Jan 26, 2016 at 4:25 AM, Sturla Molden <sturla.molden@gmail.com> wrote:
On 25/01/16 18:06, Sebastian Berg wrote:
That said, I guess I could agree with you in the regard that there are
so many *other* awful ways to use as_strided, that maybe it really is just so bad, that improving one thing doesn't actually help anyway ;).
That is roughly my position on this, yes. :)
Sturla
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
participants (4)
-
Juan Nunez-Iglesias
-
Nathaniel Smith
-
Sebastian Berg
-
Sturla Molden