Add sliding_window_view method to numpy
Hello,

I would like to draw the attention of this list to PR #17394 [1], which adds an implementation of a sliding window view to numpy.

Having a sliding window view in numpy is a longstanding open issue (cf. #7753 [2] from 2016). A brief summary of the discussions surrounding it can be found in the description of the PR.

This PR implements a sliding window view based on stride tricks. Following the discussion in issue #7753, a first implementation was provided by Fanjin Zeng in PR #10771. After some discussion, that PR stalled and I picked up the issue in the present PR #17394. It is based on the first implementation, but follows the changed API as suggested by Eric Wieser.

Code reviews have been provided by Bas van Beek, Stephan Hoyer, and Eric Wieser. Sebastian Berg added the "62 - Python API" label.

Do you think this is suitable for inclusion in numpy? Do you consider the PR ready? Do you have suggestions or requests?

Thanks for your time and consideration!
Klaus

[1] https://github.com/numpy/numpy/pull/17394
[2] https://github.com/numpy/numpy/issues/7753
On Mon, 2020-10-12 at 08:39 +0000, Zimmermann Klaus wrote:
Hi,

thanks for working on this and driving it forward. I like the choice of a minimal API. I have pasted the doc-string (html, hope that works fine) below to allow a quicker idea of what is being proposed.

To me it looks good! (I wonder if we need `subok`, but I guess we probably do.)

Cheers,
Sebastian

`numpy.sliding_window_view(x, window_shape, axis=None, *, subok=False, writeable=False)`

Create a sliding window view into the array with the given window shape.

Creates a sliding window view of the N-dimensional array with the given window shape. Window slides across each dimension of the array and extracts subsets of the array at any window position.

Parameters

x : array_like
Array to create the sliding window view from.

window_shape : int or tuple of int
Size of window over each axis that takes part in the sliding window. If axis is not present, must have same length as the number of input array dimensions. Single integers i are treated as if they were the tuple (i,).

axis : int or tuple of int, optional
Axis or axes along which the sliding window is applied. By default, the sliding window is applied to all axes and window_shape[i] will refer to axis i of x. If axis is given as a tuple of int, window_shape[i] will refer to the axis axis[i] of x. Single integers i are treated as if they were the tuple (i,).

subok : bool, optional
If True, sub-classes will be passed through, otherwise the returned array will be forced to be a base-class array (default).

writeable : bool, optional
When true, allow writing to the returned view. The default is false, as this should be used with caution: the returned view contains the same memory location multiple times, so writing to one location will cause others to change.

Returns

view : ndarray
Sliding window view of the array. The sliding window dimensions are inserted at the end, and the original dimensions are trimmed as required by the size of the sliding window. That is, view.shape = x_shape_trimmed + window_shape, where x_shape_trimmed is x.shape with every entry reduced by one less than the corresponding window size.

See also

lib.stride_tricks.as_strided
Create a view into the array with the given shape and strides.

broadcast_to
Broadcast an array to a given shape.

Notes

For some cases there may be more efficient approaches to calculate transformations across multi-dimensional arrays, for instance scipy.signal.fftconvolve, where combining the iterating step with the calculation itself while storing partial results can result in significant speedups.

Examples
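A minimal one-dimensional sketch of the behaviour described in the docstring, assuming NumPy 1.20 or later, where the function is importable from `numpy.lib.stride_tricks`. The input array and window length are chosen here purely for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative input (an assumption, not taken from the original message).
x = np.arange(6)               # array([0, 1, 2, 3, 4, 5])

# A window of length 3 sliding along the only axis.
v = sliding_window_view(x, 3)

print(v.shape)                 # (4, 3): 6 - (3 - 1) = 4 window positions
print(v)                       # each row is one window: [0 1 2], [1 2 3], ...
```

The leading dimension is the trimmed original axis and the trailing dimension holds the window contents, matching `view.shape = x_shape_trimmed + window_shape`.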
This also works in more dimensions, e.g.
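A sketch of what a two-dimensional case could look like, again assuming NumPy 1.20 or later; the array `x` and the 2x2 window are illustrative choices:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative 3x4 input (an assumption made for this sketch).
x = np.arange(12).reshape(3, 4)

# With axis left at its default, the window covers all axes of x,
# so window_shape needs one entry per dimension.
v = sliding_window_view(x, (2, 2))

print(v.shape)     # (2, 3, 2, 2): (3 - 1, 4 - 1) positions, each a 2x2 window
print(v[0, 0])     # top-left window: [[0 1] [4 5]]
print(v[1, 2])     # bottom-right window: [[ 6  7] [10 11]]
```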
The axis can be specified explicitly:
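For instance, restricting the window to one axis of a two-dimensional array might look as follows (a sketch with an illustrative input, assuming NumPy 1.20 or later):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative 3x4 input (an assumption made for this sketch).
x = np.arange(12).reshape(3, 4)

# Slide a window of length 3 along axis 1 only; axis 0 is left untouched.
v = sliding_window_view(x, 3, axis=1)

print(v.shape)     # (3, 2, 3): axis 1 shrinks from 4 to 2, window dim appended
print(v[0])        # windows over the first row: [[0 1 2] [1 2 3]]
```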
The same axis can be used several times. In that case, every use reduces the corresponding original dimension:
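A sketch of the repeated-axis case with an illustrative input, assuming NumPy 1.20 or later. Each use of axis 1 trims it further and appends its own window dimension:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative 3x4 input (an assumption made for this sketch).
x = np.arange(12).reshape(3, 4)

# Axis 1 is used twice: once with window 2, once with window 3.
v = sliding_window_view(x, (2, 3), axis=(1, 1))

# Axis 1 is trimmed to 4 - (2 - 1) - (3 - 1) = 1.
print(v.shape)     # (3, 1, 2, 3)
print(v[0, 0])     # [[0 1 2] [1 2 3]]
```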
Combining with stepped slicing (::step), this can be used to take sliding views which skip elements:
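One way to read this: slicing the window dimension with a step keeps every window position but skips elements inside each window. A sketch with an illustrative input, assuming NumPy 1.20 or later:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative input (an assumption made for this sketch).
x = np.arange(8)

# Windows of length 5, then keep every second element within each window.
v = sliding_window_view(x, 5)[:, ::2]

print(v.shape)     # (4, 3)
print(v)           # rows: [0 2 4], [1 3 5], [2 4 6], [3 5 7]
```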
or views which move by multiple elements
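Here the step is applied to the position dimension instead, so the window contents stay contiguous but the window advances by more than one element at a time. A sketch with an illustrative input, assuming NumPy 1.20 or later:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative input (an assumption made for this sketch).
x = np.arange(8)

# Windows of length 3, keeping only every second window position.
v = sliding_window_view(x, 3)[::2]

print(v.shape)     # (3, 3)
print(v)           # rows: [0 1 2], [2 3 4], [4 5 6]
```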
Hi,

thanks, Sebastian!

Since `sliding_window_view` is essentially a small function determining the parameters with which to call `as_strided`, it makes sense to me to keep the essential `as_strided` parameters, `subok` among them. But if somebody with a deeper insight into numpy is convinced this should go, I have no problem with it.

Cheers
Klaus

On 12/10/2020 17:25, Sebastian Berg wrote:
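To illustrate the point that the function mostly computes a shape and strides for `as_strided`, here is a one-dimensional sketch. The helper `sliding_window_1d` is hypothetical and written for this illustration only; it is not the code from the PR, which handles N dimensions, the `axis` argument, and `subok`:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view


def sliding_window_1d(x, window):
    """Hypothetical 1-D helper: build the view directly with as_strided."""
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError("this sketch only handles 1-D input")
    if not 0 < window <= x.shape[0]:
        raise ValueError("window must be between 1 and len(x)")
    n_windows = x.shape[0] - window + 1
    (stride,) = x.strides
    # Moving to the next window and moving to the next element inside a
    # window both advance by one element, so both strides are identical.
    return as_strided(
        x,
        shape=(n_windows, window),
        strides=(stride, stride),
        writeable=False,
    )


x = np.arange(6)
a = sliding_window_1d(x, 3)
b = sliding_window_view(x, 3)

print(np.array_equal(a, b))      # the two views hold the same values
print(np.shares_memory(a, x))    # no data was copied
```

The bounds checks are the part that makes the wrapper safer than calling `as_strided` by hand, since wrong shape or strides there can point outside the array's memory.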
Hi all,

just a brief note that I merged this proposal:

https://github.com/numpy/numpy/pull/17394

adding `np.sliding_window_view` into the 1.20 release of NumPy.

There was only one public API change, and that is that the `shape` argument is now called `window_shape`.

This is still a good time for feedback in case you have a better idea e.g. for the function or parameter names.

Cheers,
Sebastian

On Mon, 2020-10-12 at 08:39 +0000, Zimmermann Klaus wrote:
On Thu, Nov 5, 2020 at 4:56 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
The old PR had this in the lib.stride_tricks namespace. Seeing it in the main namespace is unexpected and will likely lead to issues/questions, given that such an overlapping view is going to behave in ways the average user will be surprised by. It may also lead to requests for other array/tensor libraries to implement this. I don't see any discussion on this in PR 17394; it looks like a decision by the PR author that no one commented on - reconsider that?

Cheers,
Ralf
On Thu, 2020-11-05 at 12:51 -0800, Stephan Hoyer wrote:
I have no reservations against having it in the main namespace and am happy either way (it can still be exposed later in any case). It is the conservative choice and maybe it is an uncommon enough function that it deserves being a bit hidden...

But I am curious, it sounds like you both have very strong reservations, and I would like to understand them better.

The behaviour can be surprising, but that is why the default is a read-only view. I do not think it is worse than `np.broadcast_to` in this regard. (It is nowhere near as dangerous as `as_strided`.)

It is true that it is specific to NumPy (memory model). So that is maybe a good enough reason right now. But I am not sure that stuffing things into a pretty hidden `np.lib.*` namespace is a great long-term solution either. There is very little useful functionality hidden away in `np.lib.*` currently.

Cheers,
Sebastian
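The read-only default mentioned above can be checked directly. The following is a small sketch with illustrative values, assuming NumPy 1.20 or later and the `numpy.lib.stride_tricks` import path:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative input (an assumption made for this sketch).
x = np.arange(5)

# By default the view is read-only, like the result of np.broadcast_to.
v = sliding_window_view(x, 2)
b = np.broadcast_to(np.arange(3), (2, 3))
print(v.flags.writeable, b.flags.writeable)

raised = False
try:
    v[0, 0] = 99               # writing to a read-only view fails
except ValueError:
    raised = True
print(raised)

# Opting in to a writeable view exposes the aliasing: w[0, 1] and w[1, 0]
# both refer to x[1], so a single write shows up in both places.
w = sliding_window_view(x, 2, writeable=True)
w[0, 1] = 99
print(x[1], w[1, 0])
```

This aliasing is the surprising behaviour under discussion: one memory location appears at several positions of the view.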
On Thu, 2020-11-05 at 17:35 -0600, Sebastian Berg wrote:
In any case, it's the safe bet for NumPy 1.20 at least, so I opened a PR:

https://github.com/numpy/numpy/pull/17720

Name changes, etc. are also possible of course. I still think it might be nice to find a better place for this type of function than `np.lib.stride_tricks` though, but dunno...

- Sebastian
Hi all,

I have absolutely no problem keeping this out of the main namespace.

In fact I'd like to point out that it was not my idea. Rather, it was proposed by Bas van Beek in the comments [1,2] and received a little more scrutiny from Eric Wieser in [3]. The reason that it didn't receive the scrutiny it probably deserves is that it got a bit mangled up with the array dispatch discussion; sorry for that.

On the subject matter, I am also curious about the potential for confusion. What other behavior could one expect from a sliding window view with this shape?

As I said, I am completely fine with keeping this out of the main namespace, but I agree with Sebastian's comment that `np.lib.stride_tricks` is perhaps not the best namespace. The reason from my point of view is that stride tricks is really a technical (and slightly ominous) name that might throw off more application-oriented programmers from finding and using this function. Thinking of my scientist colleagues, I think those are exactly the kind of users that could benefit from such a prototyping tool.

Cheers
Klaus

[1] https://github.com/numpy/numpy/pull/17394#issuecomment-700998618
[2] https://github.com/numpy/numpy/pull/17394#discussion_r498215468
[3] https://github.com/numpy/numpy/pull/17394#discussion_r498724340

On 06/11/2020 01:39, Sebastian Berg wrote:
On Fri, Nov 6, 2020 at 9:51 AM Zimmermann Klaus <klaus.zimmermann@smhi.se> wrote:
Thanks, between two PRs with that many comments, I couldn't figure that out - just saw the commit that made the change.
No worries at all. This is why we announce new features on the mailing list.
I agree that that's not a great namespace. There are multiple issues with namespaces: we basically have three good ones (fft, linalg, random) and a bunch of other ones that range from questionable to terrible. See https://github.com/numpy/numpy/blob/master/numpy/tests/test_public_api.py#L1... for details. This would be a good thing to work on - making the `numpy.lib` namespace not bleed into `numpy` via `import *` is one thing to do there, and there are many others. But given backwards compat constraints it's not easy.
That phrasing is one of a number of concerns. NumPy is normally not in the business of providing things that are okay as a prototyping tool, but are potentially extremely slow (as pointed out in the Notes section of the docstring). A function like that would basically not be the right tool for almost anything in, e.g., SciPy - it requires an iterative algorithm. In NumPy we don't prefer performance at all costs, but in general it's pretty decent rather than "Numba or Cython may gain you 100x here".

Other issues include:

2) It is very specific to NumPy's memory model (as pointed out by you and Sebastian) - just like the rest of stride_tricks.
3) It has "view" in the name, which doesn't quite make sense for the main namespace (also connected to point 2 above).
4) The cost of putting something in the main namespace for other array/tensor libraries is large. Many other libraries, e.g. CuPy, Dask, TensorFlow, PyTorch, JAX, MXNet, aim to reimplement part or all of the main NumPy namespace as well as possible. This would trigger discussions and likely many person-weeks of work for others.
5) It's a useful function, but it's very much on the margins of NumPy's scope. It could easily have gone into, for example, scipy.signal. At this point the bar for functions going into the main namespace should be (and is) high.

All this taken together means it's not even a toss-up for me. If it were just one or two of these points, maybe. But given all the above, I'm pretty confident saying "it does not belong in the main namespace".

Cheers,
Ralf
Hi, On 06/11/2020 15:58, Ralf Gommers wrote:
Understandable, no worries.
I understand cleaning up all the namespaces is a giant task, so far, far out of scope here. As said before, I also completely agree to keep it out of the main namespace (though I will still argue below :P). I was just wondering if, off the top of your head, an existing, better fit comes to mind?
I still think that the performance concern is a bit overblown. Yes, applications with large windows can need more FLOPs by an equally large factor. But most such applications will use small to moderate windows. Furthermore, this view focuses only on FLOPs. In my current field of climate science (and many others), that is almost never the limiting factor. Memory demands are far more problematic and, incidentally, those are more likely to increase in other methods that require the storage of ancillary, temporary data.
Thanks for your thoughts and work! I really appreciate it!

Cheers
Klaus

[1] https://github.com/dask/dask/issues/4659
[2] https://github.com/pydata/xarray/issues/3608
[3] https://github.com/pandas-dev/pandas/issues/26959
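The trade-off between FLOPs and temporary storage debated in this exchange can be made concrete with a moving average. This is an illustrative sketch, not a benchmark; the input, the window length, and the cumulative-sum alternative are assumptions chosen for the example:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative input and window length (assumptions made for this sketch).
x = np.arange(10, dtype=float)
w = 4

# Window-view approach: no copy of x is made, but the reduction touches
# roughly len(x) * w elements, so the work grows with the window size.
windows = sliding_window_view(x, w)
ma_view = windows.mean(axis=-1)

# Cumulative-sum approach: work is independent of w, at the price of a
# temporary array of len(x) + 1 partial sums.
c = np.cumsum(np.concatenate(([0.0], x)))
ma_cumsum = (c[w:] - c[:-w]) / w

print(np.shares_memory(windows, x))     # the view aliases x, nothing copied
print(np.allclose(ma_view, ma_cumsum))  # both give the same moving average
```

For small windows the extra arithmetic of the view-based version is modest, while the second version trades it for additional temporary storage.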
participants (5)

- Juan Nunez-Iglesias
- Ralf Gommers
- Sebastian Berg
- Stephan Hoyer
- Zimmermann Klaus