[Numpy-discussion] Add sliding_window_view method to numpy

Zimmermann Klaus klaus.zimmermann at smhi.se
Fri Nov 6 11:03:00 EST 2020


Hi,

On 06/11/2020 15:58, Ralf Gommers wrote:
> On Fri, Nov 6, 2020 at 9:51 AM Zimmermann Klaus
> <klaus.zimmermann at smhi.se <mailto:klaus.zimmermann at smhi.se>> wrote:
>     I have absolutely no problem keeping this out of the main namespace.
> 
>     In fact I'd like to point out that it was not my idea. Rather, it was
>     proposed by Bas van Beek in the comments [1,2] and received a little
>     more scrutiny from Eric Wieser in [3].
> 
> Thanks, between two PRs with that many comments, I couldn't figure that
> out - just saw the commit that make the change.

Understandable, no worries.

>     On the subject matter, I am also curious about the potential for
>     confusion. What other behavior could one expect from a sliding window
>     view with this shape?
> 
>     As I said, I am completely fine with keeping this out of the main
>     namespace, but I agree with Sebastian's comment, that
>     `np.lib.stride_tricks` is perhaps not the best namespace. 
> 
> 
> I agree that that's not a great namespace. There's multiple issues with
> namespaces, we basically have three good ones (fft, linalg, random) and
> a bunch of other ones that range from questionable to terrible. See
> https://github.com/numpy/numpy/blob/master/numpy/tests/test_public_api.py#L127
> <https://github.com/numpy/numpy/blob/master/numpy/tests/test_public_api.py#L127>
> for details.
> 
> This would be a good thing to work on - making the `numpy.lib` namespace
> not bleed into `numpy` via `import *` is one thing to do there, and
> there's many others. But given backwards compat constraints it's not easy.

I understand cleaning up all the namespaces is a giant task, so far, far
out of scope here. As said before, I also completely agree to keep it
out of the main namespace (though I will still argue below :P).

I was just wondering if, of the top your head, an existing, better fit
comes to mind?

>     The reason
>     from my point of view is that stride tricks is really a technical (and
>     slightly ominous) name that might throw of more application oriented
>     programmers from finding and using this function. Thinking of my
>     scientist colleagues, I think those are exactly the kind of users that
>     could benefit from such a prototyping tool.
> 
> 
> That phrasing is one of a number of concerns. NumPy is normally not in
> the business of providing things that are okay as a prototyping tool,
> but are potentially extremely slow (as pointed out in the Notes section
> of the docstring). A function like that would basically not be the right
> tool for almost anything in, e.g., SciPy - it requires an iterative
> algorithm. In NumPy we don't prefer performance at all costs, but in
> general it's pretty decent rather than "Numba or Cython may gain you
> 100x here".

I still think that the performance concern is a bit overblown. Yes,
application with large windows can need more FLOPs by an equally large
factor. But most such applications will use small to moderate windows.
Furthermore, this view focuses only on FLOPs. In my current field of
climate science (and many others), that is almost never the limiting
factor. Memory demands are far more problematic and incidentally, those
are more likely to increase in other methods that require the storage of
ancillary, temporary data.

> Other issues include:
> 2) It is very specific to NumPy's memory model (as pointed out by you
> and Sebastian) - just like the rest of stride_tricks
Not wrong, but on the other hand, that memory model is not exotic. C,
Fortran, and any number of other languages play very nicely with this,
just as important downstream libraries like dask.

> 3) It has "view" in the name, which doesn't quite make sense for the
> main namespace (also connected to point 2 above).
Ok.

> 4) The cost of putting something in the main namespace for other
> array/tensor libraries is large. Maybe other libraries, e.g. CuPy, Dask,
> TensorFlow, PyTorch, JAX, MXNet, aim to reimplement part or all of the
> main NumPy namespace as well as possible. This would trigger discussions
> and likely many person-weeks of work for others.
Agreed. Though I have to say that my whole motivation comes from
corresponding issues in dask that where specifically waiting for (the
older version of) this PR (see [1, 2,...]). But I understand that dask
is effectively much closer to the numpy memory model than, say, CuPy, so
don't take this to mean it should be in the main namespace.

> 5) It's a useful function, but it's very much on the margins of NumPy's
> scope. It could easily have gone into, for example, scipy.signal. At
> this point the bar for functions going into the main namespace should be> (and is) high.
I agree that the bar for the main namespace should be high!

> All this taken together means it's not even a toss-up for me. If it were
> just one or two of these points, maybe. But given all the above, I'm
> pretty confident saying "it does not belong in the main namespace".
Again, I am happy with that.


Thanks for your thoughts and work! I really appreciate it!

Cheers
Klaus

[1] https://github.com/dask/dask/issues/4659
[2] https://github.com/pydata/xarray/issues/3608
[3] https://github.com/pandas-dev/pandas/issues/26959


> 
> 
>     Cheers
>     Klaus
> 
> 
> 
>     [1] https://github.com/numpy/numpy/pull/17394#issuecomment-700998618
>     <https://github.com/numpy/numpy/pull/17394#issuecomment-700998618>
>     [2] https://github.com/numpy/numpy/pull/17394#discussion_r498215468
>     <https://github.com/numpy/numpy/pull/17394#discussion_r498215468>
>     [3] https://github.com/numpy/numpy/pull/17394#discussion_r498724340
>     <https://github.com/numpy/numpy/pull/17394#discussion_r498724340>
> 
>     On 06/11/2020 01:39, Sebastian Berg wrote:
>     > On Thu, 2020-11-05 at 17:35 -0600, Sebastian Berg wrote:
>     >> On Thu, 2020-11-05 at 12:51 -0800, Stephan Hoyer wrote:
>     >>> On Thu, Nov 5, 2020 at 11:16 AM Ralf Gommers <
>     >>> ralf.gommers at gmail.com <mailto:ralf.gommers at gmail.com>>
>     >>> wrote:
>     >>>
>     >>>> On Thu, Nov 5, 2020 at 4:56 PM Sebastian Berg <
>     >>>> sebastian at sipsolutions.net <mailto:sebastian at sipsolutions.net>>
>     >>>> wrote:
>     >>>>
>     >>>>> Hi all,
>     >>>>>
>     >>>>> just a brief note that I merged this proposal:
>     >>>>>
>     >>>>>     https://github.com/numpy/numpy/pull/17394
>     <https://github.com/numpy/numpy/pull/17394>
>     >>>>>
>     >>>>> adding `np.sliding_window_view` into the 1.20 release of NumPy.
>     >>>>>
>     >>>>> There was only one public API change, and that is that the
>     >>>>> `shape`
>     >>>>> argument is now called `window_shape`.
>     >>>>>
>     >>>>> This is still a good time for feedback in case you have a
>     >>>>> better
>     >>>>> idea
>     >>>>> e.g. for the function or parameter names.
>     >>>>>
>     >>>>
>     >>>> The old PR had this in the lib.stride_tricks namespace. Seeing it
>     >>>> in the
>     >>>> main namespace is unexpected and likely will lead to
>     >>>> issues/questions,
>     >>>> given that such an overlapping view is going to do behave in ways
>     >>>> the
>     >>>> average user will be surprised by. It may also lead to requests
>     >>>> for
>     >>>> other
>     >>>> array/tensor libraries to implement this. I don't see any
>     >>>> discussion on
>     >>>> this in PR 17394, it looks like a decision by the PR author that
>     >>>> no
>     >>>> one
>     >>>> commented on - reconsider that?
>     >>>>
>     >>>> Cheers,
>     >>>> Ralf
>     >>>>
>     >>>
>     >>> +1 let's keep this in the lib.stride_tricks namespace.
>     >>>
>     >>
>     >> I have no reservations against having it in the main namespace and am
>     >> happy either way (it can still be exposed later in any case). It is
>     >> the
>     >> conservative choice and maybe it is an uncommon enough function that
>     >> it
>     >> deserves being a bit hidden...
>     >
>     >
>     > In any case, its the safe bet for NumPy 1.20 at least so I opened
>     a PR:
>     >
>     >     https://github.com/numpy/numpy/pull/17720
>     <https://github.com/numpy/numpy/pull/17720>
>     >
>     > Name changes, etc. are also possible of course.
>     >
>     > I still think it might be nice to find a better place for this type of
>     > function that `np.lib.stride_tricks` though, but dunno...
>     >
>     > - Sebastian
>     >
>     >
>     >
>     >>
>     >> But I am curious, it sounds like you have both very strong
>     >> reservations, and I would like to understand them better.
>     >>
>     >> The behaviour can be surprising, but that is why the default is a
>     >> read-
>     >> only view.  I do not think it is worse than `np.broadcast_to` in this
>     >> regard. (It is nowhere near as dangerous as `as_strided`.)
>     >>
>     >> It is true that it is specific to NumPy (memory model). So that is
>     >> maybe a good enough reason right now.  But I am not sure that
>     >> stuffing
>     >> things into a pretty hidden `np.lib.*` namespaces is a great long
>     >> term
>     >> solution either. There is very little useful functionality hidden
>     >> away
>     >> in `np.lib.*` currently.
>     >>
>     >> Cheers,
>     >>
>     >> Sebastian
>     >>
>     >>>>
>     >>>>
>     >>>>> Cheers,
>     >>>>>
>     >>>>> Sebastian
>     >>>>>
>     >>>>>
>     >>>>>
>     >>>>> On Mon, 2020-10-12 at 08:39 +0000, Zimmermann Klaus wrote:
>     >>>>>> Hello,
>     >>>>>>
>     >>>>>> I would like to draw the attention of this list to PR #17394
>     >>>>>> [1] that
>     >>>>>> adds the implementation of a sliding window view to numpy.
>     >>>>>>
>     >>>>>> Having a sliding window view in numpy is a longstanding open
>     >>>>>> issue
>     >>>>>> (cf
>     >>>>>> #7753 [2] from 2016). A brief summary of the discussions
>     >>>>>> surrounding
>     >>>>>> it
>     >>>>>> can be found in the description of the PR.
>     >>>>>>
>     >>>>>> This PR implements a sliding window view based on stride
>     >>>>>> tricks.
>     >>>>>> Following the discussion in issue #7753, a first
>     >>>>>> implementation
>     >>>>>> was
>     >>>>>> provided by Fanjin Zeng in PR #10771. After some discussion,
>     >>>>>> that PR
>     >>>>>> stalled and I picked up the issue in the present PR #17394.
>     >>>>>> It
>     >>>>>> is
>     >>>>>> based
>     >>>>>> on the first implementation, but follows the changed API as
>     >>>>>> suggested
>     >>>>>> by
>     >>>>>> Eric Wieser.
>     >>>>>>
>     >>>>>> Code reviews have been provided by Bas van Beek, Stephen
>     >>>>>> Hoyer,
>     >>>>>> and
>     >>>>>> Eric
>     >>>>>> Wieser. Sebastian Berg added the "62 - Python API" label.
>     >>>>>>
>     >>>>>>
>     >>>>>> Do you think this is suitable for inclusion in numpy?
>     >>>>>>
>     >>>>>> Do you consider the PR ready?
>     >>>>>>
>     >>>>>> Do you have suggestions or requests?
>     >>>>>>
>     >>>>>>
>     >>>>>> Thanks for your time and consideration!
>     >>>>>> Klaus
>     >>>>>>
>     >>>>>>
>     >>>>>> [1] https://github.com/numpy/numpy/pull/17394
>     <https://github.com/numpy/numpy/pull/17394>
>     >>>>>> [2] https://github.com/numpy/numpy/issues/7753
>     <https://github.com/numpy/numpy/issues/7753>
>     >>>>>> _______________________________________________
>     >>>>>> NumPy-Discussion mailing list
>     >>>>>> NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     >>>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>     <https://mail.python.org/mailman/listinfo/numpy-discussion>
>     >>>>>>
>     >>>>>
>     >>>>> _______________________________________________
>     >>>>> NumPy-Discussion mailing list
>     >>>>> NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     >>>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>     <https://mail.python.org/mailman/listinfo/numpy-discussion>
>     >>>>>
>     >>>> _______________________________________________
>     >>>> NumPy-Discussion mailing list
>     >>>> NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     >>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>     <https://mail.python.org/mailman/listinfo/numpy-discussion>
>     >>>>
>     >>>
>     >>> _______________________________________________
>     >>> NumPy-Discussion mailing list
>     >>> NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     >>> https://mail.python.org/mailman/listinfo/numpy-discussion
>     <https://mail.python.org/mailman/listinfo/numpy-discussion>
>     >>
>     >> _______________________________________________
>     >> NumPy-Discussion mailing list
>     >> NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     >> https://mail.python.org/mailman/listinfo/numpy-discussion
>     <https://mail.python.org/mailman/listinfo/numpy-discussion>
>     >
>     >
>     > _______________________________________________
>     > NumPy-Discussion mailing list
>     > NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     > https://mail.python.org/mailman/listinfo/numpy-discussion
>     <https://mail.python.org/mailman/listinfo/numpy-discussion>
>     >
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     https://mail.python.org/mailman/listinfo/numpy-discussion
>     <https://mail.python.org/mailman/listinfo/numpy-discussion>
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 


More information about the NumPy-Discussion mailing list