ENH: Extend peak finding capabilities in scipy.signal (#8264)

Dear SciPy devs, I'd like to highlight my current pull request which extends SciPy's peak finding capabilities in the signal module. https://github.com/scipy/scipy/pull/8264 I've tried to address all feedback that was given (until now) in said PR. However there are still some areas that would profit from a code review and an additional pair of eyes. Furthermore there are still some outstanding issues. It may be best to discuss them here: - There are still "Changes requested" which I have addressed or are outdated. I don't know how to go about removing this flag. - I think the Travis CI build fail is unrelated to my PR (something in the stats module). Perhaps someone with more knowledge should confirm this. - How to handle if invalid peaks are passed to the functions peak_prominences and peak_widths? Is it okay to describe this as undefined behavior in the documentation? - argrelmax doesn't catch peaks that are wider than one sample. Decide how to deal with this. I have implemented an alternative here which may even be faster. But I plan to make a performance comparison to confirm this. I have a feeling this should be addressed (if wanted) in a new pull request. - Should the new functionality be covered in the tutorial for scipy.signal? I feel like that would be the best place to show more complex examples which are out of place in the docstrings itself. If so, I think a new PR would be the best place. - The decision whether find_peaks should have a postfix like "find_peaks_cwt" is still pending. If so which one? Again, thank you all in advance for taking the time to review this PR and any feedback given. Best regards, Lars

- There are still "Changes requested" which I have addressed or are outdated. I don't know how to go about removing this flag.
You can ping previous reviewers to take another look. They can then remove the tag. - I think the Travis CI build fail is unrelated to my PR (something in
the stats module). Perhaps someone with more knowledge should confirm this.
This unfortunately happens from time to time, especially if it's on the NumPy dev/master build. - How to handle if invalid peaks are passed to the functions
peak_prominences and peak_widths? Is it okay to describe this as undefined behavior in the documentation?
Yes I think so. Best to raise a sensible error if possible (e.g., argument shapes don't make sense) but I don't think it's worth spending too much time protecting against all failure modes. - argrelmax doesn't catch peaks that are wider than one sample. Decide
how to deal with this. I have implemented an alternative here which may even be faster. But I plan to make a performance comparison to confirm this. I have a feeling this should be addressed (if wanted) in a new pull request.
Another PR is fine, but we should merge that one before this find_peaks PR to avoid creating backward compatibility problems. - Should the new functionality be covered in the tutorial for
scipy.signal? I feel like that would be the best place to show more complex examples which are out of place in the docstrings itself. If so, I think a new PR would be the best place.
I think it makes sense to have this as part of the `find_peaks` PR. Having good / sufficient documentation of functions is worth holding off on merging. It is a good test for API sanity, and also makes it easier for others to review at a high level. - The decision whether find_peaks should have a postfix like
"find_peaks_cwt" is still pending. If so which one?
There is existing discussion of this in the PR, so if people have thoughts on this, it's worth looking there to see what ground has been covered first. Again, thank you all in advance for taking the time to review this PR
and any feedback given.
Thanks for working on this functionality! Cheers, Eric

To extend this:
- argrelmax doesn't catch peaks that are wider than one sample. Decide how to deal with this. I have implemented an alternative here which may even be faster. But I plan to make a performance comparison to confirm this. I have a feeling this should be addressed (if wanted) in a new pull request.
I forgot to include the link for the proposed alternative implementation: https://gitlab.com/lagru/peakfinder-demo/blob/master/peak_finder.py#L430
- The decision whether find_peaks should have a postfix like "find_peaks_cwt" is still pending. If so which one?
There is existing discussion of this in the PR, so if people have thoughts on this, it's worth looking there to see what ground has been covered first.
The new function find_peaks does essentially two things: 1. find all local maxima 2. return a subset of these maxima based on certain criteria (height, prominence, ...) The first step (currently uses argrelmax) could be solved in many different ways (e.g. my alternate solution or even with wavelet transformation). Therefore I'm not a big fan of adding a prefix based on the current implementation detail. Lars

On Fri, 26 Jan 2018 21:19:00 +0100, Lars G. wrote:
- argrelmax doesn't catch peaks that are wider than one sample. Decide how to deal with this. I have implemented an alternative here which may even be faster. But I plan to make a performance comparison to confirm this. I have a feeling this should be addressed (if wanted) in a new pull request.
I forgot to include the link for the proposed alternative implementation: https://gitlab.com/lagru/peakfinder-demo/blob/master/peak_finder.py#L430
For comparison, this is what we use in skimage: http://scikit-image.org/docs/dev/api/skimage.feature.html#skimage.feature.pe... Stéfan

Dear SciPy devs,
- argrelmax doesn't catch peaks that are wider than one sample. Decide how to deal with this. I have implemented an alternative here which may even be faster. But I plan to make a performance comparison to confirm this. I have a feeling this should be addressed (if wanted) in a new pull request.
On 26.01.2018 20:59, Eric Larson wrote:
Another PR is fine, but we should merge that one before this find_peaks PR to avoid creating backward compatibility problems.
I have prepared a performance evaluation for you to review here: http://nbviewer.jupyter.org/urls/gitlab.com/snippets/1695752/raw You can download the notebook here: https://gitlab.com/snippets/1695752 I hope my approach & evaluation is not to naive. Please let me know whether you think this approach would be a viable alternative to use in find_peaks. In that case I'd start a new PR to add this functionality before merging the one we are currently discussing. Best regards, Lars

On Sun, Jan 28, 2018 at 6:05 AM, Lars G. <lagru@mailbox.org> wrote:
Dear SciPy devs,
- argrelmax doesn't catch peaks that are wider than one sample. Decide how to deal with this. I have implemented an alternative here which may even be faster. But I plan to make a performance comparison to confirm this. I have a feeling this should be addressed (if wanted) in a new pull request.
On 26.01.2018 20:59, Eric Larson wrote:
Another PR is fine, but we should merge that one before this find_peaks PR to avoid creating backward compatibility problems.
I have prepared a performance evaluation for you to review here: http://nbviewer.jupyter.org/urls/gitlab.com/snippets/1695752/raw
Thanks, such easy to read comparisons are very helpful to evaluate proposed new functionality. Regarding the flat peaks issue, I'd have a preference for only a single index being returned. Rationale: - whether a peak is considered flat or not can be determined by small amount of noise (either numerical or real) - behavior of the function for different inputs should be consistent, changing the number of returned indices will be annoying for users Regarding the execution time: that should be only a secondary concern here. Ralf
You can download the notebook here: https://gitlab.com/snippets/1695752
I hope my approach & evaluation is not to naive. Please let me know
whether you think this approach would be a viable alternative to use in
find_peaks. In that case I'd start a new PR to add this functionality before merging the one we are currently discussing.
Best regards, Lars _______________________________________________ SciPy-Dev mailing list SciPy-Dev@python.org https://mail.python.org/mailman/listinfo/scipy-dev

On 28.01.2018 23:24, Ralf Gommers wrote:
On Sun, Jan 28, 2018 at 6:05 AM, Lars G. <lagru@mailbox.org> wrote:
I have prepared a performance evaluation for you to review here: http://nbviewer.jupyter.org/urls/gitlab.com/snippets/1695752/raw
Thanks, such easy to read comparisons are very helpful to evaluate proposed new functionality.
Regarding the flat peaks issue, I'd have a preference for only a single index being returned. Rationale: - whether a peak is considered flat or not can be determined by small amount of noise (either numerical or real) - behavior of the function for different inputs should be consistent, changing the number of returned indices will be annoying for users Regarding the execution time: that should be only a secondary concern here.
Okay, it seems we are in agreement here. :) I also played around with Cython and managed to come up with a solution that is both - faster than SciPy's argrelextrema and my old pure Python suggestion - and can detect flat peaks by default. I have updated the linked notebook with the new Cython function for those who are interested. I'll use this as the basis of a new PR when I find the time. Best regards, Lars

On 26.01.2018 20:59, Eric Larson wrote:
- argrelmax doesn't catch peaks that are wider than one sample. Decide how to deal with this. I have implemented an alternative here which may even be faster. But I plan to make a performance comparison to confirm this. I have a feeling this should be addressed (if wanted) in a new pull request.
Another PR is fine, but we should merge that one before this find_peaks PR to avoid creating backward compatibility problems.
Actually I have changed my mind about submitting the alternative in another PR. My current solution http://nbviewer.jupyter.org/urls/gitlab.com/snippets/1695752/raw#Proposed-so... compares to `argrelmax` as follows: - It is measurably faster than `argrelmax` for all supported cases I could think of. - It can find flat maxima / peaks which is my main gripe with `argrelmax`. - It uses Cython with all its implications. I have a feeling that Python is preferred over creating another file to compile but I think usage of Cython is well-founded here. Dealing with flat peaks was really costly in my pure-Python experiments. - It only supports 1D arrays (which honestly is the main use case for the signal module). Because this function will be wrapped by find_peaks (or however choose to name it) it doesn't really need to be exposed to the user `find_peaks(x)` would be equivalent to `_maxima1D(x)`. If nobody has any objections I'll include this as part of my original PR. Regards, Lars

Actually I have changed my mind about submitting the alternative in another PR.
Separate PRs can reduce review burden. In this case, though, it seems like the code is simple enough that it could be included. We just need to make sure we sufficiently thoroughly test the compiled function for possible failure modes. - It is ...
All of this sounds fine to me. The compiled code is worth it for speed gains, assuming its complexity (and thus maintenance burden) is not too high. Looking at it, it looks justified. Assuming nobody objects to this path, feel free to implement and ping me (larsoner) on GitHub for a review when it's ready. Eric

On 26.01.2018 20:49, Lars G. wrote:
Dear SciPy devs,
I'd like to highlight my current pull request which extends SciPy's peak finding capabilities in the signal module. https://github.com/scipy/scipy/pull/8264 I've tried to address all feedback that was given (until now) in said PR. [...]
I am quite confident with the current state and would encourage a last review. So feel free to chime in if you have some tweaks or refactoring to suggest. The only outstanding question is about the name of `find_peaks`. As the discussion on this hasn't really yielded an agreed upon solution I'm going to leave the name as it is for the time being. Going forward I'd like to build on this with new PRs: - Some parts (especially inside the big loops in `peak_prominences` and `peak_widths`) would profit from one or two private functions written in Cython. I get noticeable speed ups if I just replace a few steps of the algorithms with Cython. `find_peaks` would profit from this as well. I think this is worthwhile because prominence is a very useful parameter and it shouldn't be to much work (unit tests and PYX file already exist). - A tutorial that shows some more complicated use cases for the new functions and peak finding in general. If I feel confident, I may even give some examples and guidelines for `find_peaks_cwt` whose documentation was described as confusing in the past. Best regards, Lars

Thanks for working on this. That sounds like a good plan to me. I should be able to review this week. Eric On Feb 14, 2018 7:11 AM, "Lars G." <lagru@mailbox.org> wrote:
On 26.01.2018 20:49, Lars G. wrote:
Dear SciPy devs,
I'd like to highlight my current pull request which extends SciPy's peak finding capabilities in the signal module. https://github.com/scipy/scipy/pull/8264 I've tried to address all feedback that was given (until now) in said PR. [...]
I am quite confident with the current state and would encourage a last review. So feel free to chime in if you have some tweaks or refactoring to suggest.
The only outstanding question is about the name of `find_peaks`. As the discussion on this hasn't really yielded an agreed upon solution I'm going to leave the name as it is for the time being.
Going forward I'd like to build on this with new PRs:
- Some parts (especially inside the big loops in `peak_prominences` and `peak_widths`) would profit from one or two private functions written in Cython. I get noticeable speed ups if I just replace a few steps of the algorithms with Cython. `find_peaks` would profit from this as well. I think this is worthwhile because prominence is a very useful parameter and it shouldn't be to much work (unit tests and PYX file already exist).
- A tutorial that shows some more complicated use cases for the new functions and peak finding in general. If I feel confident, I may even give some examples and guidelines for `find_peaks_cwt` whose documentation was described as confusing in the past.
Best regards, Lars _______________________________________________ SciPy-Dev mailing list SciPy-Dev@python.org https://mail.python.org/mailman/listinfo/scipy-dev
participants (4)
-
Eric Larson
-
Lars G.
-
Ralf Gommers
-
Stefan van der Walt