Limit number of iterations in sigma clipping
![](https://secure.gravatar.com/avatar/1900a7ce7fa8c035ec3d60e3b9f90e50.jpg?s=120&d=mm&r=g)
Hi, The current sigma clipping function has no iters keyword. I feel it would be useful to include it in the function, with a default like `iters=None` so that if people do want to use the keyword they will be able to do so. With the defaults of `iters = None` it shouldn't break anyone's workflow and will correspond to the current default behaviour. If there is any interest in this, I can file a pull request soon. Thanks, Srikrishna Sekhar
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Fri, Oct 16, 2015 at 6:21 AM, Sri Krishna <kitchi.srikrishna@gmail.com> wrote:
Hi,
The current sigma clipping function has no iters keyword.
For clarity: this is stats.sigmaclip
I feel it would be useful to include it in the function, with a default like `iters=None` so that if people do want to use the keyword they will be able to do so.
With the defaults of `iters = None` it shouldn't break anyone's workflow and will correspond to the current default behaviour.
If there is any interest in this, I can file a pull request soon.
Makes sense to me to add this. I just saw that AstroPy has a much more extensive sigma clipping function. Replacing stats.sigmaclip with http://docs.astropy.org/en/latest/api/astropy.stats.sigma_clip.html would be the way to go I think. In a backwards-compatible fashion of course, so the `sigma` keyword from that function should be left out. Ralf
![](https://secure.gravatar.com/avatar/1900a7ce7fa8c035ec3d60e3b9f90e50.jpg?s=120&d=mm&r=g)
On 18 October 2015 at 15:05, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Oct 16, 2015 at 6:21 AM, Sri Krishna <kitchi.srikrishna@gmail.com> wrote:
Hi,
The current sigma clipping function has no iters keyword.
For clarity: this is stats.sigmaclip
I feel it would be useful to include it in the function, with a default like `iters=None` so that if people do want to use the keyword they will be able to do so.
With the defaults of `iters = None` it shouldn't break anyone's workflow and will correspond to the current default behaviour.
If there is any interest in this, I can file a pull request soon.
Makes sense to me to add this.
I just saw that AstroPy has a much more extensive sigma clipping function. Replacing stats.sigmaclip with http://docs.astropy.org/en/latest/api/astropy.stats.sigma_clip.html would be the way to go I think. In a backwards-compatible fashion of course, so the `sigma` keyword from that function should be left out.
Well, I am working on a pull request on the astropy sigma clipping routine to incorporate the scipy function - The scipy function can be upto 100x faster than the astropy function in benchmarks that I've run, and I'm not convinced that it is worthwhile loosing the speed for features. So once the PR is merged, the astropy function will have the option to use the scipy function as well as the existing astropy function, As far as I can tell, the speed improvements in the scipy.stats.sigmaclip function come because we don't use masked arrays, and the sigma clipped data is discarded every iteration. We could potentially use another function instead of the mean (i.e., median or a user supplied function analogous to astropy) without too much of a speed hit. The only three features we need to port from the astropy function are the `cenfunc` and `varfunc` keywords to define a custom function to calculate mean/standard deviation and the `axis` keyword. I'm not sure we need to use masked arrays because of the previously mentioned performance implications. So should I work on a pull request to adapt the astropy function into scipy in a backwards compatible manner? Thanks, Krishna
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Sun, Oct 18, 2015 at 1:18 PM, Sri Krishna <kitchi.srikrishna@gmail.com> wrote:
On 18 October 2015 at 15:05, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Oct 16, 2015 at 6:21 AM, Sri Krishna <kitchi.srikrishna@gmail.com
wrote:
Hi,
The current sigma clipping function has no iters keyword.
For clarity: this is stats.sigmaclip
I feel it would be useful to include it in the function, with a default like `iters=None` so that if people do want to use the keyword they will be able to do so.
With the defaults of `iters = None` it shouldn't break anyone's workflow and will correspond to the current default behaviour.
If there is any interest in this, I can file a pull request soon.
Makes sense to me to add this.
I just saw that AstroPy has a much more extensive sigma clipping function. Replacing stats.sigmaclip with http://docs.astropy.org/en/latest/api/astropy.stats.sigma_clip.html would be the way to go I think. In a backwards-compatible fashion of course, so the `sigma` keyword from that function should be left out.
Well, I am working on a pull request on the astropy sigma clipping routine to incorporate the scipy function - The scipy function can be upto 100x faster than the astropy function in benchmarks that I've run, and I'm not convinced that it is worthwhile loosing the speed for features. So once the PR is merged, the astropy function will have the option to use the scipy function as well as the existing astropy function,
Ouch, 100x is a lot.
As far as I can tell, the speed improvements in the scipy.stats.sigmaclip function come because we don't use masked arrays, and the sigma clipped data is discarded every iteration. We could potentially use another function instead of the mean (i.e., median or a user supplied function analogous to astropy) without too much of a speed hit.
The only three features we need to port from the astropy function are the `cenfunc` and `varfunc` keywords to define a custom function to calculate mean/standard deviation and the `axis` keyword. I'm not sure we need to use masked arrays because of the previously mentioned performance implications.
So should I work on a pull request to adapt the astropy function into scipy in a backwards compatible manner?
All the keyword you mentioned make sense to add I think, so yes please. And a bit of bikeshedding: the names used by AstroPy aren't optimal imho. Maybe: - iters --> numiter - cenfunc --> centerfunc Ralf
Thanks, Krishna
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
![](https://secure.gravatar.com/avatar/ad13088a623822caf74e635a68a55eae.jpg?s=120&d=mm&r=g)
On Sun, Oct 18, 2015 at 11:23 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Sun, Oct 18, 2015 at 1:18 PM, Sri Krishna <kitchi.srikrishna@gmail.com> wrote:
On 18 October 2015 at 15:05, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Oct 16, 2015 at 6:21 AM, Sri Krishna < kitchi.srikrishna@gmail.com> wrote:
Hi,
The current sigma clipping function has no iters keyword.
For clarity: this is stats.sigmaclip
I feel it would be useful to include it in the function, with a default like `iters=None` so that if people do want to use the keyword they will be able to do so.
With the defaults of `iters = None` it shouldn't break anyone's workflow and will correspond to the current default behaviour.
If there is any interest in this, I can file a pull request soon.
Makes sense to me to add this.
I just saw that AstroPy has a much more extensive sigma clipping function. Replacing stats.sigmaclip with http://docs.astropy.org/en/latest/api/astropy.stats.sigma_clip.html would be the way to go I think. In a backwards-compatible fashion of course, so the `sigma` keyword from that function should be left out.
Well, I am working on a pull request on the astropy sigma clipping routine to incorporate the scipy function - The scipy function can be upto 100x faster than the astropy function in benchmarks that I've run, and I'm not convinced that it is worthwhile loosing the speed for features. So once the PR is merged, the astropy function will have the option to use the scipy function as well as the existing astropy function,
Ouch, 100x is a lot.
As far as I can tell, the speed improvements in the scipy.stats.sigmaclip function come because we don't use masked arrays, and the sigma clipped data is discarded every iteration. We could potentially use another function instead of the mean (i.e., median or a user supplied function analogous to astropy) without too much of a speed hit.
The only three features we need to port from the astropy function are the `cenfunc` and `varfunc` keywords to define a custom function to calculate mean/standard deviation and the `axis` keyword. I'm not sure we need to use masked arrays because of the previously mentioned performance implications.
So should I work on a pull request to adapt the astropy function into scipy in a backwards compatible manner?
All the keyword you mentioned make sense to add I think, so yes please.
And a bit of bikeshedding: the names used by AstroPy aren't optimal imho. Maybe: - iters --> numiter
statsmodels uses maxiter in cases like this because it can stop early
- cenfunc --> centerfunc
I think this should correspond to the robust ANOVA function, IIRC names have power and are not bikesheds (default color choices ain't no bikesheds either) Josef
Ralf
Thanks, Krishna
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev
![](https://secure.gravatar.com/avatar/1900a7ce7fa8c035ec3d60e3b9f90e50.jpg?s=120&d=mm&r=g)
On 18 October 2015 at 21:12, <josef.pktd@gmail.com> wrote:
On Sun, Oct 18, 2015 at 11:23 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Sun, Oct 18, 2015 at 1:18 PM, Sri Krishna <kitchi.srikrishna@gmail.com
wrote:
On 18 October 2015 at 15:05, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Oct 16, 2015 at 6:21 AM, Sri Krishna < kitchi.srikrishna@gmail.com> wrote:
Hi,
The current sigma clipping function has no iters keyword.
For clarity: this is stats.sigmaclip
I feel it would be useful to include it in the function, with a default like `iters=None` so that if people do want to use the keyword they will be able to do so.
With the defaults of `iters = None` it shouldn't break anyone's workflow and will correspond to the current default behaviour.
If there is any interest in this, I can file a pull request soon.
Makes sense to me to add this.
I just saw that AstroPy has a much more extensive sigma clipping function. Replacing stats.sigmaclip with http://docs.astropy.org/en/latest/api/astropy.stats.sigma_clip.html would be the way to go I think. In a backwards-compatible fashion of course, so the `sigma` keyword from that function should be left out.
Well, I am working on a pull request on the astropy sigma clipping routine to incorporate the scipy function - The scipy function can be upto 100x faster than the astropy function in benchmarks that I've run, and I'm not convinced that it is worthwhile loosing the speed for features. So once the PR is merged, the astropy function will have the option to use the scipy function as well as the existing astropy function,
Ouch, 100x is a lot.
As far as I can tell, the speed improvements in the scipy.stats.sigmaclip function come because we don't use masked arrays, and the sigma clipped data is discarded every iteration. We could potentially use another function instead of the mean (i.e., median or a user supplied function analogous to astropy) without too much of a speed hit.
The only three features we need to port from the astropy function are the `cenfunc` and `varfunc` keywords to define a custom function to calculate mean/standard deviation and the `axis` keyword. I'm not sure we need to use masked arrays because of the previously mentioned performance implications.
So should I work on a pull request to adapt the astropy function into scipy in a backwards compatible manner?
All the keyword you mentioned make sense to add I think, so yes please.
And a bit of bikeshedding: the names used by AstroPy aren't optimal imho. Maybe: - iters --> numiter
statsmodels uses maxiter in cases like this because it can stop early
- cenfunc --> centerfunc
I think this should correspond to the robust ANOVA function, IIRC
names have power and are not bikesheds
(default color choices ain't no bikesheds either)
I've taken my time about it, but finally got around to finishing the PR here <https://github.com/scipy/scipy/pull/5559>. Do let me know if I need to add/remove/throw away/fix anything! I hope I've done everything correctly - This is the first time I'm submitting a PR to scipy. Thanks, Krishna
participants (3)
-
josef.pktd@gmail.com
-
Ralf Gommers
-
Sri Krishna