[SciPy-Dev] Limit number of iterations in sigma clipping

josef.pktd at gmail.com josef.pktd at gmail.com
Sun Oct 18 11:42:59 EDT 2015


On Sun, Oct 18, 2015 at 11:23 AM, Ralf Gommers <ralf.gommers at gmail.com>
wrote:

>
>
> On Sun, Oct 18, 2015 at 1:18 PM, Sri Krishna <kitchi.srikrishna at gmail.com>
> wrote:
>
>> On 18 October 2015 at 15:05, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>>
>>>
>>>
>>>
>>> On Fri, Oct 16, 2015 at 6:21 AM, Sri Krishna <
>>> kitchi.srikrishna at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> The current sigma clipping function has no iters keyword.
>>>>
>>>
>>> For clarity: this is stats.sigmaclip
>>>
>>>
>>>> I feel it would be useful to include it in the function, with a default
>>>> like `iters=None` so that if people do want to use the keyword they will be
>>>> able to do so.
>>>>
>>>> With the defaults of `iters = None` it shouldn't break anyone's
>>>> workflow and will correspond to the current default behaviour.
>>>>
>>>> If there is any interest in this, I can file a pull request soon.
>>>>
>>>
>>> Makes sense to me to add this.
>>>
>>> I just saw that AstroPy has a much more extensive sigma clipping
>>> function. Replacing stats.sigmaclip with
>>> http://docs.astropy.org/en/latest/api/astropy.stats.sigma_clip.html
>>> would be the way to go I think. In a backwards-compatible fashion of
>>> course, so the `sigma` keyword from that function should be left out.
>>>
>>
>> Well, I am working on a pull request on the astropy sigma clipping
>> routine to incorporate the scipy function - The scipy function can be upto
>> 100x faster than the astropy function in benchmarks that I've run, and I'm
>> not convinced that it is worthwhile loosing the speed for features. So once
>> the PR is merged, the astropy function will have the option to use the
>> scipy function as well as the existing astropy function,
>>
>
> Ouch, 100x is a lot.
>
>
>> As far as I can tell, the speed improvements in the scipy.stats.sigmaclip
>> function come because we don't use masked arrays, and the sigma clipped
>> data is discarded every iteration. We could potentially use another
>> function instead of the mean (i.e., median or a user supplied function
>> analogous to astropy) without too much of a speed hit.
>>
>> The only three features we need to port from the astropy function are the
>> `cenfunc` and `varfunc` keywords to define a custom function to calculate
>> mean/standard deviation and the `axis` keyword. I'm not sure we need to use
>> masked arrays because of the previously mentioned performance implications.
>>
>> So should I work on a pull request to adapt the astropy function into
>> scipy in a backwards compatible manner?
>>
>
> All the keyword you mentioned make sense to add I think, so yes please.
>
> And a bit of bikeshedding: the names used by AstroPy aren't optimal imho.
> Maybe:
> - iters --> numiter
>

statsmodels uses maxiter in cases like this because it can stop early


> - cenfunc --> centerfunc
>

I think this should correspond to the robust ANOVA function, IIRC


names have power and are not bikesheds


(default color choices ain't no bikesheds either)


Josef



>
> Ralf
>
>
>> Thanks,
>> Krishna
>>
>>
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> https://mail.scipy.org/mailman/listinfo/scipy-dev
>>
>>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> https://mail.scipy.org/mailman/listinfo/scipy-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20151018/b30fcb37/attachment.html>


More information about the SciPy-Dev mailing list