[SciPy-Dev] Adding non-parametric methods to scipy.stats
Romain Jacob
jacobr at ethz.ch
Mon Aug 17 04:38:28 EDT 2020
Hello everyone,
I've submitted the PR adding support for non-parametric confidence
intervals for quantiles (https://github.com/scipy/scipy/pull/12680).
There have already been quite a few review comments, which I believe I
have addressed appropriately.
I would be happy to get more feedback or to see the PR merged :-)
Note: CI fails on the last commit, apparently because of a file change
in `scipy/sparse/linalg/`, which is completely unrelated to this PR. I am
not sure how to handle this. Any advice?
Cheers,
--
Romain
On 15/06/2020 08:27, Romain Jacob wrote:
> On 13/06/2020 20:54, josef.pktd at gmail.com wrote:
>> On Fri, Jun 12, 2020 at 11:29 AM <josef.pktd at gmail.com> wrote:
>>
>> On Fri, Jun 12, 2020 at 1:58 AM Romain Jacob <jacobr at ethz.ch> wrote:
>>
>> On 11/06/2020 20:54, Warren Weckesser wrote:
>>> On 6/11/20, josef.pktd at gmail.com wrote:
>>>> I think it would make a good and useful addition and fit into scipy.stats.
>>>> There are no pure confint functions yet, AFAIR.
>>> I agree with Josef and Matt, this looks like it would be a nice
>>> addition to SciPy. At the moment, I'm not sure what the API should
>>> look like. Romain, is the work that you've already done available
>>> online somewhere?
>>>
>>> Warren
>>
>> Yes, I have a functional implementation available here:
>> https://github.com/TriScale-Anon/triscale/blob/master/helpers.py#L397
>>
>>
>> An implementation detail:
>> binom has cdf and ppf functions
>> My guess, not verified, is that we can just use binom.interval
>>
>> (at least I used those for similar cases)
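A minimal sketch of the `binom.interval` idea above, assuming the usual order-statistic argument: for a continuous distribution, the number of samples below the q-quantile is Binomial(n, q), so an equal-tailed binomial interval gives candidate ranks. The helper name `quantile_ci_ranks` is made up for illustration and is not part of SciPy or of the linked code.

```python
from scipy.stats import binom


def quantile_ci_ranks(n, q, confidence=0.95):
    """Return 1-based ranks (l, u) so that [X_(l), X_(u)] covers the
    q-quantile with probability at least `confidence`.

    Hypothetical helper. A rank of 0 means "no finite lower bound" and a
    rank of n + 1 means "no finite upper bound".
    """
    # interval() returns (ppf((1 - c) / 2), ppf((1 + c) / 2)) as floats.
    a, b = binom.interval(confidence, n, q)
    lower_rank = int(a)      # cdf(a - 1) < (1 - c) / 2
    upper_rank = int(b) + 1  # cdf(b) >= (1 + c) / 2
    return lower_rank, upper_rank


n, q, confidence = 100, 0.5, 0.95
l, u = quantile_ci_ranks(n, q, confidence)

# Coverage of [X_(l), X_(u)]: P(l <= B <= u - 1) with B ~ Binomial(n, q).
coverage = binom.cdf(u - 1, n, q) - binom.cdf(l - 1, n, q)
print(l, u, coverage)
```

The printed `coverage` is at least the requested level by construction. The equal-tailed choice is only one of several rank pairs that reach the level, so it is not necessarily the shortest interval.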
>>
>>
>> I found my version again
>> https://github.com/statsmodels/statsmodels/issues/6562#issuecomment-592769480
>>
>>
>> I guess it gives the same two-sided confint as the references.
>> It doesn't have interpolation, if that could be applied in this case.
>>
> I don't entirely follow what you mean here: that the probabilities
> built in these two lines
> (https://github.com/TriScale-Anon/triscale/blob/master/helpers.py#L438
> and L439) can be computed directly from binom without np.cumsum? That
> is definitely correct (I actually have code doing that somewhere).
>
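On the np.cumsum point: a small check, assuming the referenced lines accumulate binomial probabilities with a running sum (the values of `n` and `q` below are arbitrary illustration values, not taken from the linked code).

```python
import numpy as np
from scipy.stats import binom

n, q = 20, 0.75
k = np.arange(n + 1)

# Running sum of the probability mass function ...
cdf_from_cumsum = np.cumsum(binom.pmf(k, n, q))
# ... versus asking the distribution for the CDF directly.
cdf_direct = binom.cdf(k, n, q)

same = np.allclose(cdf_from_cumsum, cdf_direct)
print(same)
```

Both arrays hold P(B <= k) for k = 0..n, so either can feed the rank search; `binom.cdf` just skips the intermediate pmf array.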
> I did not know about the `interval` method. That sounds interesting
> indeed, but it's not 100% clear to me how the uniqueness problem is
> handled. I looked for the implementation of the method but couldn't
> find it in `binom`... Am I looking in the wrong place?
>
> Cheers,
> --
> Romain
>
>
>> This will eventually end up in statsmodels, but I don't know yet
>> where. That's not a reason not to add it to scipy.stats.
>>
>> Josef
>>
>>
>> Josef
>>
>> There is still quite a bit of work to be done on formatting and
>> documentation to comply with the SciPy standards, but
>> functionally it's already there (and as you'll see, the
>> method is quite simple).
>>
>> Cheers,
>> --
>> Romain
>>
>>>> I recently wrote a function for the confidence interval for the median,
>>>> mainly because I ran into the formulas that were easy to code.
>>>> related open issue: how do we get confidence intervals for QQ-plot.
>>>>
>>>> aside: I don't like "percent", I prefer quantiles in [0, 1]. See discussion
>>>> a while ago in numpy.
>>>>
>>>> Josef
>>>>
>>>>
>>>> On Thu, Jun 11, 2020 at 1:01 PM Matt Haberland <mhaberla at calpoly.edu> wrote:
>>>>
>>>>> OK, we should let our statistics experts weigh in on this. (I'm not
>>>>> actually one of them.)
>>>>>
>>>>> On Wed, Jun 10, 2020 at 10:46 PM Romain Jacob <jacobr at ethz.ch> wrote:
>>>>>
>>>>>> I think a dedicated function makes more sense. This function takes as
>>>>>> input an array, a percentile and a confidence level, and returns the
>>>>>> corresponding one-sided confidence intervals.
>>>>>>
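The interface described above (array, quantile, confidence level in; one-sided bound out) could look roughly like the sketch below. It assumes i.i.d. samples from a continuous distribution and uses a quantile in [0, 1] rather than a percent. The function name `quantile_bound`, its `side` argument, and the NaN convention are assumptions for illustration, not the API of the SciPy PR.

```python
import numpy as np
from scipy.stats import binom


def quantile_bound(x, q, confidence=0.95, side="upper"):
    """One-sided distribution-free confidence bound for the q-quantile.

    Hypothetical helper. Returns NaN when the sample is too small to
    reach the requested confidence level with any order statistic.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if side == "upper":
        # Smallest k with P(B <= k) >= confidence, B ~ Binomial(n, q).
        # The (k + 1)-th order statistic, x[k] in 0-based indexing,
        # is then above the quantile with at least that probability.
        k = int(binom.ppf(confidence, n, q))
        return x[k] if k < n else np.nan
    # Lower bound: k is the smallest value with P(B <= k) >= 1 - confidence,
    # so P(B <= k - 1) < 1 - confidence and P(B >= k) > confidence.
    # The k-th order statistic, x[k - 1] in 0-based indexing, is the bound.
    k = int(binom.ppf(1 - confidence, n, q))
    return x[k - 1] if k >= 1 else np.nan


sample = np.arange(1.0, 101.0)  # 100 distinct values, illustration only
lower = quantile_bound(sample, 0.5, 0.95, side="lower")
upper = quantile_bound(sample, 0.5, 0.95, side="upper")
print(lower, upper)
```

With only two samples, no order statistic reaches 95% confidence for the median, so both calls return NaN in that case.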
>>>>>> I quickly looked at the list of existing functions in scipy.stats but
>>>>>> did not see any function in "summary statistics" that does similar
>>>>>> things. So I would go for a new function.
>>>>>> On 10/06/2020 20:38, Matt Haberland wrote:
>>>>>>
>>>>>> Where do you envision this living in SciPy? In its own function, or as
>>>>>> added functionality in other functions, e.g. scipy.stats.percentileofscore
>>>>>> <https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.percentileofscore.html#scipy.stats.percentileofscore>?
>>>>>>
>>>>>> On Tue, Jun 9, 2020 at 11:12 PM Romain Jacob <jacobr at ethz.ch> wrote:
>>>>>>
>>>>>>> On 09/06/2020 20:18, Matt Haberland wrote:
>>>>>>>
>>>>>>> Yes, I think we would be interested in confidence intervals, but I
>>>>>>> think the algorithm should be well established and well cited, even
>>>>>>> if it's not the best/most modern.
>>>>>>>
>>>>>>> Yes, definitely! We did not invent the method I am referring to; it
>>>>>>> is a long-known approach (first proposed by Thompson in 1936 [1],
>>>>>>> extended later, and commonly found in textbooks, e.g. [2,3]). This
>>>>>>> method is very simple and quite powerful, yet it has been largely
>>>>>>> overlooked in many scientific fields. I found no available
>>>>>>> implementation to facilitate its use (at least not in Python; there
>>>>>>> may be something in R, I have not looked).
>>>>>>>
>>>>>>> [1] https://www.jstor.org/stable/2957563
>>>>>>> [2] http://doi.org/10.1002/0471722162.ch7
>>>>>>> [3] https://perfeval.epfl.ch/
>>>>>>>
>>>>>>> @WarrenWeckesser and I had planned to work on confidence intervals for
>>>>>>> the test statistics returned by our statistical tests
>>>>>>> <https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests>.
>>>>>>>
>>>>>>>
>>>>>>> That is also definitely interesting, although I am not myself an expert
>>>>>>> in that area. I am glad to see that the complete list contains some
>>>>>>> non-parametric tests :-)
>>>>>>>
>>>>>>> Cheers,
>>>>>>> --
>>>>>>> Romain
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jun 8, 2020 at 2:11 AM Romain Jacob <jacobr at ethz.ch> wrote:
>>>>>>>
>>>>>>>> Hello everyone,
>>>>>>>>
>>>>>>>> I have been working for some time on the implementation of
>>>>>>>> non-parametric methods to compute confidence intervals for
>>>>>>>> percentiles. There are some very interesting results in the
>>>>>>>> literature (see e.g. a nice pitch in [1]), which I think would be
>>>>>>>> great to add to SciPy to make them more readily available. It also
>>>>>>>> seems to be rather in line with "recent" discussions of the
>>>>>>>> roadmap for scipy.stats [2].
>>>>>>>>
>>>>>>>> I would be interested in contributing this. What do you think?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> --
>>>>>>>> Romain
>>>>>>>>
>>>>>>>> [1] https://ieeexplore.ieee.org/document/6841797
>>>>>>>> [2] https://github.com/scipy/scipy/issues/10577
>>>>>>>> --
>>>>>>>> Romain Jacob
>>>>>>>> Postdoctoral Researcher
>>>>>>>> ETH Zurich - Computer Engineering and Networks Laboratory
>>>>>>>> www.romainjacob.net <http://www.romainjacob.net>
>>>>>>>> @RJacobPartner<https://twitter.com/RJacobPartner> <https://twitter.com/RJacobPartner>
>>>>>>>> Gloriastrasse 35, ETZ G75
>>>>>>>> 8092 Zurich
>>>>>>>> +41 7 68 16 88 22
>>>>>>>> _______________________________________________
>>>>>>>> SciPy-Dev mailing list
>>>>>>>> SciPy-Dev at python.org <mailto:SciPy-Dev at python.org>
>>>>>>>> https://mail.python.org/mailman/listinfo/scipy-dev
>>>>>>>>
>>>>>>> --
>>>>>>> Matt Haberland
>>>>>>> Assistant Professor
>>>>>>> BioResource and Agricultural Engineering
>>>>>>> 08A-3K, Cal Poly
--
Romain Jacob
Postdoctoral Researcher
ETH Zurich - Computer Engineering and Networks Laboratory
www.romainjacob.net <https://www.romainjacob.net/>
@RJacobPartner <https://twitter.com/RJacobPartner>
Gloriastrasse 35, ETZ G75
8092 Zurich
+41 7 68 16 88 22