Adding non-parametric methods to scipy.stats
Hello everyone,
I have been working for some time on the implementation of non-parametric methods to compute confidence intervals for percentiles. There are some very interesting results in the literature (see e.g. a nice pitch in [1]) which I think would be great to add to SciPy to make them more readily available. It also seems to be rather in line with "recent" discussions of the roadmap for scipy.stats [2].
I would be interested in contributing this. What do you think?
Cheers,
-- Romain
[1] https://ieeexplore.ieee.org/document/6841797
[2] https://github.com/scipy/scipy/issues/10577
-- Romain Jacob Postdoctoral Researcher ETH Zurich - Computer Engineering and Networks Laboratory www.romainjacob.net <https://www.romainjacob.net/> @RJacobPartner <https://twitter.com/RJacobPartner> Gloriastrasse 35, ETZ G75 8092 Zurich +41 7 68 16 88 22
Yes, I think we would be interested in confidence intervals, but I think the algorithm should be very well standard/cited, even if it's not the best/most modern. @WarrenWeckesser and I had planned to work on confidence intervals for the test statistics returned by our statistical tests <https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests>.
-- Matt Haberland Assistant Professor BioResource and Agricultural Engineering 08A-3K, Cal Poly
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@python.org https://mail.python.org/mailman/listinfo/scipy-dev
On 09/06/2020 20:18, Matt Haberland wrote:
> Yes, I think we would be interested in confidence intervals, but I think the algorithm should be very well standard/cited, even if it's not the best/most modern.
Yes, definitely! We did not invent the method I am referring to; it is a long-known approach (first proposed by Thompson in 1936 [1], extended later, and commonly found in textbooks, e.g. [2,3]). The method is very simple and quite powerful, yet it has been largely overlooked in many scientific fields. I found no available implementation to facilitate its use (at least not in Python; there may be something in R, but I have not looked).
[1] https://www.jstor.org/stable/2957563
[2] doi.org/10.1002/0471722162.ch7
[3] https://perfeval.epfl.ch/
> @WarrenWeckesser and I had planned to work on confidence intervals for the test statistics returned by our statistical tests <https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests>.
That is also definitely interesting, although I am not myself an expert in that area. I am glad to see that the complete list contains some non-parametric tests :-)
Cheers,
-- Romain
Where do you envision this living in SciPy? In its own function, or added functionality to other functions e.g. scipy.stats.percentileofscore <https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.percentileo...> ?
-- Matt Haberland
I think a dedicated function makes more sense. It would take an array, a percentile, and a confidence level as input, and return the corresponding one-sided confidence intervals.
I quickly looked at the list of existing functions in scipy.stats but did not see any function under "summary statistics" that does something similar, so I would go for a new function.
On 10/06/2020 20:38, Matt Haberland wrote:
> Where do you envision this living in SciPy? In its own function, or added functionality to other functions e.g. scipy.stats.percentileofscore <https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.percentileofscore.html#scipy.stats.percentileofscore>?
-- Romain Jacob
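As a rough illustration of the dedicated function proposed in this message (an array, a percentile and a confidence level in; a one-sided bound out), here is a minimal sketch. It is not the implementation discussed in the thread: the name `percentile_bound`, the `side` parameter and the `None` return for too-small samples are all assumptions made for this example. It relies on the textbook order-statistic argument for i.i.d. samples from a continuous distribution, where the number of samples falling below the true percentile is binomial.

```python
import numpy as np
from scipy.stats import binom


def percentile_bound(x, percentile, confidence=0.95, side="upper"):
    """One-sided, distribution-free bound on a percentile (hypothetical API).

    Assumes i.i.d. samples from a continuous distribution. Returns the
    order statistic that bounds the percentile with at least the requested
    confidence, or None when the sample is too small to provide one.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    p = percentile / 100.0
    ranks = np.arange(1, n + 1)  # 1-based ranks of the order statistics

    if side == "upper":
        # P(x_(k) > true percentile) = P(B <= k - 1), with B ~ Binomial(n, p)
        ok = binom.cdf(ranks - 1, n, p) >= confidence
        # smallest valid rank gives the tightest upper bound
        return x[ranks[ok][0] - 1] if ok.any() else None
    if side == "lower":
        # P(x_(k) <= true percentile) = P(B >= k) = sf(k - 1)
        ok = binom.sf(ranks - 1, n, p) >= confidence
        # largest valid rank gives the tightest lower bound
        return x[ranks[ok][-1] - 1] if ok.any() else None
    raise ValueError("side must be 'upper' or 'lower'")
```

For example, with five samples the 95% upper bound on the median is the sample maximum, and with only four samples no 95% bound exists, so the sketch returns `None`.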
OK, we should let our statistics experts weigh in on this. (I'm not actually one of them.)
-- Matt Haberland
I think it would make a good and useful addition and fit into scipy.stats. There are no pure confint functions yet, AFAIR.
I recently wrote a function for the confidence interval for the median, mainly because I ran into formulas that were easy to code. Related open issue: how do we get confidence intervals for a QQ-plot?
Aside: I don't like "percent"; I prefer quantiles in [0, 1]. See the discussion a while ago in numpy.
Josef
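To make the percent-versus-quantile aside concrete: NumPy exposes both conventions, and they differ only in whether the rank is given in [0, 100] or in [0, 1]. A small illustration, using made-up sample data:

```python
import numpy as np

x = np.arange(1, 101)  # made-up sample: 1, 2, ..., 100

as_percent = np.percentile(x, 90)    # rank given in percent, in [0, 100]
as_quantile = np.quantile(x, 0.90)   # same rank as a fraction in [0, 1]
```

Both calls return the same value; the choice only affects how the rank argument of a new function would be documented and validated.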
On 6/11/20, josef.pktd@gmail.com <josef.pktd@gmail.com> wrote:
> I think it would make a good and useful addition and fit into scipy.stats. There are no pure confint functions yet, AFAIR.
I agree with Josef and Matt, this looks like it would be a nice addition to SciPy. At the moment, I'm not sure what the API should look like. Romain, is the work that you've already done available online somewhere?
Warren
On 11/06/2020 20:54, Warren Weckesser wrote:
> I agree with Josef and Matt, this looks like it would be a nice addition to SciPy. At the moment, I'm not sure what the API should look like. Romain, is the work that you've already done available online somewhere?
Yes, I have a functional implementation available here: https://github.com/TriScale-Anon/triscale/blob/master/helpers.py#L397
There is still quite some work to be done on formatting and documentation to comply with the SciPy standards, but functionally it is already there (and as you will see, the method is quite simple).
Cheers,
-- Romain
On Fri, Jun 12, 2020 at 1:58 AM Romain Jacob <jacobr@ethz.ch> wrote:
On 11/06/2020 20:54, Warren Weckesser wrote:
On 6/11/20, josef.pktd@gmail.com <josef.pktd@gmail.com> wrote:
I think it would make a good and useful addition and fit into scipy.stats. There are no pure confint functions yet, AFAIR.
I agree with Josef and Matt, this looks like it would be a nice addition to SciPy. At the moment, I'm not sure what the API should look like. Romain, is the work that you've already done available online somewhere?
Warren
Yes, I have some functional implementation available here: https://github.com/TriScale-Anon/triscale/blob/master/helpers.py#L397
An implementation detail: binom has cdf and ppf functions. My guess, not verified, is that we can just use binom.interval (at least I used those for similar cases).
Josef
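To make the binom.interval suggestion concrete, here is a minimal sketch of how it could give a two-sided confidence interval for a quantile from the order statistics of a sample. It assumes i.i.d. samples from a continuous distribution; the function name `quantile_ci_two_sided` and its arguments are hypothetical and are not taken from the TriScale code or from any SciPy API. Only `scipy.stats.binom.interval` and `scipy.stats.binom.cdf` are existing calls.

```python
import numpy as np
from scipy import stats


def quantile_ci_two_sided(x, p, confidence=0.95):
    """Two-sided CI for the p-quantile from order statistics (hypothetical helper)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # B ~ Binomial(n, p) counts the samples that fall below the true quantile.
    # binom.interval returns the equal-tailed range [a, b] of B.
    a, b = (int(v) for v in stats.binom.interval(confidence, n, p))
    # The interval [x_(a), x_(b+1)] (1-based ranks) covers the quantile when
    # a <= B <= b. With 0-based indexing these are x[a - 1] and x[b].
    # If the sample is too small for a bound to exist, return +/- infinity.
    lower = x[a - 1] if a >= 1 else -np.inf
    upper = x[b] if b <= n - 1 else np.inf
    # Actual coverage probability P(a <= B <= b), at least `confidence`.
    coverage = stats.binom.cdf(b, n, p) - stats.binom.cdf(a - 1, n, p)
    return lower, upper, coverage


# Usage: CI for the median of 100 normal samples.
rng = np.random.default_rng(0)
sample = rng.normal(size=100)
lo, hi, cov = quantile_ci_two_sided(sample, 0.5, 0.95)
```

Because the binomial distribution is discrete, the achieved coverage is generally somewhat larger than the requested level, which is why the sketch returns it alongside the bounds.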
There is quite some work to be done on formatting and documentation to comply with the SciPy standards, but functionally it's already there (and as you'll see, the method is quite simple).
Cheers, -- Romain
I recently wrote a function for the confidence interval for the median, mainly because I ran into formulas that were easy to code. Related open issue: how do we get confidence intervals for a QQ-plot?
Aside: I don't like "percent"; I prefer quantiles in [0, 1]. See the discussion a while ago in numpy.
Josef
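As a small illustration of the two conventions mentioned in the aside, NumPy exposes both: `np.quantile` takes a level in [0, 1] and `np.percentile` takes the same level as a percentage in [0, 100]. The data below are made up for the example.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

q90 = np.quantile(x, 0.9)    # level given as a probability in [0, 1]
p90 = np.percentile(x, 90)   # same level given as a percentage in [0, 100]

# Both calls describe the same point of the sample distribution.
assert np.isclose(q90, p90)
```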
On Thu, Jun 11, 2020 at 1:01 PM Matt Haberland <mhaberla@calpoly.edu> wrote:
OK, we should let our statistics experts weigh in on this. (I'm not actually one of them.)
On Wed, Jun 10, 2020 at 10:46 PM Romain Jacob <jacobr@ethz.ch> wrote:
I think a dedicated function makes more sense. This function takes as input an array, a percentile and a confidence level, and returns the corresponding one-sided confidence intervals.
I quickly looked at the list of existing functions in scipy.stats but did not see any function in "summary statistics" that does similar things. So I would go for a new function.
On 10/06/2020 20:38, Matt Haberland wrote:
Where do you envision this living in SciPy? In its own function, or added functionality to other functions e.g. scipy.stats.percentileofscore <https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.percentileo...>?
On Tue, Jun 9, 2020 at 11:12 PM Romain Jacob <jacobr@ethz.ch> wrote:
On 09/06/2020 20:18, Matt Haberland wrote:
Yes, I think we would be interested in confidence intervals, but I think the algorithm should be very well standard/cited, even if it's not the best/most modern.
Yes, definitely! We did not invent the method I am referring to; it is a long-known approach (first proposed by Thompson in 1936 [1], extended later, and commonly found in textbooks, e.g. [2,3]). This method is very simple and quite powerful, yet it has been largely overlooked in many scientific fields. I found no available implementation to facilitate its use (at least not in Python; there may be something in R, I have not looked).
[1] https://www.jstor.org/stable/2957563 [2] doi.org/10.1002/0471722162.ch7 [3] https://perfeval.epfl.ch/
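For readers unfamiliar with the approach, the coverage probability behind it can be written compactly. This is the standard order-statistics result, stated under the assumption of n i.i.d. samples from a continuous distribution; the notation (n, p, q_p for the true p-quantile, X_(j) for the j-th smallest sample) is introduced here for illustration and does not come from the thread.

```latex
\[
  P\bigl( X_{(j)} \le q_p < X_{(k)} \bigr)
  \;=\; \sum_{i=j}^{k-1} \binom{n}{i}\, p^{i} (1-p)^{n-i},
  \qquad 1 \le j < k \le n .
\]
```

The right-hand side depends only on n, p, j and k, not on the underlying distribution, which is what makes the interval non-parametric.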
@WarrenWeckesser and I had planned to work on confidence intervals for the test statistics returned by our statistical tests <https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests>.
That is also definitely interesting, although I am not myself an expert in that area. I am glad to see that the complete list contains some non-parametric tests :-)
Cheers, -- Romain
On Fri, Jun 12, 2020 at 11:29 AM <josef.pktd@gmail.com> wrote:
An implementation detail: binom has cdf and ppf functions My guess, not verified, is that we can just use binom.interval
(at least I used those for similar cases)
I found my version again: https://github.com/statsmodels/statsmodels/issues/6562#issuecomment-59276948...
I guess that, for the two-sided confint, it is the same as in the references. It doesn't have interpolation, if that could be applied in this case.
This will eventually end up in statsmodels, but I don't know yet where. That's not a reason not to add it to scipy.stats.
Josef
On 13/06/2020 20:54, josef.pktd@gmail.com wrote:
I found my version again https://github.com/statsmodels/statsmodels/issues/6562#issuecomment-59276948...
I guess that's the same for two sided confint as the references. It doesn't have interpolation if that could be applied in this case.
I don't entirely follow what you mean here: that the probabilities built in these two lines (https://github.com/TriScale-Anon/triscale/blob/master/helpers.py#L438 and L439) can be obtained directly from binom without np.cumsum? That is definitely correct (I actually have code doing that somewhere as well).
I did not know about the `interval` method. That sounds interesting indeed, but it's not 100% clear to me how the uniqueness problem is handled. I looked for the implementation of the method but couldn't find it in `binom`... Am I looking in the wrong place?
Cheers, -- Romain
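The two points above can be sketched in a few lines. The first statement checks that the cumulative sum of the binomial pmf matches `binom.cdf`, so np.cumsum is not needed. The function then uses the cdf to pick one-sided bounds from the sorted sample. The name `quantile_ci_one_sided` and the choice to return NaN when no bound exists are assumptions made for this sketch, not the TriScale or SciPy behaviour. As far as I can tell, `interval` is not defined in `binom` itself but inherited from the generic distribution base class, where it is computed from `ppf`; that would explain why it is hard to find there.

```python
import numpy as np
from scipy import stats

# The cumulative sum of the pmf equals the cdf (up to rounding).
n, p = 100, 0.5
k = np.arange(n + 1)
assert np.allclose(np.cumsum(stats.binom.pmf(k, n, p)), stats.binom.cdf(k, n, p))


def quantile_ci_one_sided(x, p, confidence=0.95):
    """One-sided lower and upper bounds for the p-quantile (hypothetical helper)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # cdf[m] = P(B <= m) with B ~ Binomial(n, p), the number of samples
    # falling below the true quantile.
    cdf = stats.binom.cdf(np.arange(n + 1), n, p)
    # Upper bound: smallest 0-based index m with P(B <= m) >= confidence.
    m_up = int(np.searchsorted(cdf, confidence, side="left"))
    upper = x[m_up] if m_up < n else np.nan
    # Lower bound: largest 0-based index m with P(B <= m) <= 1 - confidence.
    m_lo = int(np.searchsorted(cdf, 1.0 - confidence, side="right")) - 1
    lower = x[m_lo] if m_lo >= 0 else np.nan
    return lower, upper
```

Each bound is meant to be used on its own: the lower bound holds with the requested confidence, and so does the upper bound, but the pair together is not a two-sided interval at that level.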
Hello everyone,
I've submitted the PR adding support for non-parametric confidence intervals for quantiles (https://github.com/scipy/scipy/pull/12680). Quite a few comments have been made already, which I believe I have addressed appropriately.
I will be happy to get some more feedback or to see the PR merged :-)
Note: the last commit has a failing CI job, apparently due to a file change in `scipy/sparse/linalg/`, which is completely unrelated. I'm not sure how to go about this...?
Cheers, -- Romain
On 15/06/2020 08:27, Romain Jacob wrote:
I don't entirely follow what you mean here: that the probabilities built in these two lines (https://github.com/TriScale-Anon/triscale/blob/master/helpers.py#L438 and L439) can be obtained directly from binom without np.cumsum? That is definitely correct (I actually have code doing that somewhere as well).
I did not know about the `interval` method. That sounds interesting indeed, but it's not 100% clear to me how the uniqueness problem is handled. I looked for the implementation of the method but couldn't find it in `binom`... Am I looking in the wrong place?
Cheers, -- Romain
On Mon, Aug 17, 2020 at 9:44 AM Romain Jacob <jacobr@ethz.ch> wrote:
Hello everyone,
I've submitted the PR adding support for non-parametric confidence intervals for quantiles (https://github.com/scipy/scipy/pull/12680). Quite a few comments have been made already, which I believe I have addressed appropriately.
I will be happy to get some more feedback or to see the PR merged :-)
Note: the last commit has a failing CI job, apparently due to a file change in `scipy/sparse/linalg/`, which is completely unrelated. I'm not sure how to go about this...?
If it's clearly unrelated, you can just ignore it. Or add a comment "the only CI failure is in sparse.linalg and unrelated to this PR". Then the reviewer can just go ahead and merge if everything else looks good - CI doesn't have to be green. Cheers, Ralf Cheers,
-- Romain
On 15/06/2020 08:27, Romain Jacob wrote:
On 13/06/2020 20:54, josef.pktd@gmail.com wrote:
On Fri, Jun 12, 2020 at 11:29 AM <josef.pktd@gmail.com> wrote:
On Fri, Jun 12, 2020 at 1:58 AM Romain Jacob <jacobr@ethz.ch> wrote:
On 11/06/2020 20:54, Warren Weckesser wrote:
On 6/11/20, josef.pktd@gmail.com <josef.pktd@gmail.com> wrote:
I think it would make a good and useful addition and fit into scipy.stats. There are no pure confint functions yet, AFAIR.
I agree with Josef and Matt, this looks like it would be a nice addition to SciPy. At the moment, I'm not sure what the API should look like. Romain, is the work that you've already done available online somewhere?
Warren
Yes, I have some functional implementation available here: https://github.com/TriScale-Anon/triscale/blob/master/helpers.py#L397
An implementation detail: binom has cdf and ppf functions. My guess, not verified, is that we can just use binom.interval (at least I used those for similar cases).
I found my version again
https://github.com/statsmodels/statsmodels/issues/6562#issuecomment-59276948...
I guess that's the same for two sided confint as the references. It doesn't have interpolation if that could be applied in this case.
I don't entirely follow what you mean here: that the probabilities built in these two lines (https://github.com/TriScale-Anon/triscale/blob/master/helpers.py#L438 and L439) can be obtained directly from binom without np.cumsum? That's definitely correct (I actually have code doing that somewhere, too).
I did not know about the `interval` method. That sounds interesting indeed, but it's not 100% clear to me how the uniqueness problem is handled. I looked for the implementation of the method but couldn't find it in `binom`... Am I looking in the wrong place?
Cheers, -- Romain
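A minimal sketch of the idea discussed above, added here for illustration and not taken from the PR or from the linked helpers.py: the cumulative probabilities come straight from `binom.cdf` rather than `np.cumsum` over the pmf, and `binom.interval` gives an equal-tailed range of counts that can be mapped to order-statistic ranks. The function name `quantile_ci_ranks` and the rank mapping (l = a, u = b + 1) are assumptions of this sketch; it picks one equal-tailed pair and does not settle the uniqueness question raised above.

```python
import numpy as np
from scipy.stats import binom


def quantile_ci_ranks(n, q, conf):
    """Hypothetical helper: 1-indexed ranks (l, u) such that the interval
    [x_(l), x_(u)) covers the q-quantile with probability >= conf,
    for n i.i.d. samples from a continuous distribution."""
    # Equal-tailed range [a, b] of Binomial(n, q) holding at least `conf` mass.
    a, b = binom.interval(conf, n, q)
    l, u = int(a), int(b) + 1
    # P(l <= B <= u - 1), with B the number of samples below the true quantile.
    coverage = binom.cdf(u - 1, n, q) - binom.cdf(l - 1, n, q)
    return l, u, coverage


n, q, conf = 50, 0.5, 0.95
k = np.arange(n + 1)
# The CDF is available directly; summing the pmf gives the same values.
same = np.allclose(binom.cdf(k, n, q), np.cumsum(binom.pmf(k, n, q)))
l, u, coverage = quantile_ci_ranks(n, q, conf)
print(same, l, u, coverage)
```

If `l` comes out as 0 or `u` exceeds `n`, no order statistic exists for that bound and the sample is too small for the requested confidence level.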
This will eventually end up in statsmodels, but I don't know yet where. That's not a reason not to add it to scipy.stats.
Josef
Josef
There is quite some work to be done on formatting and documentation to comply with the SciPy standards, but functionally it's already there (and as you'll see, the method is quite simple).
Cheers, -- Romain
I recently wrote a function for the confidence interval for the median, mainly because I ran into the formulas that were easy to code. related open issue: how do we get confidence intervals for QQ-plot.
aside: I don't like "percent", I prefer quantiles in [0, 1]. See discussion a while ago in numpy.
Josef
On Thu, Jun 11, 2020 at 1:01 PM Matt Haberland <mhaberla@calpoly.edu> wrote:
OK, we should let our statistics experts weigh in on this. (I'm not actually one of them.)
On Wed, Jun 10, 2020 at 10:46 PM Romain Jacob <jacobr@ethz.ch> wrote:
I think a dedicated function makes more sense. This function takes as input an array, a percentile and a confidence level, and returns the corresponding one-sided confidence intervals.
I quickly looked at the list of existing functions in scipy.stats but did not see any function in "summary statistics" that does similar things. So I would go for a new function.
On 10/06/2020 20:38, Matt Haberland wrote:
Where do you envision this living in SciPy? In its own function, or added functionality to other functions e.g. scipy.stats.percentileofscore <https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.percentileo...>?
On Tue, Jun 9, 2020 at 11:12 PM Romain Jacob <jacobr@ethz.ch> wrote:
On 09/06/2020 20:18, Matt Haberland wrote:
Yes, I think we would be interested in confidence intervals, but I think the algorithm should be very well standard/cited, even if it's not the best/most modern.
Yes, definitely! We did not invent the method I am referring to; it is a long-known approach (first proposed by Thompson in 1936 [1], extended later and commonly found in textbooks, e.g. [2,3]). This method is very simple and quite powerful, yet it has been largely overlooked in many scientific fields. I found no available implementation to facilitate its use (at least not in Python; there may be something in R, I have not looked).
[1] https://www.jstor.org/stable/2957563 [2] doi.org/10.1002/0471722162.ch7 [3] https://perfeval.epfl.ch/
@WarrenWeckesser and I had planned to work on confidence intervals for the test statistics returned by our statistical tests <https://docs.scipy.org/doc/scipy/reference/stats.html#statistical-tests>.
That is also definitely interesting, although I am not myself an expert in that area. I am glad to see that the complete list contains some non-parametric tests :-)
Cheers, -- Romain
On Mon, Jun 8, 2020 at 2:11 AM Romain Jacob <jacobr@ethz.ch> wrote:
Hello everyone,
I have been working for some time on the implementation of non-parametric methods to compute confidence intervals for percentiles. There are some very interesting results in the literature (see e.g. a nice pitch in [1]) which I think it would be great to add to SciPy to make them more readily available. It also seems to be rather in line with "recent" discussions of the roadmap for scipy.stats [2].
I would be interested in contributing this. What do you think?
Cheers, -- Romain
[1] https://ieeexplore.ieee.org/document/6841797 [2] https://github.com/scipy/scipy/issues/10577
Dear all,
On 11. Jun 2020, at 07:46, Romain Jacob <jacobr@ethz.ch> wrote:
I think a dedicated function makes more sense. This function takes as input an array, a percentile and a confidence level, and returns the corresponding one-sided confidence intervals.
I quickly looked at the list of existing functions in scipy.stats but did not see any function in "summary statistics" that does similar things. So I would go for a new function.
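The dedicated function described in the quoted message (array, quantile and confidence level in; one-sided bounds out) could look roughly like the sketch below. It is an illustration only, not the code from the PR: the name `quantile_ci_one_sided`, its signature, the use of a quantile in [0, 1] rather than a percentile, and the choice to return `None` when the sample is too small for a bound are all assumptions made here.

```python
import numpy as np
from scipy.stats import binom


def quantile_ci_one_sided(x, q, conf=0.95):
    """Hypothetical sketch: non-parametric one-sided confidence bounds for
    the q-quantile, based on order statistics and the binomial distribution.

    Returns (lower, upper); each is a sample value, or None if the sample
    is too small to provide that bound at the requested confidence level.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # cdf[k] = P(B <= k), B ~ Binomial(n, q) counts samples below the quantile.
    cdf = binom.cdf(np.arange(n + 1), n, q)
    # Largest rank l with P(B >= l) >= conf, i.e. cdf[l - 1] <= 1 - conf.
    l = int(np.searchsorted(cdf, 1.0 - conf, side="right"))
    # Smallest rank u with P(B <= u - 1) >= conf.
    u = int(np.searchsorted(cdf, conf, side="left")) + 1
    lower = x[l - 1] if l >= 1 else None
    upper = x[u - 1] if u <= n else None
    return lower, upper


data = np.arange(1, 101)  # toy sample: the integers 1..100
print(quantile_ci_one_sided(data, 0.5, 0.95))
print(quantile_ci_one_sided([1.0, 2.0, 3.0], 0.5, 0.95))  # too few samples
```

Each bound is meant to be used on its own as a one-sided statement at level `conf`; taken together as a two-sided interval, the pair has lower coverage.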
I just joined the list, so I apologise for any etiquette-breaking in advance, but would like to inject here that I am collaborating with Daniel Saxton on `resample`, a library that implements the jackknife and bootstrap, which can be used - among many other things - to compute confidence intervals for quantiles/percentiles.
https://github.com/dsaxton/resample
We are currently working on interface and documentation and adding more unit tests and benchmarks, but `resample` is already the most complete library that implements resampling methods in Python.
Seeing that https://github.com/scipy/scipy/issues/10577 explicitly mentions bootstrapping, we are interested in merging our work into scipy. We use the BSD 3-clause license, so the license should not be an issue. Is there already work ongoing on bootstrap methods? With whom should we collaborate?
Some context about us:
Daniel is a data analyst working in the financial industry. I am a particle physicist and the author of Boost Histogram (C++ and Python, https://github.com/boostorg/histogram, https://github.com/scikit-hep/boost-histogram) and the maintainer of iminuit, the general purpose minimiser and error computer (C++ and Python, https://github.com/scikit-hep/iminuit).
Best regards, Hans
Dear all,
since there was no reply to my first attempt, I am repeating my message. Daniel Saxton and I are working on a Python library called `resample`, which implements the bootstrap and jackknife. We would like to work toward merging bootstrap functions into Scipy and it would be great to get some feedback about this. We would be pleased to collaborate with people who are already working on this in Scipy. We are both pretty decent programmers, knowledgeable about statistics in general and the bootstrap in particular.
Best regards, Hans
On 6/18/20, Hans Dembinski <hans.dembinski@gmail.com> wrote:
Dear all,
since there was no reply to my first attempt, I am repeating my message. Daniel Saxton and I are working on a Python library called `resample`, which implements the bootstrap and jackknife. We would like to work toward merging bootstrap functions into Scipy and it would be great to get some feedback about this. We would be pleased to collaborate with people who are already working on this in Scipy. We are both pretty decent programmers, knowledgeable about statistics in general and the bootstrap in particular.
Thanks, Hans. We would be very interested in adding bootstrap methods to SciPy! I might not get to it for a few days, but I'll take a look at your library and see if it makes sense to incorporate it into SciPy. If any other SciPy devs can get to it sooner, please take a look!
Warren
Dear Warren, (Daniel in CC)
On 18. Jun 2020, at 16:15, Warren Weckesser <warren.weckesser@gmail.com> wrote:
On 6/18/20, Hans Dembinski <hans.dembinski@gmail.com> wrote:
Dear all,
since there was no reply to my first attempt, I am repeating my message. Daniel Saxton and I are working on a Python library called `resample`, which implements the bootstrap and jackknife. We would like to work toward merging bootstrap functions into Scipy and it would be great to get some feedback about this. We would be pleased to collaborate with people who are already working on this in Scipy. We are both pretty decent programmers, knowledgeable about statistics in general and the bootstrap in particular.
Thanks, Hans. We would be very interested in adding bootstrap methods to SciPy!
I might not get to it for a few days, but I'll take a look at your library and see if it makes sense to incorporate it into SciPy. If any other SciPy devs can get to it sooner, please take a look!
That is excellent, thanks! The basic functionality is all there. We are currently working on refining the interface, the docs need more work, and we want to add more unit tests. Currently, the project is not at a quality level fit for SciPy, but I am sure we can get there.
Best regards, Hans
Dear Warren, all,
I am following up on my message from June about integrating a general bootstrap library into scipy. Daniel and I have been busy finishing our rewrite of the resample library, and we released version 1.0.1 for general use on August 24. I have been busy with other things, which is why I didn't come back sooner, sorry.
Docs: https://resample.readthedocs.io/en/master
Source: https://github.com/resample-project/resample
PyPI: https://pypi.org/project/resample
resample is a pure Python implementation written from scratch, using only scipy and numpy as dependencies, under a BSD 3-clause license. It should be suitable for inclusion in scipy. I believe we have converged on a high-quality Pythonic interface that offers both a powerful low-level API for experts and a convenient high-level API for practitioners. Our implementations were optimised to make efficient use of numpy to offload the hot loops into C and to avoid creating unnecessary copies and temporary arrays.
What resample offers:
- Ordinary, balanced, and parametric bootstrap resampling with stratification of N-dimensional data
- Jackknife resampling of N-dimensional data
- For both bootstrap and jackknife resampling: computation of bias and/or variance of an estimator (that would be a generic Python function which maps data samples to N-dimensional output)
- Bootstrap confidence intervals (BCa and percentile)
- A battery of non-parametric permutation-based tests like the Wilcoxon rank sum test, https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
- Extensive docs in numpydoc format
The bootstrap and jackknife functionalities are completely generic. One can compute confidence intervals (the BCa method is state-of-the-art) for any statistical estimator, including arbitrarily complicated ones obtained from machine learning, and also for the quantile, which was the original topic of this thread.
So far we only have 34 stars on Github, but that is mostly because we did not advertise. I believe our library has the potential to be very popular if we actually start advertising, but neither Daniel nor I are very interested in doing public relations. We both have full-time jobs and developing resample is a hobby to us. We would be happy to have resample in SciPy so that our work can benefit from the visibility that Scipy enjoys, while Scipy can benefit from the functionality that resample offers.
Best regards, Hans
PS: My credentials in case you need them: I have been programming in Python for 15 years as a scientist working on big data. I have expertise in both user-friendly interface design and close-to-hardware numerical programming. I am the author of the Boost.Histogram library in C++14 on www.boost.org and co-author of the corresponding Python module boost-histogram. I contributed to matplotlib and maintain the iminuit Python module, a numerical minimiser and error computation tool that is popular in high energy physics.
PPS: Last week, I had the opportunity to listen live to a talk from Brad Efron himself, the inventor of the bootstrap. Fantastic guy.
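For readers unfamiliar with the percentile bootstrap interval mentioned in the feature list, here is a minimal NumPy-only sketch of the idea. It is not the `resample` API and not its implementation; the function name `bootstrap_percentile_ci` and its parameters are hypothetical, and the BCa correction is deliberately left out.

```python
import numpy as np


def bootstrap_percentile_ci(sample, estimator, conf=0.95, n_boot=2000, seed=0):
    """Hypothetical sketch of an ordinary (non-parametric) bootstrap with a
    percentile confidence interval for an arbitrary scalar estimator."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = sample.shape[0]
    # Each row holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    # Apply the estimator to every bootstrap replicate of the data.
    replicates = np.array([estimator(sample[i]) for i in idx])
    alpha = 1.0 - conf
    lo, hi = np.quantile(replicates, [alpha / 2, 1.0 - alpha / 2])
    return lo, hi


rng = np.random.default_rng(42)
data = rng.normal(size=200)
# Interval for the median; any function mapping a sample to a number works.
print(bootstrap_percentile_ci(data, np.median))
```

Unlike the order-statistic method discussed earlier in the thread, this approach works for any estimator, at the cost of resampling and of coverage that is only approximate for finite samples.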
participants (6)
- Hans Dembinski
- josef.pktd@gmail.com
- Matt Haberland
- Ralf Gommers
- Romain Jacob
- Warren Weckesser