
Hello everyone,

A few years ago I implemented a scikit for bootstrap confidence limits (https://github.com/cgevans/scikits-bootstrap). I didn’t think much about it after that until recently, when I realized that some people are actually using it, and that there’s apparently been some talk about implementing this functionality in either scipy.stats or statsmodels (I should thank Randal Olson for discussing this and bringing it to my attention).

As such I’ve rewritten most of the code, and written up some docstrings. The current code can do confidence intervals with basic percentile interval, bias-corrected accelerated, and approximate bootstrap confidence methods, and can also provide bootstrap and jackknife indexes. Most of it is implemented from the descriptions in Efron and Tibshirani’s Introduction to the Bootstrap, but the ABC code at the moment is a port from the modified-BSD-licensed bootstrap package for R (not the boot package) as I’m not entirely confident in my understanding of the method.

And so, I have a few questions for everyone:

* Is there any interest in including this sort of code in either scipy.stats or statsmodels? If so, where do people think would be the better place? The code is relatively small; at the moment it is less than 200 lines, with docstrings probably making up 100 of those lines.
* Also, if so, what would need to be changed, added, and improved beyond what is mentioned in the Contributing to Scipy part of the reference guide? I’m never a fan of my own code, and imagine quite a bit would need to be fixed; I know tests will need to be added too.

In addition, I have a few questions about what would be better practice for the API, and I haven’t really found a guide on best practices for Scipy:

* When I started writing the code, I wrote a single function ci for confidence intervals, with a method argument to choose the method. This is easy for users, especially so that they don’t have to look through documentation to realize that BCA is the most generally useful method (at least from everything I’ve read) and that there really isn’t any reason to use many of the simpler methods. However, ABC takes different parameters, and needs a statistic function that takes weights, which makes this single-function organization trickier. At the moment, I have a separate function for ABC. Would it be better to split up all the methods to their own functions?
* ABC requires a statistic function that takes weights. I’ve noticed that things like np.average take a weights= argument. Would it be better to require input of a stat(data,weights) function, or input of a stat(data,weights=) with weights as a named argument? The latter would be nice in terms of allowing the same function to be used for all methods, but would make it impossible to use a lambda for the function. Is there some other method of doing this entirely?
* Are there any missing features that anyone thinks should be added?

I apologize if much of this is answered elsewhere; I just haven’t found any of it. I also apologize if this is far too long-winded and confusing!

Regards,
Constantine Evans
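For readers unfamiliar with the terms in the message above, here is a minimal sketch of bootstrap indexes, jackknife indexes, and a basic percentile interval. It is not the scikits-bootstrap code: the function names, the argument names, and the defaults (alpha, n_samples, seed) are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_indexes(n, n_samples, rng):
    """Return an (n_samples, n) array of indexes drawn with replacement."""
    return rng.integers(0, n, size=(n_samples, n))

def jackknife_indexes(n):
    """Return an (n, n - 1) array of indexes; row i leaves out observation i."""
    base = np.arange(n)
    return np.array([np.delete(base, i) for i in range(n)])

def percentile_ci(data, statfunction=np.mean, alpha=0.05,
                  n_samples=10000, seed=None):
    """Basic percentile interval (hypothetical helper, not the library API).

    The statistic is evaluated on each resample, and the alpha/2 and
    1 - alpha/2 quantiles of those values are returned as the limits.
    """
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    idx = bootstrap_indexes(len(data), n_samples, rng)
    stats = np.array([statfunction(data[row]) for row in idx])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])
```

Calling percentile_ci(data) on a 1-D array returns the lower and upper limits of a 95% interval for the mean; any other statistic that accepts a 1-D array can be passed as statfunction.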

On Wed, Aug 8, 2012 at 2:38 PM, Constantine Evans <cevans@evanslabs.org> wrote:
Hello everyone,
Hi,
A few years ago I implemented a scikit for bootstrap confidence limits (https://github.com/cgevans/scikits-bootstrap). I didn’t think much about it after that until recently, when I realized that some people are actually using it, and that there’s apparently been some talk about implementing this functionality in either scipy.stats or statsmodels (I should thank Randal Olson for discussing this and bringing it to my attention).
As such I’ve rewritten most of the code, and written up some docstrings. The current code can do confidence intervals with basic percentile interval, bias-corrected accelerated, and approximate bootstrap confidence methods, and can also provide bootstrap and jackknife indexes. Most of it is implemented from the descriptions in Efron and Tibshirani’s Introduction to the Bootstrap, but the ABC code at the moment is a port from the modified-BSD-licensed bootstrap package for R (not the boot package) as I’m not entirely confident in my understanding of the method.
And so, I have a few questions for everyone:
* Is there any interest in including this sort of code in either scipy.stats or statsmodels? If so, where do people think would be the better place? The code is relatively small; at the moment it is less than 200 lines, with docstrings probably making up 100 of those lines.
I think it would be great to have this in statsmodels. I filed an enhancement ticket about it this morning (also brought to my attention by Randy's blog post). https://github.com/statsmodels/statsmodels/issues/420
* Also, if so, what would need to be changed, added, and improved beyond what is mentioned in the Contributing to Scipy part of the reference guide? I’m never a fan of my own code, and imagine quite a bit would need to be fixed; I know tests will need to be added too.
We can discuss further on the statsmodels mailing list (cc'd) unless someone feels strongly that this should go into scipy. I'm not sure about API yet so that it can be general and used across all the models in statsmodels. It's one of the reasons I've put off incorporating code like this for so long.
In addition, I have a few questions about what would be better practice for the API, and I haven’t really found a guide on best practices for Scipy:
* When I started writing the code, I wrote a single function ci for confidence intervals, with a method argument to choose the method. This is easy for users, especially so that they don’t have to look through documentation to realize that BCA is the most generally useful method (at least from everything I’ve read) and that there really isn’t any reason to use many of the simpler methods. However, ABC takes different parameters, and needs a statistic function that takes weights, which makes this single-function organization trickier. At the moment, I have a separate function for ABC. Would it be better to split up all the methods to their own functions?
I think this might be preferable.
* ABC requires a statistic function that takes weights. I’ve noticed that things like np.average take a weights= argument. Would it be better to require input of a stat(data,weights) function, or input of a stat(data,weights=) with weights as a named argument? The latter would be nice in terms of allowing the same function to be used for all methods, but would make it impossible to use a lambda for the function. Is there some other method of doing this entirely?
* Are there any missing features that anyone thinks should be added?
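On the weights question quoted above, a small sketch of the named-argument convention may help the discussion. The names weighted_mean, weighted_mean_lambda, and call_stat are hypothetical and not part of any existing package. One point worth noting: a Python lambda may declare a parameter with a default value, so a stat(data, weights=) convention does not by itself rule out lambdas.

```python
import numpy as np

# A statistic written as a regular function with an optional weights keyword.
def weighted_mean(data, weights=None):
    # np.average treats weights=None as equal weighting.
    return np.average(data, weights=weights)

# The same statistic as a lambda; lambdas accept default arguments too.
weighted_mean_lambda = lambda data, weights=None: np.average(data, weights=weights)

def call_stat(stat, data, weights=None):
    """Hypothetical dispatcher: pass weights only when a method supplies them.

    Resampling methods would call stat(data); a weight-based method such as
    ABC would call stat(data, weights=w) on the same function.
    """
    if weights is None:
        return stat(data)
    return stat(data, weights=weights)
```

With this layout, statistics that ignore weights (for example np.mean) still work for the resampling methods, and only the weight-based method requires the extra keyword.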
I apologize if much of this is answered elsewhere, I just haven’t found any of it; I also apologize if this is far too long-winded and confusing!
Regards,
Constantine Evans
_______________________________________________
SciPy-Dev mailing list
SciPy-Dev@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev

Replying to Skipper's message to get the Statsmodels folks...
On Wed, Aug 8, 2012 at 2:38 PM, Constantine Evans <cevans@evanslabs.org> wrote:
Hello everyone,
On Wed, Aug 8, 2012 at 11:57 AM, Skipper Seabold <jsseabold@gmail.com> wrote:
Hi,
A few years ago I implemented a scikit for bootstrap confidence limits (https://github.com/cgevans/scikits-bootstrap). I didn’t think much about it after that until recently, when I realized that some people are actually using it, and that there’s apparently been some talk about implementing this functionality in either scipy.stats or statsmodels (I should thank Randal Olson for discussing this and bringing it to my attention).
As such I’ve rewritten most of the code, and written up some docstrings. The current code can do confidence intervals with basic percentile interval, bias-corrected accelerated, and approximate bootstrap confidence methods, and can also provide bootstrap and jackknife indexes. Most of it is implemented from the descriptions in Efron and Tibshirani’s Introduction to the Bootstrap, but the ABC code at the moment is a port from the modified-BSD-licensed bootstrap package for R (not the boot package) as I’m not entirely confident in my understanding of the method.
I can't comment on the ABC method, but your BCA method appears to be consistent with my own implementation.
And so, I have a few questions for everyone:
* Is there any interest in including this sort of code in either scipy.stats or statsmodels? If so, where do people think would be the better place? The code is relatively small; at the moment it is less than 200 lines, with docstrings probably making up 100 of those lines.
I think it would be great to have this in statsmodels. I filed an enhancement ticket about it this morning (also brought to my attention by Randy's blog post).
As a user, I would also love to see this in statsmodels.
* Also, if so, what would need to be changed, added, and improved beyond what is mentioned in the Contributing to Scipy part of the reference guide? I’m never a fan of my own code, and imagine quite a bit would need to be fixed; I know tests will need to be added too.
I can only speak to the BCA method, but I propose the following when you compute the acceleration: https://gist.github.com/3307341

Everyone's data is different and probably 99.99% of the time, SCD won't turn out to be 0 and raise a ZeroDivisionError, but it happened to me and that's how I fixed it.

Just a thought.

Cheers,
-paul
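The gist is not reproduced in this thread, so the following is only a sketch of the kind of guard described above, not Paul's actual fix. It computes the jackknife estimate of the BCA acceleration as given in Efron and Tibshirani and falls back to 0.0 when the denominator is zero. The function name and the choice of 0.0 as the fallback are assumptions; with an acceleration of zero the BCA interval reduces to the bias-corrected interval.

```python
import numpy as np

def jackknife_acceleration(data, statfunction=np.mean):
    """Jackknife acceleration estimate with a zero-denominator guard.

    a = sum((mean_jack - jack_i)**3) / (6 * sum((mean_jack - jack_i)**2)**1.5)
    """
    data = np.asarray(data)
    n = len(data)
    # Leave-one-out values of the statistic.
    jack = np.array([statfunction(np.delete(data, i)) for i in range(n)])
    diffs = jack.mean() - jack
    denom = 6.0 * np.sum(diffs ** 2) ** 1.5
    if denom == 0.0:
        # All leave-one-out statistics are identical (for example, constant
        # data); returning 0.0 here is an assumed fallback.
        return 0.0
    return np.sum(diffs ** 3) / denom
```

Constant data such as np.ones(5) exercises the guarded branch, while ordinary data goes through the usual formula.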

I discovered this thread after searching for statistical bootstrap in Python. It would be nice to have a general bootstrap capability in either statsmodels or scipy.stats. Are people interested in this?

Here's an object-oriented approach that I took for it: https://github.com/clarkfitzg/stat-bootstrap/blob/master/bootstrap.py

Regards,
Clark Fitzgerald
Participants (4): Clark Fitzgerald, Constantine Evans, Paul Hobson, Skipper Seabold