Statistics review months
In the interest of improving the quality of the scipy.stats package, I hereby declare April and May of 2006 to be Statistics Review Months. I propose that we set ourselves a goal to review each function in stats.py and morestats.py (and a few others) for correctness and completeness of implementation by the end of May. By my count, that's about 2.5 functions every day. Surely this is a reasonable amount of effort for a rather large payoff: a robust, well-tested and thorough statistics library.
I have added a Wiki page describing the details:
http://projects.scipy.org/scipy/scipy/wiki/StatisticsReview
Barring any objections, I will be irretrievably creating the ~150 tickets or so for all of the functions to be reviewed later tonight. So if you object, act fast!
[Disclosure: this idea isn't mine. Eric Jones mentioned it to me once, and I'm just running with it.]
-- Robert Kern robert.kern@gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
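A quick check of the "about 2.5 functions every day" figure, as a minimal sketch. It assumes the roughly 150 tickets are spread evenly over every calendar day of April and May 2006; the post does not say how the count was made, so the even spread is an assumption.

```python
from datetime import date

# Approximate ticket count quoted in the post ("~150 tickets or so").
n_functions = 150

# Calendar days in April and May 2006 (assumed review window).
n_days = (date(2006, 6, 1) - date(2006, 4, 1)).days

per_day = n_functions / n_days
print(f"{n_days} days, {per_day:.2f} functions per day")
```

Under that assumption the window is 61 days, which gives a little under 2.5 functions per day.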
Robert,
What a great idea! We had just started working on this ourselves. We're going to migrate some of our internal wiki content over to the trac pages over the next few days, regarding rv_continuous and rv_discrete. In particular, the distributions implementation is both very well-designed and very bug-ridden.
While we're doing this level of refactoring, perhaps more attention could be given to Fernando's "scikit" idea? If we're going to invest this much time in a module, perhaps it would be a good time to consider cleaning up the interface, and even writing a good how-to guide, à la the MATLAB toolbox user's guides...
...Eric Jonas
Eric Jonas wrote:
What a great idea! We had just started working on this ourselves. We're going to migrate some of our internal wiki content over to the trac pages over the next few days, regarding rv_continuous and rv_discrete. In particular, the distributions implementation is both very well-designed and very bug-ridden.
Yes. Given the number of functions in stats.py and the number of distributions (multiplied by the number of methods each distribution has), I thought it best to focus on the functions this time around. But please do add the Wiki pages so we can keep track of this! If this procedure works well, I'm sure we will do a Distributions Review Month soon.
While we're doing this level of refactoring, perhaps more attention could be given to fernando's "scikit" idea? If we're going to invest this much time in a module, perhaps it would be a good time to consider cleaning up the interface, and even writing a good how-to guide, ala the matlab toolbox users guides...
Good point. I will add a Wiki page where reviewers can add examples and HOWTO text for each function as they work their way down the list. When the reviewing settles down, that would be the perfect time for some reorganization and editing work to form a coherent user's guide for scipy.stats.
-- Robert Kern
on this topic, as an honest-to-goodness statistician it might be nice to see more statistical modelling in scipy. i know Rpy exists, but the interface is not very pythonic.
i have some "home-brew" modules for linear regression, formula building (something like R's) and a few other things. if it went into something like scipy, it might gain from the criticisms of others....
is there any interest in making the equivalent of a scipy.stats.models module?
i think an easily (medium-term) achievable goal is:
i) linear (least-squares) regression models with/without weights or non-diagonal covariance matrices (in R: lm + more)
ii) generalized linear models (in R: glm)
iii) iteratively reweighted least squares algorithms (glm is a special case), i.e. robust regression (in R: rlm).
iv) ordinary least squares multivariate linear models (i.e. multivariate responses)
some of these models can easily be "broadcasted", others not so easily....
further goals are more general models: classification, constrained model fitting, model selection.... for some of these things, it may not be worth duplicating R's (or other packages') efforts.
-- jonathan
-- Jonathan Taylor, Dept. of Statistics, Sequoia Hall, 137, 390 Serra Mall, Stanford, CA 94305
Tel: 650.723.9230, Fax: 650.725.8977, www-stat.stanford.edu/~jtaylo
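To make items (i) and (iii) of the list in Jonathan's message concrete, here is a minimal sketch of weighted least squares and of an iteratively reweighted least squares loop for robust regression, assuming only NumPy. The function names `weighted_least_squares` and `huber_irls` are hypothetical and are not part of scipy.stats or of Jonathan's modules. The Huber weight function, the tuning constant 1.345, and the MAD scale estimate are illustrative choices that the thread does not specify. Only diagonal weights are handled, not the non-diagonal covariance case.

```python
import numpy as np


def weighted_least_squares(X, y, w=None):
    """Minimise sum_i w_i * (y_i - X[i] @ beta)**2 (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if w is None:
        w = np.ones(len(y))
    # Scaling rows by sqrt(w) turns the weighted problem into ordinary lstsq.
    sw = np.sqrt(np.asarray(w, dtype=float))
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


def huber_irls(X, y, k=1.345, n_iter=50, tol=1e-8):
    """Robust regression by IRLS with Huber weights (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = weighted_least_squares(X, y)  # start from the unweighted fit
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust scale estimate: median absolute deviation, normal-consistent.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        u = np.abs(resid) / max(scale, 1e-12)
        # Huber weights: 1 inside the threshold, k/|u| outside it.
        w = np.where(u <= k, 1.0, k / np.maximum(u, 1e-12))
        new_beta = weighted_least_squares(X, y, w)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Synthetic data: y = 1 + 2x plus small noise, with five gross outliers.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=x.size)
y[-5:] += 30.0
X = np.column_stack([np.ones_like(x), x])

beta_ols = weighted_least_squares(X, y)
beta_robust = huber_irls(X, y)
print("OLS fit:   ", beta_ols)
print("Huber IRLS:", beta_robust)
```

On data like this the unweighted slope is pulled well away from 2 by the outliers, while the reweighted fit should stay close to it. A GLM fit (item ii) follows the same loop shape, with the weights and working response derived from the link and variance functions instead of from a robust weight function.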
I think this would be a useful addition.
-- Steven H. Rogers, Ph.D., steve@shrogers.com Weblog: http://shrogers.com/weblog "He who refuses to do arithmetic is doomed to talk nonsense." -- John McCarthy
participants (4)
- Eric Jonas
- Jonathan Taylor
- Robert Kern
- Steven H. Rogers