
Just curious: what are the plans for the "Statistic Review" tickets in scipy.stats, from April, 2006? Can any of these be closed?

Warren

On Wed, Mar 31, 2010 at 1:15 AM, Warren Weckesser <warren.weckesser@enthought.com> wrote:
Just curious: what are the plans for the "Statistic Review" tickets in scipy.stats, from April, 2006? Can any of these be closed?
Short answer: I looked at them, and several can be closed, especially those for functions that have been removed or deprecated. For the others, I have to check my notes to see which ones I verified and added tests for.

Long answer (these were my initial notes when I browsed the tickets):

I found them very inconvenient to work with. Many or most of them are empty, and I didn't find an easy way to get an overview of which tickets contain useful information and require attention. I went through them randomly, and through those that contain an attachment, but never systematically.

I prefer to open new tickets for specific issues because they are easier to find. For a status overview I keep my own list. I changed, closed or reassigned only a few of them. The owner is still Robert.

Trying out some options, I found this view

http://projects.scipy.org/scipy/query?status=accepted&status=apply&status=needs_decision&status=needs_review&status=needs_work&status=new&status=reopened&order=changetime&col=id&col=summary&col=owner&col=type&col=status&col=priority&col=changetime&milestone=StatisticsCleanup&desc=1

which sorts tickets by changetime. Tickets up to #74 in the list have some additional comments.

Is there a way to "Show under each result: Description" but, instead of only the description, also get the comments?

In regular use of the Trac tickets they don't create much noise, because the statistical review tickets don't count as open tickets.

Josef
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev

On Wed, Mar 31, 2010 at 11:32, <josef.pktd@gmail.com> wrote:
Short answer: I looked at them, and several can be closed, especially those for functions that have been removed or deprecated. For the others, I have to check my notes to see which ones I verified and added tests for.
Long answer (these were my initial notes when I browsed the tickets):
I found them very inconvenient to work with. Many or most of them are empty, and I didn't find an easy way to get an overview of which tickets contain useful information and require attention.
Well, the point of them was to systematically review every function. Those "empty" tickets need just as much attention as those that have comments.

However, it's obvious that I dropped the ball on organizing that effort. They may be systematically closed now.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco

On 03/31/2010 12:57 PM, Robert Kern wrote:
Well, the point of them was to systematically review every function. Those "empty" tickets need just as much attention as those that have comments.
However, it's obvious that I dropped the ball on organizing that effort. They may be systematically closed now.
Really, these functions do need to be addressed, but it is a rather daunting task to go through all of them, especially given the assigned criteria and the 'duplicated' masked array functions. There are about 187 functions involved in the stats modules, although some are deprecated and some are 'redefinitions' of existing functions. Some of the functions are more utilities than stats functions, and some are very dimension specific. For example, there is a tmean(a, limits=(min,max)) function that is essentially doing "a.compress((a>min) & (a<max)).mean()" because there is no axis option. Then there is a masked array function, trimmed_mean, that does have an axis argument.

However, I came to the conclusion that most of these have more problems than they are worth, making my original idea worthless. So it would be better to have a clear plan for a proper set of statistical functions than just 'blindly fixing' the existing functions.

Bruce
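To make the tmean point concrete, here is a minimal NumPy-only sketch of the two behaviours being contrasted: a trimmed mean that flattens its input (the compress expression quoted above) and one that respects an axis by going through a masked array. The names tmean_flat and tmean_axis are hypothetical helpers written for this illustration; they are not the scipy.stats or scipy.stats.mstats implementations, and the strict inequalities simply follow the expression in the message.

```python
import numpy as np


def tmean_flat(a, lower, upper):
    """Mean of all values strictly between lower and upper, ignoring shape."""
    flat = np.asarray(a, dtype=float).ravel()
    keep = (flat > lower) & (flat < upper)  # 1-D condition for compress
    return flat.compress(keep).mean()


def tmean_axis(a, lower, upper, axis=0):
    """Same trimming rule, but averaged along one axis via a masked array."""
    a = np.asarray(a, dtype=float)
    # Mask everything that is NOT strictly inside (lower, upper).
    masked = np.ma.masked_where(~((a > lower) & (a < upper)), a)
    # Slices with no surviving values come back masked; report them as NaN.
    return masked.mean(axis=axis).filled(np.nan)


# Illustrative data only: two outliers (30 and 50) fall outside (0, 10).
data = np.array([[1.0, 2.0, 30.0],
                 [4.0, 50.0, 6.0]])

overall = tmean_flat(data, 0.0, 10.0)             # one number for the whole array
per_column = tmean_axis(data, 0.0, 10.0, axis=0)  # one number per column
```

The flat version can only ever return a single value, which is the "dimension specific" limitation described above; the masked-array version keeps the array shape, so the trimming can be applied per row or per column.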

On Thu, Apr 1, 2010 at 9:51 AM, Bruce Southey <bsouthey@gmail.com> wrote:
Really, these functions do need to be addressed, but it is a rather daunting task to go through all of them, especially given the assigned criteria and the 'duplicated' masked array functions. There are about 187 functions involved in the stats modules, although some are deprecated and some are 'redefinitions' of existing functions. Some of the functions are more utilities than stats functions, and some are very dimension specific. For example, there is a tmean(a, limits=(min,max)) function that is essentially doing "a.compress((a>min) & (a<max)).mean()" because there is no axis option. Then there is a masked array function, trimmed_mean, that does have an axis argument.
However, I came to the conclusion that most of these have more problems than they are worth, making my original idea worthless. So it would be better to have a clear plan for a proper set of statistical functions than just 'blindly fixing' the existing functions.
Because there are so many functions, because test coverage is still not sufficient, and because of backwards compatibility, I still prefer an incremental approach.

The trim functions are mostly convenience functions according to Gary, and since I haven't yet found a use for them, I mostly ignored them, except for adding some tests. The statistical tests are (almost?) completely verified and produce the advertised results. But in morestats.py there are still groups of functions, e.g. boxcox, kstat, ppcc, that look OK but that I'm not sure about, and functions that I would like to (temporarily) delete or replace ('pdf_moments', 'pdf_fromgamma', 'pdfapprox') because they are incorrect.

Overall, I think of scipy.stats as a bit of a laundry list, and it's still difficult to see what "a clear plan for a proper set of statistical functions than just 'blindly fixing' the existing functions" would be. My current opinion is that a lot of the more elaborate statistical functions should be classes instead, which I'm trying to do in statsmodels.

I'm not a statistician and I ended up with scipy.stats a bit by accident, so if someone can come up with a good long-term plan for it and implements it, I would be glad. My main objective is that we can trust any results that we get out of scipy.stats (testing and verification).

Josef