[SciPy-Dev] Speaking of tickets...

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Apr 1 11:54:06 EDT 2010


On Thu, Apr 1, 2010 at 9:51 AM, Bruce Southey <bsouthey at gmail.com> wrote:
> On 03/31/2010 12:57 PM, Robert Kern wrote:
>> On Wed, Mar 31, 2010 at 11:32,<josef.pktd at gmail.com>  wrote:
>>
>>> On Wed, Mar 31, 2010 at 1:15 AM, Warren Weckesser
>>> <warren.weckesser at enthought.com>  wrote:
>>>
>>>> Just curious: what are the plans for the "Statistic Review" tickets in
>>>> scipy.stats, from April, 2006?  Can any of these be closed?
>>>>
>>> short answer:
>>> I looked at them and there are several that can be closed, especially
>>> the functions that have been removed or deprecated. For others, I
>>> have to check my notes to see which ones I verified and added tests for.
>>>
>>>
>>> long answer (these were my initial notes when I browsed the tickets):
>>>
>>> I found them very inconvenient to work with. Many or most of them are
>>> empty and I didn't find an easy way to get an overview which ones of
>>> the tickets contain useful information and require attention.
>>>
>> Well, the point of them was to systematically review every function.
>> Those "empty" tickets need just as much attention as those that have
>> comments.
>>
>> However, it's obvious that I dropped the ball on organizing that
>> effort. They may be systematically closed now.
>>
>>
> Really these functions do need to be addressed, but it is a rather daunting
> task to go through all these functions, especially for the assigned
> criteria and given the 'duplicated' masked array functions. There are
> about 187 functions involved in the stats modules, although some are
> deprecated and some are 'redefinitions' of existing functions. Some of the
> functions are more utilities than stats functions and some functions are
> very dimension specific. For example, there is a tmean(a,
> limits=(min,max)) function that is essentially doing "a.compress((a>min)
> & (a<max)).mean()" because there is no axis option. There is also a
> masked array function, trimmed_mean, that does have an axis argument.
>
> However, I came to the conclusion that most of these have more problems
> than they are worth, making my original idea worthless. So it would be
> better to have a clear plan for a proper set of statistical functions
> than just 'blindly fixing' the existing functions.

Because there are so many functions, because test coverage is still
not sufficient and because of backwards compatibility, I still prefer
an incremental approach.

The trim functions are mostly convenience functions according to Gary,
and since I haven't yet found a use for them, I mostly ignored them,
except for adding some tests.
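
To make the "convenience function" point concrete, here is a minimal NumPy
sketch of the two behaviours Bruce describes above. It is not the scipy.stats
code: the helper names tmean_flat and tmean_axis are made up for this
illustration, the strict inequalities are copied from his expression, and
boolean indexing stands in for compress.

```python
import numpy as np
import numpy.ma as ma


def tmean_flat(a, lower, upper):
    """Mean of all values strictly inside (lower, upper); no axis support.

    Hypothetical helper mirroring the quoted expression
    "a.compress((a>min) & (a<max)).mean()".
    """
    a = np.asarray(a, dtype=float)
    keep = (a > lower) & (a < upper)
    # Boolean indexing flattens the array, so the result is one scalar.
    return a[keep].mean()


def tmean_axis(a, lower, upper, axis=None):
    """Same trimming rule, but computed along an axis via a masked array.

    Hypothetical helper; values outside (lower, upper) are masked
    instead of removed, so the array keeps its shape.
    """
    a = np.asarray(a, dtype=float)
    masked = ma.masked_where((a <= lower) | (a >= upper), a)
    return masked.mean(axis=axis)


x = np.array([[1.0, 2.0, 100.0],
              [3.0, 4.0, 5.0]])

overall = tmean_flat(x, 0.0, 10.0)           # 100.0 is dropped
per_row = tmean_axis(x, 0.0, 10.0, axis=1)   # one trimmed mean per row
```

With this input the outlier 100.0 is excluded in both cases, so overall is 3.0
and per_row is [1.5, 4.0]. Masking rather than compressing is what makes an
axis argument possible, since a compressed array loses its shape.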

The statistical tests are (almost?) completely verified and produce
the advertised results.
But in morestats.py there are still groups of functions, e.g. boxcox,
kstat, ppcc, that look ok but that I'm not sure about, and functions
that I would like to (temporarily) delete or replace ('pdf_moments',
'pdf_fromgamma', 'pdfapprox') because they are incorrect.

Overall, I think of scipy.stats as a bit of a laundry list, and it's still
difficult to see what "a clear plan for a proper set of statistical
functions than just 'blindly fixing' the existing functions" would be.
My current opinion is that a lot of the more elaborate statistical
functions should be classes instead, which I'm trying to do in
statsmodels.
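
As a rough illustration of the function-versus-class idea (only a sketch under
stated assumptions, not code from scipy.stats or statsmodels; the class name
OneSampleT and its attributes are invented here), a class can hold the data
once and expose the related quantities together:

```python
import math
import statistics


class OneSampleT:
    """Hypothetical container for a one-sample t statistic.

    Bundles the sample size, degrees of freedom, standard error and
    t statistic, instead of returning a bare tuple from a function.
    """

    def __init__(self, data, popmean=0.0):
        self.data = [float(v) for v in data]
        if len(self.data) < 2:
            raise ValueError("need at least two observations")
        self.popmean = float(popmean)

    @property
    def nobs(self):
        return len(self.data)

    @property
    def df(self):
        # Degrees of freedom for the one-sample t statistic.
        return self.nobs - 1

    @property
    def stderr(self):
        # Sample standard deviation (n - 1 denominator) over sqrt(n).
        return statistics.stdev(self.data) / math.sqrt(self.nobs)

    @property
    def statistic(self):
        return (statistics.mean(self.data) - self.popmean) / self.stderr


res = OneSampleT([2, 4, 4, 4, 5, 5, 7, 9], popmean=4.0)
summary = (res.nobs, res.df, res.statistic)
```

Further results (p-values, confidence intervals, diagnostics) could be added
as attributes or methods without changing any call signature, which is much
harder with a plain function that returns a fixed tuple.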

I'm not a statistician and I ended up with scipy.stats a bit by
accident, so if someone can come up with a good long term plan for it
and implements it, I would be glad.
My main objective is that we can trust any results that we get out of
scipy.stats. (testing and verification)

Josef






>
>
> Bruce


