[SciPy-Dev] scipy.stats improvements

josef.pktd at gmail.com josef.pktd at gmail.com
Sun Mar 15 14:02:09 EDT 2015


On Sun, Mar 15, 2015 at 1:33 PM, <josef.pktd at gmail.com> wrote:

>
>
> On Sun, Mar 15, 2015 at 12:38 AM, Abraham Escalante <aeklant at gmail.com>
> wrote:
>
>> Hi Ralf, thanks for all the feedback.
>>
>> I have made some changes. You can find the second draft here:
>> http://1drv.ms/1BFW6Pb
>>
>
>
>>
>> I reckon that when it comes to the StatisticsCleanup issues, the schedule
>> may change considering their varying scopes. However, I need to get the
>> ball rolling with the community feedback since most of the issues don't
>> have any. I also need to do my own work getting to know the functions more
>> closely, which is the next step in my plan. Do you have any other
>> suggestions?
>>
>> I provide an overview of the changes to the draft here for your
>> convenience:
>>
>>
>>> About the abstract and deliverables: I would state the overall goal as
>>> "enhancement and addressing maintenance issues"
>>>
>>
>> It did sound like more of a documentation project than a coding effort. I
>> made a few changes and I hope it sounds more accurate now.
>>
>
> I would explicitly include adding and checking unit tests.
> I think some functions with insufficient test coverage should be verified,
> if possible, against R or similar.
>
>
>
>>
>>
>>
>>> - the change to _chk_asarray gets too much attention I think, it's not
>>> that big a deal (and effort will also be minor on the overall scale of
>>> things).
>>>
>>
>> I have removed some of the focus from it. It is also listed in the
>> "community bonding" period because its purpose is to help me with the
>> learning curve.
>>
>>
>>
>>> - you reserve separate time for PEP8 compliance, this should actually be
>>> done at the moment you write any code. The TravisCI tests for Scipy will
>>> check PEP8 automatically, so you can't even do it separately.
>>>
>>
>> I've kept it as a deliverable because it is obviously required, but I
>> removed it from the housekeeping buffer weeks.
>>
>>
>>
>>> - API changes for trimmed statistics functions will take longer than
>>> other issues in StatisticsReview.
>>>
>>
>> I moved the task to week 5. I also added a task at the "community bonding
>> period" (although in reality this should start earlier and go along my
>> learning curve) to make sure all the issues are defined in scope before the
>> coding begins.
>>
>>
>>> - ppcc_plot is already done in PR 4563, so doesn't need to be in your plan
>>>
>>
>> Removed it and made a note at the deliverables section.
>>
>>
>>
>>> - making stats.mstats consistent with stats is also a larger job. I
>>> would put it towards the end of your plan.
>>>
>>
>> I moved this to the very end while keeping the last week as a buffer just
>> in case this or any other tasks need some more work.
>>
>>
>>> The other thing I recommend is to look at each function in your proposal,
>>> and assess whether it just needs a few tweaks or a lot of work.
>>>
>>
>> Agreed. This is basically what the scope definition task is meant to do
>> and although it is listed to start at "community bonding" I plan to start
>> right away.
>>
>
>
> Review and work for several functions that are on the list will not take
> much time.
>
> "Implement `alternative keyword` addition to all hypothesis tests"
> This might be time consuming or difficult for the hypothesis tests that
> are not based on normal or t distributions, e.g. KS tests,
> or essentially impossible without writing new algorithms, e.g.
> fisher_exact, IIRC.
> For normal and t-based tests it is trivial once the pattern is
> established, plus a decision on breaking backwards compatibility (?!).
>
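
For the normal-based case, the pattern could look roughly like the sketch
below. It is only an illustration: `z_pvalue` is a made-up helper, not an
existing scipy function, the option names "two-sided", "less" and "greater"
are assumptions, and it uses the normal distribution from the Python standard
library so that it runs without scipy. A t-based test would follow the same
pattern with the t distribution in place of the normal.

```python
from statistics import NormalDist


def z_pvalue(z, alternative="two-sided"):
    """Return the p-value of a z statistic under a standard normal null.

    Hypothetical helper for illustration; the `alternative` option
    names are assumptions, not a decided scipy API.
    """
    norm = NormalDist()  # standard normal: mean 0, standard deviation 1
    if alternative == "two-sided":
        # Both tails: twice the lower-tail probability of -|z|.
        return 2.0 * norm.cdf(-abs(z))
    if alternative == "less":
        # Lower tail only: P(Z <= z).
        return norm.cdf(z)
    if alternative == "greater":
        # Upper tail only: P(Z >= z), which equals P(Z <= -z) by symmetry.
        return norm.cdf(-z)
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
```

With this layout the two one-sided p-values add up to one, and the two-sided
p-value is twice the smaller of them, so keeping "two-sided" as the default
leaves existing results unchanged.
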
> Another general issue that I would like to see addressed, if there is time,
> is to add a `missing` keyword to the functions, which could in a first
> stage just delegate to the masked array functions.
>
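
A first-stage `missing` keyword that just delegates could be sketched as
follows. The wrapper name `ttest_1samp_missing` and the option values
(None, "raise", "omit") are assumptions made up for this example, not an
agreed API; it treats NaN as the missing-value marker and relies on the
existing `scipy.stats.ttest_1samp` and `scipy.stats.mstats.ttest_1samp`.

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats


def ttest_1samp_missing(a, popmean, missing=None):
    """One-sample t-test with a hypothetical `missing` keyword.

    missing=None    : no special handling, call the plain function
    missing="raise" : raise ValueError if the data contain NaN
    missing="omit"  : mask NaNs and delegate to the masked-array version
    """
    a = np.asarray(a, dtype=float)
    if missing is None:
        res = stats.ttest_1samp(a, popmean)
    elif missing == "raise":
        if np.isnan(a).any():
            raise ValueError("input contains missing values (NaN)")
        res = stats.ttest_1samp(a, popmean)
    elif missing == "omit":
        # Delegate to the masked-array implementation in stats.mstats.
        res = mstats.ttest_1samp(np.ma.masked_invalid(a), popmean)
    else:
        raise ValueError("missing must be None, 'raise' or 'omit'")
    # Both result types unpack as (statistic, p-value); return plain floats.
    statistic, pvalue = res
    return float(statistic), float(pvalue)
```

The point of this layout is that the regular function keeps its current
behavior by default, and the masked-array code is only touched when the user
asks for it.
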


A good way to get started with the review of the stats functions,
especially the hypothesis tests, is to look at the corresponding R
functions. Besides being useful for verifying results, R usually has more
references, and it is worth checking whether R has additional options and
whether those should and can be implemented in scipy
(without looking at or copying the license-incompatible R source).

Josef



>
> Josef
>
>
>>
>>
>> Cheers,
>> Abraham.
>>
>>
>