[SciPy-Dev] scipy.stats improvements

Ralf Gommers ralf.gommers at gmail.com
Sat Mar 14 08:22:18 EDT 2015


On Thu, Mar 12, 2015 at 4:47 AM, Abraham Escalante <aeklant at gmail.com>
wrote:

> Hello,
>
> I just realised not everyone may be able to see the attached GSoC proposal
> in my previous message. I apologise and here it is in a more friendly way:
>
> https://onedrive.live.com/redir?resid=E5548AD35687C4B2!490&authkey=!APnx5au6jT6DXkM&ithint=file%2cpdf
>

Hi Abraham, that's a pretty good start.

About the abstract and deliverables: I would state the overall goal as
"enhancement and addressing maintenance issues" - this gives a different
focus from leading with docs/tests. Maintenance issues then include fixing
bugs, adding tests and documentation. Allowing multiple hypotheses in all
hypothesis tests is a significant enhancement.

Now some detailed comments:
- the change to _chk_asarray gets too much attention, I think; it's not that
big a deal (and the effort will be minor on the overall scale of things).
- you reserve separate time for PEP8 compliance; this should actually be
done at the moment you write any code. The TravisCI tests for Scipy will
check PEP8 automatically, so you can't even do it separately.
- API changes for trimmed statistics functions will take longer than other
issues in StatisticsReview. It will also require more discussion (API
changes always do, especially backwards-incompatible ones), so I suggest
moving it to a later date in your plan. Maybe week 5 or 6; that leaves enough
time to iterate.
- ppcc_plot is already done in PR 4563, so it doesn't need to be in your plan.
- making stats.mstats consistent with stats is also a larger job. I would
put it towards the end of your plan. Priority 1 is to complete all functions in
stats, and moving it to the end also prevents the situation where you
change an mstats function first and then realize that the stats equivalent
actually needs changes.

The other thing I recommend is to look at each function in your proposal,
and assess whether it just needs a few tweaks or a lot of work. Example:
fligner needs very little work (maybe an example and one or two more
tests), while ppcc_max needs full docstring+tests and you may find it
doesn't work 100% correctly. You could keep the overview you get this way
separate from your proposal - it will help you change the timeline in your
proposal to something more realistic.
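
A quick scripted pass can help with that assessment. Below is a minimal
sketch, not an established scipy tool: review_inputs is a hypothetical helper
name, and statistics.pvariance from the Python standard library is only a
dependency-free stand-in for the scipy.stats function you would actually be
reviewing. It probes two things: whether list and tuple inputs give the same
result, and what happens on empty input.

```python
import statistics


def review_inputs(func, sample):
    """Hypothetical helper: probe how `func` handles array_like and empty input."""
    report = {}
    # The same data passed as a list and as a tuple should give the same result.
    report["list_tuple_agree"] = func(list(sample)) == func(tuple(sample))
    # Record what happens on empty input instead of assuming a behaviour.
    try:
        report["empty"] = repr(func([]))
    except Exception as exc:
        report["empty"] = type(exc).__name__
    return report


# statistics.pvariance is only a stand-in here, not a scipy.stats function.
report = review_inputs(statistics.pvariance, [2.0, 4.0, 4.0, 5.0])
print(report)
```

A function that fails one of these probes probably belongs in the "a lot of
work" column; one that passes may only need a docstring example and a test or
two.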

Cheers,
Ralf




>
> Thanks again,
> Abraham.
>
>
>
> 2015-03-06 10:52 GMT-06:00 Ralf Gommers <ralf.gommers at gmail.com>:
>
>> Hi Abraham,
>>
>>
>>
>> On Wed, Mar 4, 2015 at 8:08 PM, Abraham Escalante <aeklant at gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> My name is Abraham Escalante. I would like to make a proposal for the
>>> "scipy.stats improvements" project for the Google Summer of Code. I am new
>>> to the Open Source community (although I do have experience with git and
>>> github) and this seems to me like a perfect place to start contributing.
>>>
>>
>> Welcome!
>>
>>
>>> I forked the scipy/scipy project and I've been perusing some of the
>>> StatisticsCleanup issues since I would like to make my first contribution
>>> before I actually make my formal proposal (and I know it would be a great
>>> way for me to become acquainted with the code, guidelines, tests and the
>>> like).
>>>
>>
>> That's definitely a good idea (and actually it's required).
>>
>>
>>> I have a few questions that I would like to trouble you with:
>>>
>>> 1) Most of the StatisticsCleanup open issues mention a "need for review"
>>> and also "StatisticsReview guidelines". *Could you refer me to the
>>> StatisticsReview guidelines?* (I have been looking but I have not been
>>> able to find it in the forked project nor the scipy documentation). *What
>>> does it mean to have an issue flagged as "review"?*
>>> see https://github.com/scipy/scipy/issues/693 for an example of what I
>>> mean.
>>>
>>
>> Ah, this was a pre-Github wiki page that disappeared after Trac was
>> disabled. I can't find the original anymore; I'll rewrite those guidelines
>> on the Github scipy wiki. Basically it comes down to checking (and
>> fixing/implementing if needed) the following:
>> - is the implementation correct?
>>   - needs checking against another implementation (R/Matlab) and/or a
>> reliable reference
>>   - this includes handling of small or empty arrays, and array_like
>> (list, tuple) inputs
>> - is the docstring complete?
>>   - at a minimum should include a good summary line, parameters, returns
>> section and needed details to understand the algorithm
>>   - preferably also References and Examples sections
>> - is the test coverage OK?
>>
>>
>> For some functions that have StatisticsReview issues it's a matter of
>> checking and making a few tweaks, for others it may be a complete rewrite
>> (see https://github.com/scipy/scipy/pull/4563 for a recent example).
>>
>>
>>> 2) I am currently going through the code (using the StatisticsCleanup
>>> issues as a guide) and starting to read the SciPy statistics tutorial. *Do
>>> you have any suggested reading* to get more familiarised with SciPy
>>> (the statistics part in particular), Numpy or to brush up on my statistics
>>> knowledge? (pretty much anything to get me up the learning curve would be
>>> useful).
>>>
>>
>> The tutorial you started on is good. For a broad intro to numpy/scipy,
>> http://scipy-lectures.github.io/ is also quite a good tutorial.
>> Regarding books on statistics, there's an almost infinite choice, so I'm not
>> going to try to make a recommendation. Maybe the real statisticians on this
>> list will give you their favorites :)
>>
>> When starting to work on scipy, reading the developer guidelines at
>> http://docs.scipy.org/doc/numpy-dev/dev/ is also a good idea.
>>
>> Cheers,
>> Ralf
>>
>>
>>
>>> Thanks in advance,
>>> Abraham Escalante.
>>>
>>>
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>>>
>>>