[SciPy-dev] GSoC Project Proposal: Datasource and Jonathan Taylor's statistical models
Skipper Seabold
jsseabold at gmail.com
Fri Mar 27 13:43:54 EDT 2009
Hello all,
I am a first-year PhD student in Economics at American University, and
I would very much like to participate in the GSoC with the NumPy/SciPy
community. I am looking for some feedback and discussion before I
submit a proposal.
Judging by the ideas page and the discussion in this thread (
http://mail.scipy.org/pipermail/scipy-dev/2009-February/011373.html ),
I think the following project proposal would be useful to the
community.
My proposal would have two parts. The first would be to improve
datasource and integrate it into the numpy/scipy io. I see this as a
way to get my feet wet working on a project. I do not imagine that it
would take more than 2-3 weeks of work on my end.
The second part would be to get Jonathan Taylor's statistical models
from the NiPy project into scipy.stats. I think that I would be a
good candidate for this work: I am currently studying statistics
and learning the ins and outs of NumPy/SciPy, so I don't mind doing
some of the less appealing work, which is also a great learning
opportunity. I also see this as a great way to get involved in the
SciPy community in an area that currently needs some attention. As
a student, I would be able to help maintain the code, fix bugs, and
address other areas of the statistical capabilities that need
attention.
Below is a general outline of my proposal with some areas that I have
identified as needing work. I am eager to discuss aspects of the
projects with those who are interested and to work out the appropriate
milestones.
1) Improve datasource and integrate it into all the numpy/scipy io
  Bug Fixes
    Catch and handle malformed URLs
  Refactoring
  Enhancements
    Improve findfile method
    Improve cache method
    Add zip archive, tar file handling capabilities
    Improve networking interface to handle timeouts and proxies if
    there is sufficient interest
  Documentation
    Document changes
  Tests
    Implement test coverage for new changes
  Copy/Move to scipy.io
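As a rough illustration of two of the datasource items in the outline
above (catching malformed URLs, and reading zip and tar archives), here
is a minimal Python sketch that uses only the standard library. The
names check_url, read_member, and ALLOWED_SCHEMES are hypothetical and
are not part of the existing datasource code. The set of accepted
schemes is also an assumption.

```python
import io
import tarfile
import zipfile
from urllib.parse import urlsplit

# Assumption: the schemes a datasource-like helper would accept.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def check_url(url):
    """Return url unchanged if it looks well formed, else raise ValueError."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError as exc:
        # urlsplit rejects some inputs outright, e.g. unbalanced brackets.
        raise ValueError(f"malformed URL: {url!r}") from exc
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported or missing scheme in URL: {url!r}")
    if not host:
        raise ValueError(f"missing host in URL: {url!r}")
    return url


def read_member(archive_bytes, member):
    """Return the bytes of one named member of an in-memory zip or tar archive."""
    buf = io.BytesIO(archive_bytes)
    if zipfile.is_zipfile(buf):
        buf.seek(0)
        with zipfile.ZipFile(buf) as zf:
            return zf.read(member)
    buf.seek(0)
    try:
        # "r:*" lets tarfile detect gzip/bz2/xz compression on its own.
        with tarfile.open(fileobj=buf, mode="r:*") as tf:
            extracted = tf.extractfile(member)
            if extracted is None:
                raise ValueError(f"{member!r} is not a regular file")
            return extracted.read()
    except tarfile.ReadError as exc:
        raise ValueError("data is neither a zip nor a tar archive") from exc
```

A caller could run check_url on a user-supplied address before trying
to download it, and pass the downloaded bytes to read_member to pull
out a single data file without unpacking the archive to disk.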
2) Integrate Jonathan Taylor's statistical models into scipy.stats
  These models are currently in the NiPy project
  Merge relevant branches (branch trunk-josef models has the most recent
  changes, I believe)
  I will focus mostly on bringing over the linear models, which I
  believe would include at least:
    bspline.py, contrast.py, gam.py, glm.py, model.py, regression.py, utils.py
  Bug Fixes
    Bug hunting
    Improve existing test coverage
  Refactoring
    Eliminate duplicate functionality, both existing and newly introduced
    Make sure parameters are consistent, etc.
  Enhancements
  Documentation
    Document changes
    Make any necessary changes to stats/info.py
  Testing
    Make sure test coverage is adequate