Looking for the most important bugs, documentation needs, etc.

Hi all,
Some colleagues and I are interested in contributing to numpy. We have a range of backgrounds -- I for example am new to contributing to open source software but have a (small) bit of background in scientific computation, while others have extensive experience contributing to open source projects. We've looked at the issue tracker and submitted a couple patches today but we would be interested to hear what active contributors to the project consider the most pressing, important, and/or interesting needs at the moment. I personally am quite interested in hearing about the most pressing documentation needs (including example code).
Thanks very much,
six

Hi,
On Tue, Jul 10, 2012 at 4:20 AM, Six Silberman <silberman.six@gmail.com> wrote:
Hi all,
Some colleagues and I are interested in contributing to numpy.
That's great, welcome!
For documentation we have docstrings for each function and tutorial-style docs (http://docs.scipy.org/doc/numpy/user/, http://scipy-lectures.github.com/intro/numpy/index.html). All docstrings should have clear usage examples, but I'm actually finding it quite hard to find functions that don't have any right now. The only one I could dig up quickly is corrcoef(). There must be a few more.
There are two ways to contribute to the docs: either send a pull request on Github if you are familiar with git (or want to learn it), or use our doc wiki: http://docs.scipy.org/numpy/docs/numpy.lib.function_base.corrcoef
In the doc wiki you can immediately see whether the rendered version looks OK. You have to register a username and then ask on this list for edit rights if you want to use the wiki.
Besides those few docstrings that lack examples, it's mainly the user guide that needs some work, I think. For example, the "performance" section is still empty. Filling that in will require some in-depth numpy/python knowledge though.
If you would like to work on improving the documentation with examples, my suggestion would be to actually work on a part of scipy that interests you. We aim to get the scipy docstrings to the same level of quality as the numpy ones, and there's a lot to do there. Most docstrings lack examples, and some even lack more basic stuff (parameter/return value descriptions) or have formatting issues. This is a good overview of important docstrings per topic: docs.scipy.org/scipy/Milestones/
Cheers,
Ralf
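To illustrate the kind of usage example a docstring is expected to carry, here is a minimal sketch. `pearson_r` is a hypothetical wrapper invented for this illustration and is not part of numpy; only `np.corrcoef` is a real function. The section layout assumes the numpydoc style (Parameters / Returns / Examples), and the example compares against a tolerance rather than printing a float, so it does not depend on exact rounding.

```python
import numpy as np


def pearson_r(x, y):
    """
    Return the Pearson correlation coefficient of two 1-D sequences.

    Hypothetical helper for illustration only; it simply wraps
    ``np.corrcoef``.

    Parameters
    ----------
    x, y : array_like
        1-D sequences of equal length.

    Returns
    -------
    r : float
        Off-diagonal entry of the 2x2 correlation matrix.

    Examples
    --------
    >>> r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
    >>> abs(r - 1.0) < 1e-12
    True
    >>> r = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
    >>> abs(r + 1.0) < 1e-12
    True
    """
    # corrcoef returns the full correlation matrix; [0, 1] is corr(x, y).
    return float(np.corrcoef(x, y)[0, 1])
```

Examples written with the `>>>` prompt can be checked with the standard library's doctest module, which helps keep them from going stale.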

The documentation wiki has a little-known feature to list functions that do not have docstrings and docstrings that do not have examples. Go to http://docs.scipy.org/numpy/search/ and click on the 'No Examples' or 'No Documentation' links. The same searches are available for scipy at http://docs.scipy.org/scipy/search/, which, as Ralf already pointed out, needs the most work.
Kindest regards,
Tim
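For anyone who prefers to check locally, a similar list can be approximated with a few lines of Python. This is only a sketch under stated assumptions: `functions_without_examples` is a hypothetical helper, it treats the presence of a `>>>` prompt as "has an example", and it only looks at the top-level public callables of a module. The wiki's own search may use different criteria.

```python
import inspect


def functions_without_examples(module):
    """Return sorted names of public callables in `module` whose
    docstring contains no '>>>' prompt (or that have no docstring)."""
    missing = []
    for name, obj in vars(module).items():
        # Skip private names and non-callables such as constants.
        if name.startswith("_") or not callable(obj):
            continue
        doc = inspect.getdoc(obj) or ""
        if ">>>" not in doc:
            missing.append(name)
    return sorted(missing)
```

Calling it as `functions_without_examples(numpy)` after `import numpy` should give a rough starting list; submodules such as `numpy.linalg` would need to be passed in separately.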

As for important issues, I think many of them are related to the core of numpy. But there are some more isolated ones, which are probably better for getting started. Here are some that are high on my list of things to fix/improve:
- Numpy doesn't work well (or at all) on OS X 10.7 when built with llvm-gcc, which is the default compiler on that platform. With Clang it seems to work fine. Same for Scipy. http://projects.scipy.org/numpy/ticket/1951
- We don't have binary installers for Python 3.x on OS X yet. This requires adapting the installer build scripts that work for 2.x. See pavement.py in the base dir of the repo.
- Something that's more straightforward: improving test coverage. It's lacking in a number of places; one of the things that comes to mind is that all functions should be tested for correct behavior with empty input. Normally the expected behavior is empty in --> empty out. When that's not tested, we get things like http://projects.scipy.org/numpy/ticket/2078. Ticket for "empty" test coverage: http://projects.scipy.org/numpy/ticket/2007
- There are a large number of "normal" bugs; working on any of those would be very helpful too. It's hard to say here which ones out of the several hundred are important. It is safe to say though, I think, that the ones requiring touching the C code are more in need of attention than the pure Python ones.
I see a patch for f2py already, and a second ticket opened. This is of course useful, but not too many devs are working on f2py. Unless Pearu has time to respond this week, it may be hard to get feedback on that topic quickly.
Cheers,
Ralf
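A test for the "empty in --> empty out" behavior could look roughly like the sketch below. The helper name `check_empty_in_empty_out` and the choice of functions are assumptions made for illustration; they are not taken from the tickets above, and functions with a different contract for empty input (reductions, for instance) would need their own expectations.

```python
import numpy as np

# Illustrative selection of functions assumed to map an empty 1-D array
# to an empty result; not a list from the numpy tickets.
FUNCS_EXPECTED_EMPTY = (np.sort, np.cumsum, np.sqrt, np.unique)


def check_empty_in_empty_out(func, dtype=float):
    """Call `func` on an empty 1-D array and assert the result is empty."""
    empty = np.array([], dtype=dtype)
    result = np.asarray(func(empty))
    name = getattr(func, "__name__", repr(func))
    assert result.size == 0, "%s returned non-empty output" % name
    return result


def test_empty_input():
    # One check per function; a failure names the offending function.
    for func in FUNCS_EXPECTED_EMPTY:
        check_empty_in_empty_out(func)
```

A function named `test_*` like this one is the shape that common Python test runners collect, so extending coverage is mostly a matter of adding functions to the tuple and handling the exceptions separately.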

Here are some relatively straightforward issues which only require touching Python code:
http://projects.scipy.org/numpy/ticket/808
http://projects.scipy.org/numpy/ticket/1968
http://projects.scipy.org/numpy/ticket/1976
http://projects.scipy.org/numpy/ticket/1989
And a Cython one (numpy.random): http://projects.scipy.org/numpy/ticket/1492
I ran into one more patch that I assume one of you just attached: http://projects.scipy.org/numpy/ticket/2074.
It's important to understand a little of how our infrastructure works. We changed to git + github last year; submitting patches as pull requests on Github has the lowest overhead for us, and we get notifications. For patches on Trac, we have to manually download and apply them, and we don't get notifications, which is unfortunately quite unhelpful. Therefore I suggest using git; if you can't, or you feel that the overhead / learning curve is too large, please ping this mailing list about patches you submit on Trac.
Cheers,
Ralf

By the way, for those who are looking to learn how to use git and github: https://github.com/blog/1183-try-git-in-your-browser
Cheers!
Ben Root
participants (4)
- Benjamin Root
- Cera, Tim
- Ralf Gommers
- Six Silberman