Hi All,

I thought I'd raise this topic just to get some ideas out there. At the moment I see two areas that I'd like to see addressed.

1. Documentation editor. This would involve looking at the generated documentation and its organization/coverage, as well as such things as style, and maybe reviewing stuff on the documentation site. This would be more technical writing than coding.

2. Test coverage. There are a lot of areas of numpy that are not well tested, as well as some tests that are still doctests and should probably be updated. This is a substantial amount of work and would require some familiarity with numpy, as well as a willingness to ping developers for clarification on some topics.

Thoughts?

Chuck
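[Editorial sketch, not part of the original message: as an illustration of the doctest-to-unit-test conversion Chuck mentions, here is what such an update might look like. The function and values are arbitrary examples, not a specific numpy test.]

    # Old style: a doctest embedded in a docstring, checked by the doctest runner.
    #
    #     >>> import numpy as np
    #     >>> np.clip([1, 5, 10], 2, 8)
    #     array([2, 5, 8])
    #
    # New style: a regular test function using numpy.testing helpers, which
    # give informative failure messages for array comparisons.

    import numpy as np
    from numpy.testing import assert_array_equal

    def test_clip_basic():
        result = np.clip([1, 5, 10], 2, 8)
        assert_array_equal(result, [2, 5, 8])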
On Thu, Dec 29, 2011 at 9:50 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
First thought: very useful, but probably not GSOC topics by themselves. For a very good student, I'd think topics like implementing NA bit masks or improved user-defined dtypes would be interesting. In SciPy there's also a lot to do, and that's probably a better project for students who prefer to work in Python.

Thanks for bringing this up. Last year we missed the boat; it would be good to get one or more slots this year.

Ralf
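[Editorial sketch: for readers unfamiliar with the NA discussion, numpy.ma is the existing masked-array facility; the proposed NA support would build similar semantics into the core, with bit masks (one bit per element) instead of numpy.ma's byte-per-element mask. A minimal example of the current behavior:]

    import numpy as np

    # numpy.ma carries a full boolean mask alongside the data.
    a = np.ma.masked_array([1.0, 2.0, 3.0, 4.0],
                           mask=[False, True, False, False])

    print(a.mean())  # 2.666..., the masked element is skipped
    print(a.mask)    # [False  True False False]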
On Thu, Dec 29, 2011 at 4:36 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
Along with test coverage, have any of you considered any systematic monitoring of NumPy performance? With all of the extensive refactoring / work on the internals, it would be useful to keep an eye on things in case of any performance regressions. I mention this because I started a little prototype project (http://github.com/wesm/vbench) for doing exactly that for my own development purposes -- it's already proved extremely useful.

Anyway, just a thought. I'm sure a motivated student could spend a whole summer writing unit tests for NumPy and nothing else.

- Wes
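[Editorial sketch: this is not vbench's actual API, just a minimal illustration of the idea of per-revision performance records; the benchmark names and log file are made up.]

    import json
    import subprocess
    import timeit

    import numpy as np

    def current_revision():
        # Ask git for the current commit hash (assumes a git checkout).
        return subprocess.check_output(['git', 'rev-parse', 'HEAD']).strip().decode()

    def run_benchmarks():
        # Two toy benchmarks; a real suite would cover many operations.
        benchmarks = {
            'sum_1e6': lambda: np.arange(1e6).sum(),
            'dot_500': lambda: np.dot(np.ones((500, 500)), np.ones((500, 500))),
        }
        # Best of 3 repeats of 10 calls each, to reduce timing noise.
        return {name: min(timeit.repeat(f, repeat=3, number=10)) / 10
                for name, f in benchmarks.items()}

    if __name__ == '__main__':
        record = {'revision': current_revision(), 'timings': run_benchmarks()}
        # One JSON record per run; regressions show up as jumps over time.
        with open('benchmarks.log', 'a') as f:
            f.write(json.dumps(record) + '\n')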
Hi!
Along with test coverage, have any of you considered any systematic monitoring of NumPy performance?
I'm mildly obsessed with performance and benchmarking of NumPy. I used to use a lot of MATLAB until a year back, and I tend to compare Python performance with it all the time. I generally don't feel happy until I'm convinced that I've extracted the last bit of speed out of my Python code.

I think the generalization of this idea is more or less equivalent to performance benchmarking. Of course, I know there's a lot more to it than 'MATLAB vs Python'. I'd be more than happy to be involved, GSoC or otherwise. Where do I start?

Thanks
On 12/29/11 10:37 PM, Jaidev Deshpande wrote:
We've recently had a discussion about more intelligent timeit commands and timing objects in Python/Sage. People here might find the discussion interesting, and it might also be interesting to collaborate on code. The basic idea was a much smarter timeit command that uses more intelligent statistics and presents a much more comprehensive look at the timing information.

Here is the discussion: https://groups.google.com/forum/#!topic/sage-devel/8lq3twm9Olc
Here is our ticket tracking the issue: http://trac.sagemath.org/sage_trac/ticket/12168
Here are some examples of the analysis: http://sagenb.org/home/pub/3857/

I've CCd the sage-devel list as well, which is where our discussion happened.

Thanks,

Jason
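[Editorial sketch: not the Sage implementation, just an illustration of the "smarter statistics" idea -- collect a distribution of timings and summarize it instead of reporting only the best run.]

    import timeit

    def smart_time(func, repeat=30, number=100):
        # Per-call times from many repeats, sorted so we can read off quantiles.
        samples = sorted(t / number for t in
                         timeit.repeat(func, repeat=repeat, number=number))
        n = len(samples)
        return {'min': samples[0],
                'median': samples[n // 2],
                'q3': samples[(3 * n) // 4],  # upper quartile hints at noise
                'max': samples[-1]}

    import numpy as np
    print(smart_time(lambda: np.sort(np.random.rand(10000))))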
On Fri, Dec 30, 2011 at 5:45 AM, <jason-sage@creativetrax.com> wrote:
Nice. It would be cool to have this available as a separate IPython magic command. For performance monitoring it's probably unnecessary; regular %timeit should be OK for that.
Performance monitoring does require quite a bit of infrastructure (like Wes' vbench project) though, which could be a good GSoC project. There are other VCSs to support, maybe a buildbot plugin; many options there.

Ralf
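[Editorial sketch: a custom magic along these lines, using IPython's magic-registration API (the modern form shown here postdates this thread); %stime is a made-up name.]

    # Run inside an IPython session.
    import timeit
    from IPython.core.magic import register_line_magic

    @register_line_magic
    def stime(line):
        """Like %timeit, but report a distribution rather than only the best run."""
        ns = get_ipython().user_ns  # evaluate the statement in the user's namespace
        samples = sorted(t / 100 for t in
                         timeit.repeat(line, repeat=15, number=100, globals=ns))
        print('min    %.3g s' % samples[0])
        print('median %.3g s' % samples[len(samples) // 2])
        print('max    %.3g s' % samples[-1])

    # Usage: %stime np.sort(np.random.rand(10000))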
On Thu, Dec 29, 2011 at 1:36 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
First thought: very useful, but probably not GSOC topics by themselves.
Documentation is specifically excluded from GSoC (at least it was a couple of years ago when I was last involved). Not sure about testing, but I'd guess it can't be a project by itself.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception

Chris.Barker@noaa.gov
Hi Chris
Documentation is specifically excluded from GSoC (at least it was a couple of years ago when I was last involved)
Documentation wasn't excluded last year from GSoC; there were quite a few projects that required a lot of documentation. But yes, there was no "documentation only" project.

Anyhow, it seems reasonable that testing alone can't be a project. What about benchmarking and the related statistics? Does that qualify as a worthwhile project (again, GSoC or otherwise)?

Thanks
On Sat, Dec 31, 2011 at 6:43 AM, Jaidev Deshpande <deshpande.jaidev@gmail.com> wrote:
Anyhow, it seems reasonable that testing alone can't be a project. What about benchmarking and the related statistics? Does that qualify as a worthwhile project (again, GSoC or otherwise)?
That's certainly worth doing, and doing well. You could start with investigating what Wes has done with vbench so far, and look at how to get the output of that into http://speed.pypy.org/. I have the feeling it's not enough work for a GSoC project though, and with a project like starting scikits.signal you'd have a better chance.

Ralf
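[Editorial sketch: speed.pypy.org runs Codespeed, which accepts results as an HTTP POST; the field names below follow Codespeed's /result/add/ interface as I understand it, and the host, project, and values are made up.]

    import urllib.parse
    import urllib.request

    def post_result(benchmark, value, commitid):
        data = urllib.parse.urlencode({
            'commitid': commitid,
            'branch': 'default',
            'project': 'numpy',
            'executable': 'python2.7',
            'benchmark': benchmark,
            'environment': 'buildbot-linux64',  # must be registered on the server
            'result_value': value,
        }).encode()
        # POST one benchmark result to the (hypothetical) Codespeed instance.
        urllib.request.urlopen(
            urllib.request.Request('http://speed.example.org/result/add/', data))

    post_result('sum_1e6', 0.0042, 'abc123def')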
On Fri, Dec 30, 2011 at 9:43 PM, Jaidev Deshpande <deshpande.jaidev@gmail.com> wrote:
Documentation is specifically excluded from GSoC (at least it was a couple of years ago when I was last involved)
Documentation wasn't excluded last year from GSoC, there were quite a few projects that required a lot of documentation.
sure -- it's certainly encouraged to document code that gets written, but...
But yes, there was no "documentation only" project.
exactly -- from the 2011 GSoC FAQ:

    12. Are proposals for documentation work eligible for Google Summer of Code?

    While we greatly appreciate the value of documentation, this program is an exercise in developing code; we can't accept proposals for documentation-only work at this time.
Anyhow, it seems reasonable that testing alone can't be a project. What about benchmarking and the related statistics? Does that qualify as a worthwhile project (again, GSoC or otherwise)?
I didn't find a specific FAQ entry, but from the above, I suspect that all projects must be primarily about producing code: not documenting, testing, or benchmarking. Those, of course, should all be part of code development, but not the focus.

- Chris
On Thu, Dec 29, 2011 at 2:36 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
First thought: very useful, but probably not GSOC topics by themselves.
For a very good student, I'd think topics like implementing NA bit masks or improved user-defined dtypes would be interesting. In SciPy there's also a lot to do, and that's probably a better project for students who prefer to work in Python.
Good points. There is actually a fair bit of work that could go into NA. The low-level infrastructure seems to me somewhat independent of the arguments about the API. I see four areas there:

1) Size - that requires bit masks and a decision that masks only take two values.
2) Speed - that requires support in the ufunc loops.
3) Functions - isna needs some help, like isanyna(a, axis=1).
4) More support in current functions.

Chuck
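[Editorial sketch: isanyna does not exist; as an illustration of the semantics being asked for, here is how the same reduction reads today using NaN as the missing-value marker.]

    import numpy as np

    a = np.array([[1.0, np.nan, 3.0],
                  [4.0, 5.0, 6.0]])

    # NaN-based analogue of the proposed isanyna(a, axis=1):
    # True for each row containing at least one missing value.
    print(np.isnan(a).any(axis=1))  # [ True False]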
On Thu, Dec 29, 2011 at 2:36 PM, Ralf Gommers <ralf.gommers@googlemail.com> wrote:
For a very good student, I'd think topics like implementing NA bit masks or improved user-defined dtypes would be interesting. In SciPy there's also a lot to do, and that's probably a better project for students who prefer to work in Python.
Besides NA bit masks, the new iterator isn't used in a lot of places it could be. Maybe replacing all uses of the old iterator? I'll admit, that smacks more of maintenance than developing new code and might be a hard sell.

Chuck
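[Editorial sketch: "the new iterator" is np.nditer at the Python level (NpyIter in C). A small example of one thing it adds over the old flat iteration -- several operands broadcast together in a single loop:]

    import numpy as np

    a = np.arange(6).reshape(2, 3)
    b = np.array([10, 20, 30])

    # Old style: a.flat walks one array's elements in C order.
    total = sum(x for x in a.flat)
    print(total)

    # New style: np.nditer broadcasts both operands in one C-level loop.
    for x, y in np.nditer([a, b]):
        print(int(x), int(y))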
On Sun, Jan 15, 2012 at 3:02 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Besides NA bit masks, the new iterator isn't used in a lot of places it could be. Maybe replacing all uses of the old iterator? I'll admit, that smacks more of maintenance than developing new code and might be a hard sell.
That does smell like maintenance.
I've cleaned up http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas and added the idea on missing data. It would be useful to describe new ideas there well, not as a single bullet. Maybe also add potential mentors? And once we have some more ideas, link to it from scipy.org?

Is anyone keeping an eye on the relevant channels to get involved this year? Previously Jarrod was doing that.

Ralf
participants (6)

- Charles R Harris
- Chris Barker
- Jaidev Deshpande
- jason-sage@creativetrax.com
- Ralf Gommers
- Wes McKinney