Scipy workflow (and not tools).
Hi,

I've split this off into a new thread because I felt there were two issues in Stefan's original thread. This is in the hope that we can stimulate discussion on the workflow (as opposed to, say, which version control system to use, or which bug tracker). I would be very interested to see if we can come to a consensus on the important question of whether to introduce fairly formal code review into the scipy workflow. I've appended the key piece of discussion below.
2009/2/24 Travis E. Oliphant <oliphant@enthought.com>:
I think the biggest problem has been time, and adding too formal a process will just increase the time it takes to get code into SciPy. I'm fine with emphasizing documentation and tests as we discuss things, and we should encourage each other, but I'm not comfortable with hard-line statements like the ones being made above. Yes, such things are helpful, but they are also expensive, and I worry more about what we lose in contributions.
Having so little time means that we cannot be cavalier about adding broken code to SciPy. Like Matthew mentioned, this becomes an immense maintenance burden.
The quality of what we create should emerge as all interested parties critically look at the code that is available in SciPy.
I agree with that sentiment, and looking critically at code in SciPy starts with our own patches.
Not everyone can do that on the same schedule. I'm opposed to trying to force that to happen. I very much favor cultivating a culture that wants someone to fix the problems in their code.
Sure, let's be inclusive, but also set a bar. If you make the time to write a patch, make the time to do it well (it does not take long to construct a test -- you have to make sure your code works properly anyhow).
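To give a sense of scale, a complete test of the kind described above can be just a few lines. The following sketch is illustrative only; it uses scipy.optimize.brentq as an arbitrary example, not a function under discussion in this thread:

    from numpy.testing import assert_almost_equal
    from scipy import optimize

    def test_brentq_finds_simple_root():
        # the root of x**2 - 1 on the bracket [0, 2] is exactly 1
        root = optimize.brentq(lambda x: x**2 - 1, 0, 2)
        assert_almost_equal(root, 1.0)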
But my favorite workflow is a bit more chaotic than that. People create their own DVCS versions of SciPy using their best judgment and publish revisions they consider to be working code.
Branches that are given the thumbs up by 2 people (or 1 on the steering committee) get pushed to the main branch. This review happens regularly, on IRC channels at regularly scheduled times.
Two eyes on every piece of code in SciPy, that's all we need. Two critical eyes that realise the value of tests and documentation. Your outline above fits in with my view of how this could happen.
Best, Matthew
Hi Matthew,

On Tue, Feb 24, 2009 at 11:59 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
I would be very interested to see if we can come to a consensus on the important question of whether to introduce fairly formal code review into the scipy workflow.
I don't think there are enough eyes at this point for a strict review policy. How many of the current packages have any maintainer? Who was maintaining the stats package before Josef got involved? How many folks besides Robert could look over the changes usefully? How many folks looked over Travis' recent addition to optimize? Who is working on the interpolation package?

I think at this point we would be better off trying to recruit at least one person to "own" each package. For new packages that is usually the person who committed it but we also need ownership of older packages. Someone with a personal stake in a package is likely to do more for quality assurance at this point than any amount of required review. I don't have a problem with folks complaining about missing tests, etc., but I worry that if we put too many review steps into the submission path there won't be enough people to make it work.

Chuck
On Tue, Feb 24, 2009 at 1:13 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
I think at this point we would be better off trying to recruit at least one person to "own" each package. For new packages that is usually the person who committed it but we also need ownership of older packages. Someone with a personal stake in a package is likely to do more for quality assurance at this point than any amount of required review.
It doesn't seem your current process is conducive to having a "maintainer" for each package. What is the maintainer supposed to do? Monitor all the relevant SVN commits, hoping that some broken, untested change doesn't go in behind his or her back? With a review process, maintainers emerge, since code doesn't get included if no one reviews it. You also get a lot more people knowledgeable about more areas of the codebase. I think Stefan's comment should be reiterated:
Having so little time means that we cannot be cavalier about adding broken code to SciPy. Like Matthew mentioned, this becomes an immense maintenance burden.
--Mike
On Tue, Feb 24, 2009 at 01:48:47PM -0800, Mike Hansen wrote:
It doesn't seem your current process is conducive to having a "maintainer" for each package. What is the maintainer supposed to do? Monitor all the relevant SVN commits, hoping that some broken, untested change doesn't go in behind his or her back?
With a review process, maintainers emerge, since code doesn't get included if no one reviews it. You also get a lot more people knowledgeable about more areas of the codebase.
The problem is simply that you are lacking people. No more, no less. I say "you" because my contribution to scipy code has been nothing, although I try to be supportive of the project. I wonder how we go ahead and fix this problem. Tough question. What puzzles me is that there are plenty of people writing numerical code, in external packages or in their in-house libraries. Why are these people not interested in putting their effort into something bigger, and thus probably longer-lived? Bah... Gaël
On Tue, Feb 24, 2009 at 2:48 PM, Mike Hansen <mhansen@gmail.com> wrote:
It doesn't seem your current process is conducive to having a "maintainer" for each package. What is the maintainer supposed to do? Monitor all the relevant SVN commits, hoping that some broken, untested change doesn't go in behind his or her back?
With a review process, maintainers emerge, since code doesn't get included if no one reviews it. You also get a lot more people knowledgeable about more areas of the codebase.
I think Stefan's comment should be reiterated:
Having so little time means that we cannot be cavalier about adding broken code to SciPy. Like Matthew mentioned, this becomes an immense maintenance burden.
Are we adding a lot of broken code? Does the gain offset the pain? I think we need more folks with commit privileges and interest. In the short term I would propose the following:

1) Additions get posted on the mailing list for comment before commit. I'll bet few knew of the additions to optimize.
2) We look for folks with patches in trac and consider giving them commit privileges to fix things up.
3) We put together a list of needed tests. Then we will see how serious folks are about writing tests.

Chuck
On Tue, Feb 24, 2009 at 2:02 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Are we adding a lot of broken code? Does the gain offset the pain? I think we need more folks with commit privileges and interest. In the short term I would propose the following.
Many would argue that any untested code is broken code and will become a maintenance burden in the future.
1) Additions get posted on the mailing list for comment before commit. I'll bet few knew of the additions to optimize.
Does this really provide motivation for people to look at code?
2) We look for folks with patches in trac and consider giving them commit privileges to fix things up.
Ideally, you don't want anything committed until it's "fixed up". Using SVN makes this a bit more difficult to do.
3) We put together a list of needed tests. Then we will see how serious folks are about writing tests.
It's definitely important to know what you're actually testing and what you're not. Also, being able to do so in an automated way is important. http://ivory.idyll.org/blog/feb-09/people-who-dont-use-code-coverage-are-idi... --Mike
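As an illustration of what an automated coverage check could look like, here is a minimal sketch; it assumes the third-party coverage.py package and picks scipy.signal as an arbitrary target, neither of which comes from the thread:

    import coverage

    # measure which lines of scipy.signal are exercised by its own test suite
    cov = coverage.Coverage(source=["scipy.signal"])
    cov.start()
    import scipy.signal
    scipy.signal.test()
    cov.stop()
    cov.report()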
Charles,

2009/2/25 Charles R Harris <charlesr.harris@gmail.com>:
I don't think there are enough eyes at this point for a strict review policy. How many of the current packages have any maintainer? Who was maintaining the stats package before Josef got involved? How many folks besides Robert could look over the changes usefully? How many folks looked over Travis' recent addition to optimize? Who is working on the interpolation package?
Thank you for providing some prime examples of why code review and testing are needed. I didn't know about the changes to the optimisation module, but now I have to ask these questions:

1. Is it quality code, suitable for SciPy? (It is; I read the code, or in other words *reviewed* it.)
2. Does it work? I don't know. Nobody does. There aren't any tests, no guarantees.

That would be my reaction to one change, but there were 6 or more, none with any tests (and at least one contains a spelling mistake!). How do we know that they work under Windows, Solaris, Linux and OSX? Worse: what if I decide to make some updates to that code but, not having understood the author's intention perfectly, break it horribly? Who would be any the wiser? Tests protect the user and the developer alike. It is irresponsible to carry on the way we do.
Are we adding a lot of broken code?
Yes, we are. And it is OK to write broken code; we all do. My argument is that, together, we write better code (review), and by using the tools at our disposal (testing), we minimise the chances of failure.
Does the gain offset the pain? I think we need more folks with commit privileges and interest. In the short term I would propose the following.
More folks with commit privileges would just perpetuate this chaos. Our community is sophisticated enough not to apply a brute-force solution to the problem.
1) Additions get posted on the mailing list for comment before commit. I'll bet few knew of the additions to optimize.
Simply being aware of a patch does not improve its quality.
3) We put together a list of needed tests. Then we will see how serious folks are about writing tests.
It should not be anyone's job to clean up after his/her peers. If each patch were accompanied by appropriate tests, this situation would never occur. Regards, Stéfan
Stéfan van der Walt wrote:
Thank you for providing some prime examples of why code review and testing are needed.
I didn't know about the changes to the optimisation module, but now I have to ask these questions:
1. Is it quality code, suitable for SciPy? (It is; I read the code, or in other words *reviewed* it.)
2. Does it work? I don't know. Nobody does. There aren't any tests, no guarantees.
Which code has no tests? Are you speaking rhetorically or about a specific check-in?
That would be my reaction to one change, but there were 6 or more, none with any tests (and at least one contains a spelling mistake!). How do we know that they work under Windows, Solaris, Linux and OSX?
Worse: what if I decide to make some updates to that code but, not having understood the author's intention perfectly, break it horribly? Who would be any the wiser?
Tests protect the user and the developer alike. It is irresponsible to carry on the way we do.
No, it's not. It's just a different, more organic development model than the one you are championing. It's one where people do their best to create quality code and help fix the pieces that they care about, no matter who the original author was. Getting quality code is not as easy as establishing a formal review process. We've had a review process for years: *everybody* is welcome to review the code that is checked in and complain / argue about better ways to do it. If you feel that strongly and have commit access, you can even change the code.
Are we adding a lot of broken code?
Yes, we are.

There are *degrees* of brokenness. It's impossible to prove that code is not "broken" in some way, depending on your definition of broken. So it's a bit misleading to complain about brokenness.

We are also adding a lot of very useful code. I agree that code generally gets better with more eyeballs, but developers are also not interchangeable. I'm not at all convinced that a formal review process will actually make things better, when what we need is more time spent.
More folks with commit privileges would just perpetuate this chaos. Our community is sophisticated enough not to apply a brute-force solution to the problem.
I'm not sure it actually would. I can see, however, that we do need a DVCS to make the workflow easier, because branching is the key to letting real development take place while allowing the developer to use the benefits of version control without impacting the "released" line until they are ready to commit all changes (i.e., branching and merging need to be as seamless as possible).
It should not be anyone's job to clean up after his/her peers. If each patch were accompanied by appropriate tests, this situation would never occur.
I think this is a fundamentally wrong mindset. Yes, we should all test before we commit code, and I'm not opposed to reminding people that the tests they already use to make sure things work can easily be placed as unit tests. But my best effort in an area which actually solves the problem, or scratches the itch I had, may look like "a mess" to someone else. And I don't think we can find a one-size-fits-all solution to that fundamental difference of perspective. Who decides what "clean-up" constitutes? My answer is "those most interested", which is a dynamic and possibly varying group. If there are concerns about the commits people are making, then let's talk about specifics and address those rather than masking that behind a "review process".

-Travis
Hi,
Tests protect the user and the developer alike. It is irresponsible to carry on the way we do.
No, it's not.
Scipy is rarely released. David and Stefan are saying that it is very hard to release. It might be true that continuing with the organic, 'add it if it seems good' approach will be fine. But it might also be true that it will make Scipy grind to a halt, as it becomes too poorly structured and tested to maintain.

http://anyall.org/blog/2009/02/comparison-of-data-analysis-packages-r-matlab... rates Numpy / Scipy / matplotlib as 'immature'. This is mainly because of Scipy, and it's fair. If we want that to change, we have to be able to release versions that have good documentation and low bug counts. The choices we make now are going to have long-lasting consequences for Scipy.

I think our best guess, from what David and Stefan are saying, is that we need a change towards a more structured process. I stress the word "need". This doesn't seem surprising to me. I think we've got to listen to them, because they are doing the work of maintaining and releasing Scipy.

See y'all, Matthew
On Wed, Feb 25, 2009 at 11:58 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
I think our best guess, from what David and Stefan are saying, is that we need a change towards a more structured process. I stress the word "need".
Much of Scipy *isn't* maintained; that is why it is immature. There are parts that need to be worked over and rationalized, and that isn't happening. You can't review code that hasn't been written. Some of that is history: the initial impetus in Scipy was interfacing existing C and Fortran libraries with Python and scratching itches. But that isn't the same as putting together a large package with smoothly interacting parts and verified results. And before that can happen we need more people working on the parts. Chuck
On Thu, Feb 26, 2009 at 4:22 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
And before that can happen we need more people working on the parts.
Also, if the problem is manpower, adding more code which makes the whole package more difficult to handle does not sound like a future-proof path, unless the goal of scipy is to become a bag of tricks which may be useful to some people, without any commitment from our side. Some parts of scipy are difficult to maintain because they have no tests and no documentation; it is not even obvious what they are supposed to do. I am afraid we can't have it both ways: if we want to increase quality, given our manpower, we have to reduce the amount of code which requires constant attention. If we want more features first, then we can continue as we do now. But then we can't expect regular, relatively well-tested releases. David
2009/2/25 David Cournapeau <cournape@gmail.com>:
I am afraid we can't have it both ways: if we want to increase quality, given our manpower, we have to reduce the amount of code which requires constant attention.
It seems to me that one reason for the current disagreement is that people are talking about two different things: (1) getting new code written from scratch and into the repository, and (2) getting (and keeping) the code we have working reliably. For (1), tests and documentation are indeed a barrier (albeit, in my opinion, a very low one). For (2), though, requiring tests and documentation will drastically decrease the effort required.

Put another way: some people are arguing that not requiring tests or documentation will get more people contributing new code. Others are arguing that allowing code without tests or documentation into the trunk will increase the manpower required to do basic things like make releases.

Personally, I don't think requiring tests and documentation is a barrier to new users. I was very hesitant about my first contribution because I really didn't want to put in broken or embarrassingly bad code, so the fact that I could test it systematically, confirm that it didn't break anything else, and document it clearly made me more confident that I was contributing something that wouldn't require me to wear a paper bag on my head.

But let's assume that it is a barrier to contributing new code. Which does scipy need more right now: reliability in the code it has and a regular release cycle, or lots more new code?

Anne
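The systematic check Anne describes (confirming that a change doesn't break anything else) is a single call with SciPy's built-in test runner; a minimal sketch:

    import scipy

    # run the full SciPy test suite; any regression shows up as a test failure
    scipy.test('full')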
On Thu, Feb 26, 2009 at 3:58 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
It might be true that continuing with the organic, 'add it if it seems good' approach will be fine. But it might also be true that it will make Scipy grind to a halt, as it becomes too poorly structured and tested to maintain.
I personally don't much buy the argument that asking for some tests would mean less useful code. True, not asking for tests at commit time requires less work, but having to fix, afterwards, some code without tests which I have not written is much more time-consuming. If the code really is useful but has no tests, maybe it does not belong in scipy. Maybe it belongs somewhere else; I mean, creating a setup.py and publishing on pypi is not really hard. Maybe we should first see how much easier we can make the process of including new code instead of worrying about making it too difficult. David
Hi Travis,

2009/2/25 Travis E. Oliphant <oliphant@enthought.com>:
Which code has no tests? Are you speaking rhetorically or about a specific check-in?
I don't want to focus on any commits specifically (this kind of thing happens across the board), but I'll give one example that involves yourself. Take a look at http://projects.scipy.org/scipy/scipy/changeset/5554. You added a very useful function (thank you very much!). I haven't played with decimation filters recently, so I added a very naïve test: http://projects.scipy.org/scipy/scipy/changeset/5560. Let's face it, you could have done a much better job there than I could.

Besides the neat function you added, you also fixed a spelling mistake. Unfortunately, "Filifilt" became "filtflit", which is also incorrect. Also, you accidentally changed "lfilter_zi" to "lfiltir_zi". I fixed that in http://projects.scipy.org/scipy/scipy/changeset/5557. Even a cursory review of the patch would have caught these issues.

I appreciate that you want SciPy to remain organic and free: a place where motivated developers strive to grow the codebase into an efficient, powerful computing machine. I share that vision, although I'd like us to turn the gears up a bit, oil the machine, and encourage one another to write better code.

Regards, Stéfan
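The "very naïve test" mentioned above might have looked something like the sketch below. This is a reconstruction for illustration, not the contents of changeset 5560; it assumes only that signal.decimate(x, q) returns about one sample per q input samples:

    import numpy as np
    from numpy.testing import assert_equal
    from scipy import signal

    def test_decimate_basic():
        # a 100-sample sine wave decimated by a factor of 2 should yield 50 samples
        t = np.linspace(0, 1, 100, endpoint=False)
        x = np.sin(2 * np.pi * 0.5 * t)
        y = signal.decimate(x, 2)
        assert_equal(y.shape, (50,))
        assert np.all(np.isfinite(y))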
On Wed, Feb 25, 2009 at 12:45 PM, Stéfan van der Walt <stefan@sun.ac.za> wrote:
I don't want to focus on any commits specifically (this kind of thing happens across the board), but I'll give one example that involves yourself.
I would have made some comments on the curve_fit function also, but there was no easy way to do it. I don't know that we need a formal review process at this point, but it would be nice if changes to the code showed up somewhere with tools that made it easy to add comments that went directly back to the submitter. I also miss that on the tickets, where some way to contact the submitter could sometimes be helpful. 'Course, we don't want the spam machines grabbing the addresses ;) Chuck
Charles R Harris wrote:
It would be nice if changes to the code showed up somewhere with tools that made it easy to add comments that went directly back to the submitter.
Yes, such tools would be very nice, and I would warmly welcome any changes to the workflow that accommodate them.

-Travis

--
Travis Oliphant
Enthought, Inc.
(512) 536-1057 (office)
(512) 536-1059 (fax)
http://www.enthought.com
oliphant@enthought.com
Hi,
I don't want to focus on any commits specifically (this kind of thing happens across the board), but I'll give one example that involves yourself.
And I will give an example that involves myself. I added a patch to the 0.7 matlab io that was partly tested and not properly benchmarked, and rendered it more or less unusable for large datasets. I speak only for myself, but I like having people have a look at my code, teaching me stuff I don't know, or just checking things that I didn't think of myself. So, how about this:

A proposal
----------

We set up a patch review policy. The review involves checking for and suggesting tests and documentation. That's the default. If you don't want this to happen to your code, then you ask for an opt-out.

Does that sound reasonable?

Matthew
2009/2/25 Matthew Brett <matthew.brett@gmail.com>:
We set up a patch review policy. The review involves checking for and suggesting tests and documentation. That's the default. If you don't want this to happen to your code, then you ask for an opt-out.
I think as far as patch review goes, opt-in is quite enough - what's lacking is a reasonable mechanism to implement patch review. Simply posting the code to the mailing list really doesn't work well. Possibly that online code review site people have pointed to a few times could work? It would help if I could avoid looking at patches I've already reviewed (or decided not to review); this is one place where infrastructure could help. Anne
Hi,
I think as far as patch review goes, opt-in is quite enough
I would suggest - following Stefan's comments - that opt-out would encourage people to think of this as the standard way to work. Opting out would be saying 'this is not the way I like to work' - and that is OK.
- what's lacking is a reasonable mechanism to implement patch review.
Right. So, staying clear of the actual tools - can we ask Stefan, or David, or Pauli, to suggest a specific workflow along these lines? Then we can see how to implement it. Best, Matthew
Hi Matthew,

2009/2/25 Matthew Brett <matthew.brett@gmail.com>:
Right. So, staying clear of the actual tools - can we ask Stefan, or David, or Pauli, to suggest a specific workflow along these lines? Then we can see how to implement it.
David and I have installed both roundup and trac 0.11 so far, and we are busy exploring the different code review options. We'll keep the list posted! Cheers, Stéfan
We set up a patch review policy. The review involves checking for and suggesting tests and documentation. That's the default. If you don't want this to happen to your code, then you ask for an opt-out.
Who do you ask? Who decides whether or not you get one? I think it's better to have a recommended policy of review, but not a requirement. Then, a tool like coverage that shows which code has been reviewed by more than one pair of eyes. And then let that information, along with who committed the code and who the people "curating" a particular sub-package are, guide the release manager. -Travis
Hi,
Who do you ask?
The release manager, I suppose, or the several people who are doing it.
Who decides whether or not you get one?
Same, but the threshold for someone who has already committed code should be low. It's clear that some people don't like working that way - and that's fine.
I think it's better to have a recommended policy of review but not a requirement.
Yes, right. So the recommendation is of the form 'this is our usual policy; if you would like to opt out of the policy, let us know, and expect us to agree'. See you, Matthew
On Tue, Feb 24, 2009 at 4:13 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
I think at this point we would be better off trying to recruit at least one person to "own" each package. For new packages that is usually the person who committed it but we also need ownership of older packages. Someone with a personal stake in a package is likely to do more for quality assurance at this point than any amount of required review.
+1 This ought to be at the top of everyone's list.

--
Nathan Bell
wnbell@gmail.com
http://graphics.cs.uiuc.edu/~wnbell/
Hi,
I think at this point we would be better off trying to recruit at least one person to "own" each package. For new packages that is usually the person who committed it but we also need ownership of older packages. Someone with a personal stake in a package is likely to do more for quality assurance at this point than any amount of required review.
I would hesitate to make the model as strong as "ownership". Maybe "curator"? I don't mean to play with semantics, but the choice of language for the model will be important in giving the right impression to new and/or timid users/contributors (myself included), who don't need to be put off getting involved because of perceived responsibilities. Ownership suggests a strict hierarchy, and potential curators will be less likely to get involved if the workflow model labels them "owners." This perception also enables non-owners (who might perceive themselves as unqualified to help) to justify leaving the poor blighters to do everything by themselves.

I don't want to have the responsibility of "owning" anything about the existing code for ODE solving (and maybe some other numerical methods), even though I have some stake in it. But I'll happily share in some of the reviewing and possibly testing of changes and improvements to that code. So, can't there be informal teams of curatorship, so that not everyone involved has to be really familiar with the tools discussed in the other thread?! Unfortunately I cannot afford the time to ride the waves of changing fashion in VCS, etc.

Wouldn't this help to get more people involved? ... those many people that Gael correctly assumes are out there but staying silent!

-Rob
On Tue, Feb 24, 2009 at 6:18 PM, Rob Clewley <rob.clewley@gmail.com> wrote:
I would hesitate to make the model as strong as "ownership". Maybe "curator"?
I wouldn't get too hung up on the word "owner". I think the necessary part is that one or more people feel some level of responsibility for each component of scipy. As an example, in Trac you can configure an "owner" (their term, not mine) for each component. When a ticket is issued, the owner will receive an email. For simple fixes, this facilitates quick turnaround. Ideally, each component would have at least one owner who was notified when tickets are issued. Here's the current list:

Name                 Owner
Build issues         cdavid
Other                somebody
Trac                 somebody
Website              somebody
numexpr              cookedm
scipy.cluster        somebody
scipy.fftpack        somebody
scipy.integrate      somebody
scipy.interpolate    somebody
scipy.io             somebody
scipy.lib            somebody
scipy.linalg         somebody
scipy.maxentropy     somebody
scipy.misc           somebody
scipy.ndimage        somebody
scipy.odr            cdavid
scipy.optimize       somebody
scipy.signal         somebody
scipy.sparse         wnbell
scipy.sparse.linalg  wnbell
scipy.spatial        peridot
scipy.special        somebody
scipy.stats          somebody
scipy.weave          somebody

--
Nathan Bell
wnbell@gmail.com
http://graphics.cs.uiuc.edu/~wnbell/
On Tue, Feb 24, 2009 at 4:32 PM, Nathan Bell <wnbell@gmail.com> wrote:
Ideally, each component would have at least one owner who was notified when tickets are issued.
Nice list. I note that Anne was quick to fix problems in scipy.spatial and David works hard on build issues. I feel responsible for the 1D solvers in scipy.optimize and the sort functions in Numpy. Josef should probably be added to the stats list. I nominate Pauli for special functions (Hi Pauli). A finer breakdown of categories might help in parceling out responsibilities. Volunteers? Chuck
Rob,

2009/2/25 Rob Clewley <rob.clewley@gmail.com>:
I don't want to have the responsibility of "owning" anything about the existing code for ODE solving (and maybe some other numerical methods), even though I have some stake in it. But I'll happily share in some of the reviewing and possibly testing of changes and improvements to that code.
I think you raise an important point. As an Open Source project, we could succeed without much (any) formal hierarchy. Naturally, the system evolves into a kind of meritocracy, where capability and dedication count heavily (as they should). Informal ownership (I care for this code, therefore I take note of its progress) is helpful in moderate doses. For example, I like the fact that Nathan reviews all patches related to sparse matrices; he knows that part of SciPy extremely well, and his advice is valuable. Of course, other reviews of such a patch would be just as welcome.

The main argument I tried to put forth earlier was: contributing to a project is easy when you know what is expected of you (clear guidelines), and when you know that you'll be treated on merit alone (the same as everybody else). Merit carries no malice, and is about as impartial as it gets.

Looking forward to your input on ODEs! :)

Cheers, Stéfan
Stéfan van der Walt wrote:
Contributing to a project is easy when you know what is expected of you (clear guidelines), and when you know that you'll be treated on merit alone (the same as everybody else).
I also want to point out that a formal code review process that is open (such as a web GUI) encourages participation by people who may not feel they have the time or abilities to write new code, but would feel comfortable commenting on a patch sitting in front of them. I think new developers could be fostered this way, too. Finally, while the prospect of having code go up for review won't encourage Travis to submit new code, it might have that effect on someone with less experience who is afraid that his/her new feature won't compile on Windows (for example). A formal code review process allows that person to put something online and say "don't apply as-is -- I'm looking for help integrating with Windows" or simply "I got this far, but I wonder how to do XXX". I realize these things are possible, to a degree, with Trac, but I agree that a better workflow process would be very useful. -Andrew
Andrew Straw wrote:
I also want to point out that a formal code review process that is open (such as a web GUI) encourages participation by people who may not feel they have the time or abilities to write new code, but would feel comfortable commenting on a patch sitting in front of them.
This guarantee of a review made a huge difference in my being comfortable starting to contribute to Sage. I didn't have to get everything perfect; I knew someone would review my changes and offer suggestions. Another thing that I learned from the Sage project was that if you require it ("build it" :), they will come. It may take time, but requiring review of patches seems to pull people out of the woodwork to try reviewing them; people that would be too hesitant to actually write code for Sage, at least yet. It's a great, nonthreatening way to learn the system (there is often more than one review of a patch, especially if one of the reviewers is new to Sage). Just a few more reasons for a formal policy of reviews.

Jason

--
Jason Grout
Andrew Straw wrote:
I also want to point out that a formal code review process that is open (such as a web GUI) encourages participation by people who may not feel they have the time or abilities to write new code, but would feel comfortable commenting on a patch sitting in front of them. I think new developers could be fostered this way, too.
Great idea! Let's have review/mentoring processes to assist newcomers. I'm all for that. I would like to move those people who are timid at first to the point of being willing to dive in and get their hands dirty. I suspect my view is a bit organic for some, but I've encouraged people for a long time to commit code with as much documentation and testing as can be provided. Then, let the process of further documenting and using that code "harden" it, rather than a "review" process.

If we feel that there have been too many "buggy commits", then what are examples of that? I think the switch to NumPy and the integration of ndimage did bring in some "less-reviewed" code with API changes that were possibly too hurried. But that was a time problem again. Is imposing an extra burden on the developer really going to solve that problem more than just a willingness to allow your own code to be critiqued and being willing to speak up when you see specifics you disagree with?

I don't see this discussion as review or not review. Open source *will* be reviewed; it's just a question of "when". On the question of whether or not you make the code available until you can guarantee someone else has looked at it, I come down on the side of "make it available", so that other people will look at it when it becomes interesting to them.

Tools that let us monitor the results of commits (buildbots, dashboards, automatic emails, etc.) are much more valuable in my mind than (difficult-to-quantify and establish) processes that try to prevent any problems. More tools, please, is fundamentally what I say to the question of formal review.

-Travis
2009/2/25 Travis E. Oliphant <oliphant@enthought.com>:
I don't see this discussion as review or not review. Open source *will* be reviewed; it's just a question of "when".
Unfortunately, some of that "when" is "when the release manager is trying to get the release candidate to compile on all architectures", which makes the release process quite onerous. Anne
On Thu, Feb 26, 2009 at 2:55 AM, Travis E. Oliphant <oliphant@enthought.com> wrote:
Tools that let us monitor the results of commits (buildbots, dashboards, automatic emails, etc.) are much more valuable in my mind than (difficult-to-quantify and establish) processes that try to prevent any problems. More tools, please, is fundamentally what I say to the question of formal review.
I agree. I know the topic is not about tools, but at least in my case, I don't do review because it is too much of a pain right now. It is not so much about tools as about the barrier to entry: I am honestly not really interested in doing code review if the process of becoming aware of the code to review, looking at it, and testing it makes up 75% of the time. And that's exactly the situation right now. More concretely, I would be much more willing to do review if I could:

- receive email specifically for the packages/topics I want to review, not others
- get the code and test it, ideally without even having to click once on a website
- send back what's wrong / what's not
- mark a review as done

I think discussing the review process in the dark, without actual experiments, is a bit useless at this point - like Robert, I think as long as we don't have any process on how to do things, we won't have much more than gut feeling. To see actual problems and limitations, we have to try things at some point.

David
Rob Clewley <rob.clewley <at> gmail.com> writes: [...]
So, can't there be informal teams of curatorship so that not everyone involved has to be really familiar with the tools discussed in the other thread?! Unfortunately I cannot afford the time to ride the waves of changing fashion in VCS, etc.
Wouldn't this help to get more people involved? ... those many people that Gael correctly assumes are out there but staying silent!
I am the kind of person that you want developing code for Scipy. I prove the existence of a non-empty class of people who are out here but stay silent (no longer!). I am a persistent lurker on these lists. I'm a heavy user of Numpy and Scipy in my research. I use Numpy and Scipy in the classes I teach. I contribute to other Python-based OSS projects in my small spare time. When you folks talk about attracting people to work on Scipy, I should be the kind of person you are thinking about (and I am legion?). I'd like to share some of my thoughts on the issues of code review, tests, documentation and workflow in the hopes of offering a non-insider perspective.

1) Code review is very helpful for me as a new contributor. I am much more likely to contribute in a context in which I feel that whatever code I *can* produce is going to be reviewed, and I can work on it to bring it up to Scipy standards. If I feel that I have to produce picture-perfect Python on my first try, I am much less likely to try in the first place. Code review is a perfect place for interested people (me!) to learn how to be active people. It is also a positive-feedback loop, as other interested people see the mentoring process that someone else has gone through with code review and feel themselves up to the task of trying to contribute. For this reason, I think it is a benefit for code reviews to take place in public fora such as mailing lists, not exclusively in special code-review applications/domains.

2) Unit testing is also important for me as a new contributor. If I would like to mess around with something that I don't understand in order to learn something, unit testing allows me to experiment effectively. Without unit tests, I cannot be an effective experimentalist in my hacking. In addition, other projects have trained me to unit test my contributions, so that is what I would most likely be doing if I were to contribute, and I would like to feel that my effort to write tests is valued.

3) Documenting code seems like a very important standard to uphold for new contributors. As someone who *might* contribute, I don't yet have a fixed notion of what is good enough code. So, if I do decide to send something up for public consumption, I am easy to convince that I need to do more documentation.

4) Workflow and tools are extremely important for me as a new contributor. One of the things that keeps me from developing even small patches for Scipy is SVN. If I want to make a change, I have to check out the trunk and then develop my change *completely without the benefit of version control*. I am not allowed to make any intermediate commits while I learn my way through the coding process. I must submit a fully formed patch without ever being able to checkpoint my own progress. This is basically a deal-breaker for me. I don't enjoy coding without a safety net, especially for large changes, especially test-driven changes and especially heavily documented changes. I want to be able to polish my patch using the power of version control. Not having this makes me enjoy scipy development less, which makes me less likely to contribute. As a fairly early convert to DVCS, I am used to being able to use my local branch of the project however I need to in my own development process. Being able to commit to a local branch as I see fit also helps produce well-tested and well-documented code *and* enables effective multi-step code review.
Particularly with Bazaar's bundle concept where the history of a local branch can be swapped via email (not just the patch), reviewers can merge a bundle from an email and review directly in the branch as I developed it. Their suggestions can then be incorporated into new revisions in my local branch, which can then be submitted again for more polishing. (I imagine git and Mercurial have similar lightweight capabilities for exchanging branches; I just don't have experience with them.) I hope that my thoughts help clarify this group's thinking about what sort of things can help bring in new contributors. (Oh, and I've got some ideas for scipy.stats ;) -Neil
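For concreteness, here is a rough sketch of the bundle round trip Neil describes. The commands are standard bzr, but the branch location and file names are hypothetical:

    # contributor: branch locally and commit as often as you like
    bzr branch lp:scipy scipy-fix       # hypothetical mirror location
    cd scipy-fix
    # ... hack, write tests and docs ...
    bzr commit -m "stats: fix edge case in distributions"
    bzr send -o stats-fix.bundle        # bundle the branch history for email

    # reviewer: merge the bundle and inspect the revisions as developed
    bzr merge stats-fix.bundle
    bzr log --limit 5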
On Wed, Feb 25, 2009 at 9:42 PM, Neil Martinsen-Burrell <nmb@wartburg.edu> wrote:
[...]
Yes, +1 to all you said. Also, I agree with what Stefan said, and I think most of the others sort of agree with this too, so I hope some change will happen soon in this direction :) -- e.g. DVCS and peer review. I also agree with what Robert Kern said about the experimentalist in the corner --- let's just start peer review and see what happens. Make it easy for people like me to see which patches are waiting for review, so that I can go through them and do the review (= making myself responsible for the patches if I say they are ok, or otherwise offering suggestions).

Ondrej
Neil,

Thanks for speaking up! I think there are *many* people in your situation, including myself - I too am mostly a silent watcher of SciPy, and I would be much more likely to contribute if the things you list were a part of the Scipy development culture:

* Tests
* Code review
* Documentation
* Good tools and workflow

I think it is an unproven myth that these things are "barriers" for people who want to write code. In most cases that I have seen, these things *encourage* new people to contribute to a project and greatly improve the quality of the code being written by newbies and veterans alike.

Cheers, Brian
On Wed, Feb 25, 2009 at 11:21 PM, Brian Granger <ellisonbg.net@gmail.com> wrote:
[...]
Yep, I count myself in that group too. Ondrej
Brian Granger wrote:
Neil,
Thanks for speaking up! I think there are *many* people in your situation, including myself - I too am mostly a silent watcher of SciPy and I would be much more likely to contribute if the things you list were a part of the Scipy development culture:
* Tests * Code review * Documentation
What is standing in the way of these things being done more often? Tests are happening, code review is happening, and documentation is happening. What exactly is the problem, except lack of time from the people who have historically committed to SciPy?

I want these things to happen and try to do them whenever I submit new code. But time is a factor, and people will disagree about "what constitutes a test" and "what constitutes good documentation."

There is a lot of code in SciPy that was contributed by me very early on that may not live up to the standard of testing and documentation that people now expect. Is that the fundamental problem --- history?
* Good tools and workflow.
This, I do see as a problem. The value of DVCS is that it handles branches better; new contributors *want* to work on branches, and this will encourage *everybody* (including me) to work on branches.

I've been committing to trunk for many years on SciPy and NumPy without pre-commit code review. It's hard for me to break that habit, and I'm resisting requests that I stop doing it. I'm willing to stop -- if it will really help SciPy progress. However, I'm not convinced that this kind of commit behavior by me and others is what is really stalling SciPy development.

It could be that other commit patterns are not advertised enough to assist newcomers, and I'm all for advertising other commit patterns that help people contribute. So, let's do that advertising. I'm very encouraged by the experiments with a git-svn bridge and would love to see an issue tracker with a command-line interface. I'd also love to see more volunteers willing to tackle release management so that we can release more regularly. Ultimately, those who do the work of release management will define what SciPy *is*.

Best regards,

-Travis
<snip> Several lurkers have expressed an interest in working on scipy. How can we get them involved in working with the code? We need these people. Chuck
* Tests * Code review * Documentation
What is standing in the way of these things being done more often? Tests are happening, code review is happening, and documentation is happening. What exactly is the problem except lack of time from people who have historically committed to SciPy?
Lack of time *can* be an issue for veteran developers. But for eager new developers this is not typically the problem. For these people, you simply need to make it perfectly clear how they can contribute to the project. The more specific we can be, the better.

As a semi-outsider, here is what I *perceive* the Scipy model to be currently (for new users):

* Check out the SVN trunk.
* Make your changes.
* Contact the list and tell them about your new code.
* Then ???

This procedure is summarized on the scipy development page in the following language:

"Interested people can get repository write access as well"

AND

"If you have some new code you'd like to see included in SciPy, the first thing to do is make a SciKit"

Here is what I would like to see -- this would make me want to contribute code to scipy:

Contributing to SciPy is easy and anyone can do it. Here is what you do:

* Create a branch using git/bzr/hg (we have to pick one).
* Write your code.
* Add tests and documentation to your code.
* Run the test suite.
* Post your branch to github/launchpad/bitbucket.
* Submit your branch for review using...

When someone is eager to write code, this is what they need!!! (A sketch of what this could look like follows below.)
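As an illustration only -- at this point in the thread no DVCS or hosting site had been picked -- such a workflow might look like the following, using git and the read-only GitHub mirror that comes up later in the thread; the branch and fork names are hypothetical:

    # clone the git mirror of scipy's SVN trunk
    git clone git://github.com/pv/scipy-svn.git scipy
    cd scipy
    git checkout -b fix-ltisys-docs     # hypothetical topic branch

    # ... edit, committing checkpoints as often as you like ...
    git commit -a -m "signal.ltisys: add examples and a regression test"

    # run the test suite (scipy's nose-based runner)
    python -c "import scipy; scipy.test()"

    # publish the branch to your own (hypothetical) fork and ask for review
    git push git@github.com:yourname/scipy.git fix-ltisys-docs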
I want these things to happen and try to do them whenever I submit new code. But, time is a factor, and people will disagree about "what constitutes a test" and "what constitutes good documentation."
There is a lot of code in SciPy that was contributed by me very early on that may not live up to the same standard of testing and documentation that people have. Is that the fundamental problem --- history?
No project can escape its history. But for an eager developer, there is really no difference (from a workflow/testing/review perspective) between writing new code and improving old code.
* Good tools and workflow.
This, I do see as a problem. The value of DVCS is that it handles branches better and new contributors *want* to work on branches and this will encourage *everybody* (including me) to work on branches.
+1
I've been committing to trunk for many years on SciPy and NumPy without pre-commit code review. It's hard for me to break that habit and I'm resisting requests that I stop doing that. I'm willing to stop it --
I agree that this is a hard habit to break.
2009/2/26 Brian Granger <ellisonbg.net@gmail.com>:
[...]
I'll have a first draft of a developer's guide ready in the morning. Regards Stéfan
I'll have a first draft of a developer's guide ready in the morning.
Fantastic! Can this page: http://www.scipy.org/Developer_Zone be updated to reflect the changes?

Cheers,

Brian
On Thu, Feb 26, 2009 at 2:07 PM, Stéfan van der Walt <stefan@sun.ac.za>wrote:
[...]
I'll have a first draft of a developer's guide ready in the morning.
Getting the buildbots to support this procedure will also help with the testing. Chuck
Thu, 26 Feb 2009 23:07:14 +0200, Stéfan van der Walt wrote:
[...]
I'll have a first draft of a developer's guide ready in the morning.
If you want to discuss Git, you can probably steal from here: http://scipy.org/scipy/numpy/wiki/GitMirror -- Pauli Virtanen
2009/2/27 Pauli Virtanen <pav@iki.fi>:
If you want to discuss Git, you can probably steal from here:
Ah, yes, good reminder! Could you give me a quick rundown of why you used --mirror earlier on when adding the remote? Thanks! Stéfan
Fri, 27 Feb 2009 00:48:00 +0200, Stéfan van der Walt wrote:
2009/2/27 Pauli Virtanen <pav@iki.fi>:
If you want to discuss Git, you can probably steal from here:
Ah, yes, good reminder!
Could you give me a quick rundown of why you used --mirror earlier on when adding the remote?
The --mirror option adds

    fetch = +refs/*:refs/*
    mirror = yes

to [remote "origin"], so one wouldn't need to edit .git/config manually. However, --mirror has another effect which I missed earlier: it makes the remote consider all heads its own, so that "git remote prune origin" would drop all branches, including local ones. Similar issue with "git fetch". So I think it's not the correct solution.

***

But all of that is moot now. I finally figured out that I must push to the mirror with

    git push git@github.com:pv/numpy-svn.git \
        +refs/remotes/*:refs/heads/* +master

Then it can be cloned simply with

    git clone --origin svn git://github.com/pv/scipy-svn.git

And "--origin svn" only because we want svn/trunk instead of origin/trunk. Also, git-svn can be activated:

    git svn init -s --prefix=svn/ http://svn.scipy.org/svn/scipy
    git svn rebase -l

And as a bonus, the SVN branches are visible on Github! I'll update the GitMirror page.

-- Pauli Virtanen
On Fri, Feb 27, 2009 at 9:08 AM, Pauli Virtanen <pav@iki.fi> wrote:
[...]
Ah, nice, I did not find a way to do this - I used a dirty script to fetch local branches and update them instead. One thing that is still annoying is that SVN tags show up as branches - but I guess there is no way around it, since SVN has no real tag concept (tags are just copies by convention).

David
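To see what David means, a quick way to list what git-svn imported (the tag name below is hypothetical):

    git branch -r
    # svn/trunk
    # svn/tags/0.7.0   (an SVN "tag", imported as just another remote branch)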
On Thu, Feb 26, 2009 at 8:23 PM, David Cournapeau <cournape@gmail.com> wrote:
[...]
Btw, I guess you already know it, but if you need to clone the git repository (for example David's) and then update it using git-svn with the latest svn from scipy, here is the howto:

http://subtlegradient.com/articles/2008/04/22/cloning-a-git-svn-clone

Basically:

    git svn init http://macromates.com/svn/Bundles/trunk/Bundles/Ruby.tmbundle -R svn
    cp .git/refs/remotes/origin/master .git/refs/remotes/git-svn
    git svn fetch

Ondrej
Thu, 26 Feb 2009 20:54:21 -0800, Ondrej Certik wrote: [clip]
Btw, I guess you already know it, but if you need to clone the git repository (for example David's) and then you would like to update it using git-svn with the latest svn from scipy, here is the howto:
http://subtlegradient.com/articles/2008/04/22/cloning-a-git-svn-clone
Yes, instructions for this are also available on the page http://scipy.org/scipy/numpy/wiki/GitMirror. But if the mirror is up-to-date (and I hope we manage to get the SVN post-commit hook installed), there's no need to do this; you can just ''git fetch''.

-- Pauli Virtanen
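In concrete terms, that boils down to something like this (a minimal sketch; the remote name is arbitrary, and the mirror URL is the one given earlier in the thread):

    git remote add mirror git://github.com/pv/scipy-svn.git
    git fetch mirror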
Just one tiny comment... it was probably raised already, in which case I beg your pardon for its reincarnation. http://projects.scipy.org/numpy/wiki/GitMirror advises using

    git clone --origin svn http://projects.scipy.org/git/scipy.git scipy.git

but there is a convention to use the .git suffix for '--bare' repositories. So I would advise scipy-git or scipy_git or just scipy ;)

On Fri, 27 Feb 2009, Pauli Virtanen wrote:
[...]
On Sat, Mar 14, 2009 at 4:53 AM, Yaroslav Halchenko <lists@onerussian.com> wrote:
but there is a convention to use the .git suffix for '--bare' repositories. So I would advise scipy-git or scipy_git or just scipy ;)
IIRC, the .git suffix is automatically added by github. cheers, David
Sat, 14 Mar 2009 12:45:55 +0900, David Cournapeau wrote:
On Sat, Mar 14, 2009 at 4:53 AM, Yaroslav Halchenko <lists@onerussian.com> wrote:
but there is a convention to use the .git suffix for '--bare' repositories. So I would advise scipy-git or scipy_git or just scipy ;)
IIRC, the .git suffix is automatically added by github.
I guess he's talking about the name of the cloned directory here. -- Pauli Virtanen
On Sat, 14 Mar 2009, Pauli Virtanen wrote:
but there is a convention to use the .git suffix for '--bare' repositories. So I would advise scipy-git or scipy_git or just scipy ;)

IIRC, the .git suffix is automatically added by github.

I guess he's talking about the name of the cloned directory here.

yeap ;)
Hi David (& other prospective git-svn users),

I just ran into a speedbump with git-svn and the Git mirror:

http://projects.scipy.org/numpy/wiki/GitMirror#Abigfatwordofwarning

In short: git-svn apparently only tracks SVN commits that it fetched itself via `git svn fetch/rebase`; commits that arrive any other way are invisible to it. So if you want to `dcommit` after `git fetch`ing from the mirror or from someone else, the git-svn database needs to be rebuilt:

    rm -rf .git/svn
    git svn rebase -l

Otherwise, the `dcommit` will shove bogus changesets to SVN. (Ouch!)

-- Pauli Virtanen
Pauli Virtanen wrote:
[...]
Hm, do you have an example of that? It never happened to me. My typical usage when I start working on numpy is:

    git svn fetch --all
    git co master && git svn rebase -l
    git co -b line_of_work

When I use dcommit, it first rebases my changes on top of the last svn revision if it differs from the one I have locally.

cheers,

David
Sun, 08 Mar 2009 17:31:24 +0900, David Cournapeau wrote:
[...]
It doesn't occur if you stick to the usual git-svn workflow of getting SVN commits via `git svn fetch/rebase` only. An example where it occurs:

    git fetch mirror      # fetch a branch from the mirror or from someone else
    git rebase svn/trunk  # rebase on it
    git svn dcommit -n    # now try to dcommit

Now git-svn thinks the current branch still corresponds to the old version, and uses that as the base for `dcommit`. Doing `git svn rebase` does not fix the situation either, since git-svn also thinks that the current branch is based on the latest commit, and so is up to date.

I'll also note that if you do merges and then `dcommit` the result, the tree generated by git-svn diverges from the one in the mirror (since in the mirror the merge commit has only one parent, but in the private tree it has two -- `git svn fetch` doesn't track merges).

-- Pauli Virtanen
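Putting Pauli's warning together with the rebuild recipe above, a cautious sequence after fetching from the mirror might look like this (a sketch only; the dry run is the important step, as it lets you inspect the base revision before anything touches SVN):

    git fetch mirror      # SVN commits arrived outside of git-svn's knowledge
    rm -rf .git/svn       # throw away git-svn's stale metadata
    git svn rebase -l     # rebuild it from the local history
    git svn dcommit -n    # dry run: check which changesets would be committed
    git svn dcommit       # only then commit for real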
On Sun, Mar 8, 2009 at 11:04 PM, Pauli Virtanen <pav@iki.fi> wrote:
[...]
It doesn't occur if you stick to the usual git-svn workflow of getting SVN commits via `git svn fetch/rebase` only. An example where it occurs is
    git fetch mirror      # fetch a branch from the mirror or from someone else
    git rebase svn/trunk  # rebase on it
    git svn dcommit -n    # now try to dcommit
Ah, yes, you should definitely stick to one and only one mirror. That's a git-svn limitation, I think.

I guess I was careful because of my experience with previous conversion tools (in bzr). I think the problem is linked to guaranteeing that a given commit with the same content will be recognized as such. In git proper, it is by design if the history is the same. In bzr and other systems, it has to be explicit, because a commit does not depend only on the content (but on meta-data as well). That's what they call deterministic on the BzrMigration page:

http://bazaar-vcs.org/BzrMigration

If I look at my git-svn import and yours, the commit sha1s are not the same for the corresponding svn revisions. As such, I don't see how it is possible to guarantee consistency with multiple mirrors.

cheers,

David
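To see why two independent imports of the same SVN history end up with different commit IDs, it helps to look at what actually goes into a git commit hash (standard git, nothing scipy-specific here):

    git cat-file commit HEAD
    # prints the raw commit object: a tree hash, the parent hash(es),
    # author and committer lines (with timestamps), and the message

The SHA1 is computed over exactly that, so any difference anywhere in the imported history -- an extra parent, a different timestamp -- propagates into every subsequent commit ID.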
Mon, 09 Mar 2009 16:23:24 +0900, David Cournapeau wrote: [clip]
It doesn't occur if you stick to the usual git-svn workflow of getting SVN commits via `git svn fetch/rebase` only. An example where it occurs is
    git fetch mirror      # fetch a branch from the mirror or from someone else
    git rebase svn/trunk  # rebase on it
    git svn dcommit -n    # now try to dcommit
Ah, yes, you should definitely stick to one and only one mirror. That's a git-svn limitation I think.
This is a different issue, I believe: the commits are exactly the same, hashes match etc., but git-svn's caching just gets confused. [clip]
If I look at my git-svn import and yours, the commit sha1 are not the same for the corresponding svn revision. As such, I don't see how it is possible to guarantee consistency with multiple mirrors.
This is because in your history, git-svn has made one of the preceding commits a merge commit in `git svn rebase`. This information can't of course be reconstructed from SVN. -- Pauli Virtanen
On Thu, Feb 26, 2009 at 1:40 PM, Brian Granger <ellisonbg.net@gmail.com>wrote:
[...]
Contributing to SciPy is easy and anyone can do it. Here is what you do:
* Create a branch using git/bzr/hg (we have to pick one).
* Write your code.
* Add tests and documentation to your code.
* Run the test suite.
* Post your branch to github/launchpad/bitbucket.
* Submit your branch for review using...
When someone is eager to write code, this is what they need!!!
Nice, I like it. +2. Chuck
[...]
When someone is eager to write code, this is what they need!!!
For bug fixes and small changes to scipy, and if someone just wants to submit some tests, this is a fair bit of work if you don't have the infrastructure set up for it.

For example, to be able to upload a branch to launchpad, I struggled for several hours with the SSL authorization, which was screwed up in my bazaar install. When I am using some open source project and I want to report some problems and propose some fixes, I don't want to have to install a new revision control system such as git and create accounts at several websites just to report some bugfixes, as in the story of Alex. Requiring a complete rebuild of scipy, and the creation of branches, for changing several lines of code and adding some tests seems a lot to demand from someone who doesn't already have the full development environment installed.

So while decentralized version control will help those developers who want to get more strongly involved, we shouldn't make it sound like that is the only way to contribute. On the other hand, it is clear that the more work committing and reviewing is for developers, the slower the response will be, especially for parts of scipy that don't have a "maintainer".

Josef
On Thu, Feb 26, 2009 at 16:09, <josef.pktd@gmail.com> wrote:
[...]
So while the decentralized version control will help those developers that want to get more strongly involved, we shouldn't make it sound like that is the only way to contribute. On the other hand, it is clear that the more work it is for committing and reviewing developers the slower will be a response, especially for parts of scipy that doesn't have a "maintainer".
The lighter-weight version is:

* Check out the source.
* Write your code and as much docs and tests as you can.
* Use <foo> to upload your patch to the review site.

Making and publishing a branch is for people who do this regularly.

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
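In the SVN setup of the time, that lighter-weight path might look like this (a sketch; the trunk URL is assumed from the repository root mentioned earlier in the thread, and the patch file name is made up):

    svn co http://svn.scipy.org/svn/scipy/trunk scipy
    cd scipy
    # ... edit, adding a test and docs where you can ...
    svn diff > stats-fix.patch
    # attach stats-fix.patch to a Trac ticket or submit it for review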
The lighter-weight version is:
* Check out the source. * Write your code and as much docs and tests as you can. * Use <foo> to upload your patch to the review site.
Making and publishing a branch is for people who do this regularly.
Yes, the workflow should support people who want to write code but don't know how to, or don't want to, make and publish a branch. But I think we *should* encourage people to make branches (if a DVCS is used), simply because it gives them local version control and allows more flexibility in how the patches are created -- for example, rebasing (see the sketch below).

Brian
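For instance (a minimal sketch, assuming the svn/trunk remote-branch naming used elsewhere in this thread and a hypothetical topic branch):

    git checkout my-fix      # the contributor's local topic branch
    git rebase svn/trunk     # replay its commits on top of the latest trunk
    # the resulting patch series now applies cleanly and is easier to review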
On Thu, Feb 26, 2009 at 5:42 AM, Neil Martinsen-Burrell <nmb@wartburg.edu> wrote:
[...]
As another long-time lurker, I would also support everything Neil said.

I also wanted to add that what stops me recommending scipy more widely to my colleagues is not that there is not enough code in it - it is that it is not stable enough to rely on for their work. That is perhaps a bit harsh, but I am sure that the first time one of my colleagues lost half a day because of a scipy bug (as I have done quite a few times) they would be back to MATLAB.

So I would agree with Stefan and the others that the priority is not getting more code in per se, but improving the quality and frequency of releases, to get a platform whose stability compares with MATLAB, before adding more stuff.

Cheers,

Robin
On Thu, Feb 26, 2009 at 3:38 AM, Robin <robince@gmail.com> wrote:
As another long time lurker I would also support everything Neil said.
I also wanted to add the point, that what stops me recommending scipy more widely to my colleagues is not that there is not enough code in it - it is that it is not stable enough to rely on for their work. That is perhaps a bit harsh, but I am sure that the first time one of my colleagues lost 1/2 a day because of a scipy bug (as I have done quite a few times) they would be back to MATLAB.
So I would agree with Stefan and the others that the priority is not getting more code in per se, but improving the quality and frequency of releases to get a platform whose stability compares with MATLAB before adding more stuff.
I think we are not seeing enough trac tickets about missing tests, with tests included as patches. For a user who is familiar with a part of scipy, it would be relatively easy to provide a test. This would reduce the chance that parts get broken by accident, and signal during any refactoring that the interface should be changed only with a proper deprecation warning. In this way users could contribute to scipy and make it more stable for their own work.

Similarly, when I was working my way through parts of scipy, I found that examples (or tests that can be used as examples) are missing. This often makes it difficult to figure out the exact format of the call parameters and the limitations of the functions. Example: signal.ltisys has no tests and no examples, just a good general description. For someone not familiar with the matlab signal toolbox, it is not clear what the exact requirements for the matrices of the state space representation are. But for users of this code, it might be much easier to come up with examples and tests than for me, who has to work through the exceptions that are raised and the source code.

I also agree with Neil. This is exactly the situation I was in half a year ago. Before getting commit access, I had several local copies of files and finally a bzr branch to keep track of my changes. A more systematic workflow for this would be a big improvement. But for rewriting relatively confined parts (which is most of stats but may not apply to other parts of scipy), I still prefer to work with stand-alone scripts (under my own local version control) and integrate them into scipy when they are ready.

The review of my changes by Per Brodtkorb was very helpful. However, my main quality control was to increase the test coverage for stats.distributions from around 50% to above 90%, with statistical tests that made sure the numbers are at least approximately correct (up to statistical noise and numerical precision). Since I was also relatively new to numpy, I might not have coded everything in the most efficient way, but at least I felt relatively sure that each change I made passed the basic (statistical) tests. And I'm still reluctant to apply any bug fixes without full verification and testing. This slows down the bug fixing and enhancements, but lowers the chance that we introduce new bugs.

High test coverage would also make it easier to apply new patches or enhancements, since we don't have to wait for the next round of bug reports to verify that everything still works. I think once scipy has reasonable test coverage, the development and release process will go quite a bit faster. Using nose testing is a huge improvement in the testing workflow. And I wish that we see lots of trac tickets with patches for missing tests.

Josef
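For reference, the kind of coverage measurement Josef describes can be driven from nose's coverage plugin (a sketch; it assumes the coverage package is installed alongside nose):

    # run the scipy.stats tests and report line coverage for that package
    nosetests --with-coverage --cover-package=scipy.stats scipy.stats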
On Thu, Feb 26, 2009 at 10:18:49AM -0500, josef.pktd@gmail.com wrote:
I think we are not seeing enough trac tickets about missing tests, with tests included as patches.
I think one of the issues is that a lot of people don't really know what tests are, how to write them, and how to run them. In addition, they don't really master version control, and as someone pointed out in the discussion, they might not know how to make a patch. A few years ago, this was the case for me.

A document on 'how to contribute to scipy' that explains the workflow would probably help a lot. The sympy guys have done this, and I know people on the sympy mailing list are often pointed to this document. I think the core part of such a document could actually be common to many projects. I will add to my TODO list to write one for Mayavi (in sphinx), because that is the project I know best, and we can try to port it to Scipy once it is done. I won't get to doing this anytime soon (I am super busy currently), so if someone wants to beat me to doing this for scipy, just go ahead!

Gaël
As another long time lurker I would also support everything Neil said.
+1
On Feb 26, 2009, at 12:42 AM, Neil Martinsen-Burrell wrote:
other projects have trained me to unit test my contributions, so that is what I would most likely be doing if I were to contribute and I would like to feel that my effort to write tests is valued.
I've never seen a single post in this thread, or on this list for that matter, that indicated that anybody thought the effort to write tests was not valued. Quite the contrary. Everybody wants tests; it's just a question of whether there's something else they want even more.

If I may very unfairly summarize the debate thus far:

Stéfan, et al.: There aren't enough tests. All code must have tests! Under penalty of death!
Travis, et al.: But then we shall have neither code *nor* tests!
Stéfan, et al.: Good!

I would characterize the debate as being about the value of writing tests *relative* to the value of writing anything else, not about whether you should be writing and using tests at all. Absolutely you should.
One of the things that keeps me from developing even small patches for Scipy is SVN. If I want to make a change, I have to check out the trunk and then develop my change *completely without the benefit of version control*.
Nothing forces you to develop without version control. See the SVN Book's discussion of "vendor branches" for one approach; a sketch follows below. We hack on several other people's tools in our own repository and periodically sync to their efforts or send them patches from ours.

I am *not* saying that DVCS is a bad idea. I don't think it is. In this instance, it's clearly superior to the svn model, but please let's stop making svn out to be worse than it is, because I guarantee that you're going to be bitten by some variant of the same issues with bzr, git, or whatever. Look at David Cournapeau's blog on why he's ditching bzr in favor of git, after ditching svn in favor of bzr; I look forward to his essay on the atrocities of git. I don't mean that to be snide; it's *good* that tools are getting better and it's *good* that people like David are actually comparing them head to head and telling us about their experiences.

I get that good and pleasant tools make people more productive, e.g., while I can get work done with Windows, I can get a lot more work done with something else because I don't have to devote so much energy to profanity. If some DVCS will make life more pleasant for both release managers and new contributors like yourself, by all means go for it, but don't kid yourself about the strengths of the tool you're leaving behind or the weaknesses of the tool you're adopting.
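For readers who haven't seen it, the vendor-branch recipe from the SVN Book goes roughly like this (a simplified sketch; the repository URL and paths are hypothetical):

    # import a pristine upstream release into a vendor area of your own repo
    svn import scipy-0.7.0 file:///var/svn/repo/vendor/scipy/current \
        -m "import upstream scipy 0.7.0"

    # copy it into your working area and hack there, with full version control
    svn copy file:///var/svn/repo/vendor/scipy/current \
        file:///var/svn/repo/scipy/trunk -m "branch scipy for local work"

    # when a new upstream release arrives, load it into vendor/current and
    # merge the vendor diff into your hacked trunk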
Jonathan Guyer wrote:
[...] Look at David Cournapeau's blog on why he's ditching bzr in favor of git, after ditching svn in favor of bzr;
You got it wrong: I've never ditched svn, since I have never used it for my own projects. David
On Feb 26, 2009, at 10:59 AM, David Cournapeau wrote:
Jonathan Guyer wrote:
Look at David Cournapeau's blog on why he's ditching bzr in favor of git, after ditching svn in favor of bzr;
You got it wrong: I've never ditched svn, since I have never used it for my own projects.
Fair enough. Let me amend: "Look at David Cournapeau's blog on why he's ditching bzr in favor of git, after making an ad hominem attack on svn". Better?
On Fri, Feb 27, 2009 at 1:20 AM, Jonathan Guyer <guyer@nist.gov> wrote:
[...]
Fair enough. Let me amend: "Look at David Cournapeau's blog on why he's ditching bzr in favor of git, after making an ad hominem attack on svn". Better?
Not really. I know svn relatively well, and I think my svn complaints are valid. They may not be enough to justify a change for numpy/scipy, but I notice that Stefan, Pauli and I, who account for maybe 50% of the commits in the last 6 months, all use git-svn, even though we have commit rights.

David
On Thu, Feb 26, 2009 at 11:00:32AM -0500, Jonathan Guyer wrote:
e.g., while I can get work done with Windows, I can get a lot more work done with something else because I don't have to devote so much energy to profanity.
Fantastic!! I love this sentence. Gaël
On 2/26/2009 5:04 PM, Gael Varoquaux wrote:
On Thu, Feb 26, 2009 at 11:00:32AM -0500, Jonathan Guyer wrote:
e.g., while I can get work done with Windows, I can get a lot more work done with something else because I don't have to devote so much energy to profanity.
Fantastic!! I love this sentence.
On Linux, every call to malloc must be matched with a call to free. On Windows, millions of calls to HeapAlloc can be matched with a single call to HeapDestroy. Windows prevents impossible-to-find memory leaks, and preserves my sanity when working with complex data structures. I really like Windows. There are even software and hardware drivers for it. Linux may be free as in freedom, but as Kris Kristofferson noted some 30 years ago: "Freedom's just another word for nothing left to lose. Nothing ain't worth nothing but it's free."

Sturla Molden
2009/2/26 Jonathan Guyer <guyer@nist.gov>:
If I may very unfairly summarize the debate this far:
Stéfan, et al.: There aren't enough tests. All code must have tests! Under penalty of death!
Travis, et al.: But then we shall have neither code *nor* tests!
Stéfan, et al.: Good!
I agree, your summary isn't accurate: the world isn't that black and white. To summarize a profound story I heard the other day: "A wise man first dams up the river before trying to empty the lake." Unless we test new code, how do we make time to clean up the old code?

I certainly would not like to see SciPy stagnate because of hard-line policy. My fear is that, unless we do something to protect our scarce developer resources, we will see the same thing happen.

Regards,

Stéfan
On Thu, Feb 26, 2009 at 9:23 AM, Stéfan van der Walt <stefan@sun.ac.za>wrote:
[...]
People, people, let's not kindle the flames. Or at least wait till I get the hotdogs and marshmallows.

Chuck
Stéfan van der Walt wrote:
[...]

Hi,
Well, I have been following this thread so far and abstained from commenting except for one small email, but here are a couple more comments from the peanut gallery, knowing full well that many people will see things differently from the conclusion I reach at the end:

1. If code is not tested it is broken. It might not be broken on your machine, but how about different endianness or pointer sizes? How about different compilers, BLAS backends, etc.?

2. Tests illustrate how to use the code, and that makes the code more accessible for a person who is not the author, both to use it and to improve it. Good coverage also lessens the risk that changes at one end of the codebase break things in other places.

I believe the two points above are more or less agreed upon by everybody in the discussion so far (probably not the "untested code is broken" bit); the main dividing issue is whether imposing "too much" or "too restrictive" a process will hurt SciPy more than it will help.

Tests can also be used to do other things, e.g. in Sage we use doctests to automatically look for leaks as well as for speed regressions. The speed-regression code is not yet run automatically, but after experiencing slowdowns over and over again it is a high priority to finish the needed infrastructure.

Now the rather controversial point: in Sage we do not want your contribution if you do not have 100% coverage. There is code sitting in trac where the person has been told to get coverage up to standard, otherwise the code will not be merged. Obviously, if the code is good and just missing some coverage, the reviewer is often willing to add the missing bits to get the code up to standard. But we will not merge code that isn't fully tested or has known failures.

The decision to mandate review and coverage was made after a not-too-long discussion in person at a Sage Days in September 2007, and it took a while to convince the last person that this was a good idea. We lost one developer who was not willing to write doctests for his code, and all I can say is "good riddance". It wasn't a particularly important piece of code he was working on, and I believe other external factors also contributed to that person dropping out of the Sage project. Losing a person might seem bad, and it certainly is not a good thing, but while the policy of mandatory coverage, and later of mandatory review, caused pain and required adjustments to the way we did things, it has shown great long-term benefits.

Review does not catch every issue, but it sure as hell catches a lot before the code even goes in. Once things have been merged they are in, and unless someone takes the time right then and there to review them, the issues will likely just melt into the background. If you look at the areas of Sage with low coverage, you will find code that is in need of a lot of attention. That code was merged into Sage back in the day, before the review and coverage requirements, and it clearly shows. The argument some people make, that reviewers can go back and review code once it is in, certainly holds; the problem is that there are often more pressing issues to deal with right now than problems one could deal with later [the code is already in tree :)], and so very little happens with that code unless it is truly bad. The comment about "damming the river" fits very well here.
We can and do release Sage versions, each generally better than the previous release, with much higher frequency than just about any other project out there. This is largely possible because of coverage and review. At one point we had some critical bugs pop up at a rather important installation, and William and I went from the decision to make a Sage release, to having a handful of bugs fixed and tested, to the final release tarball in less than 12 hours.

And one more thing about trivial and obvious fixes not needing review: those are sometimes the worst and hardest bugs to fix in the right way. Too often people have attempted to "fix" things in Sage without understanding the underlying implementation details, and so on. We have been burned by "trivial" one-line fixes often enough to develop a lot of skepticism that anything can be verified as fixed without testing. One example here is one of the bugs that originally led to the decision to mandate the coverage policy: we had a bug in an external library where a characteristic polynomial for a matrix over the integers was wrong.

* First try: fix the issue in the library and update it in Sage. Oops, we forgot the fix; *nobody* noticed, since no doctest checking for the bug was added to Sage. Let's redo it.

* Second try: fix the issue in the library and update it in Sage, and add a doctest. Testing the build revealed that everything worked fine. A couple of days later we cut an rc release, but behold: doctesting revealed that on a big-endian PPC box the fix in the upstream library was *completely* broken.

So the third try was the charm. Without mandated tests for fixes we would have caught this bug at some point for sure, but it would likely have been much more work to determine which change caused the problem, since the side effects of broken code in that specific case can surface in far-away places in the codebase.

I am not asking anyone to agree with me; I just wanted to point out things the Sage project has been doing and share some experience I have collected as the Sage release manager. There is no one-size-fits-all for a project, and project-specific culture and customs are hard to change, but it can be done. I believe that the Sage project has greatly benefited from the policy changes described above. Now you need to come to your own conclusions :)
Cheers, Michael
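The doctest discipline Michael describes is easy to picture. A minimal sketch of a doctest acting as a regression guard for a fix; charpoly_coeffs is a made-up stand-in here, not actual Sage code:

    def charpoly_coeffs(matrix):
        """Coefficients of the characteristic polynomial of a 2x2
        integer matrix [[a, b], [c, d]]: x^2 - (a+d)x + (ad - bc).

        >>> charpoly_coeffs([[2, 1], [1, 2]])  # guards the fixed bug
        [1, -4, 3]
        """
        (a, b), (c, d) = matrix
        return [1, -(a + d), a * d - b * c]

    if __name__ == "__main__":
        import doctest
        doctest.testmod()  # an rc build fails loudly if the fix regresses

Run on every platform in the build farm, a check like this is what exposed the big-endian breakage on the second try.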
Michael Abshoff wrote:
I am not asking anyone to agree with me; I just wanted to point out things the Sage project has been doing and share some experience I have collected as the Sage release manager. There is no one-size-fits-all for a project, and project-specific culture and customs are hard to change, but it can be done. I believe that the Sage project has greatly benefited from the policy changes described above. Now you need to come to your own conclusions :)
Thank you for the detailed examples and for sharing your experiences. This kind of feedback is very valuable. One of the things I've admired about Sage is how much raw time people seem to devote to writing code, building web pages, and writing documentation and tests. Having someone with as much energy as William seems to me to make a big difference. That was my experience with the NumPy port as well. Things move forward when there is someone spending a lot of time paying attention and pushing. With *just* people's spare time, projects move forward more slowly. -Travis
On Thu, Feb 26, 2009 at 11:55 AM, Travis E. Oliphant <oliphant@enthought.com> wrote:
Michael Abshoff wrote:
I am not asking anyone to agree with me; I just wanted to point out things the Sage project has been doing and share some experience I have collected as the Sage release manager. There is no one-size-fits-all for a project, and project-specific culture and customs are hard to change, but it can be done. I believe that the Sage project has greatly benefited from the policy changes described above. Now you need to come to your own conclusions :)
Thank you for the detailed examples and for sharing your experiences. This kind of feedback is very valuable.
One of the things I've admired about Sage is how much raw time people seem to devote to writing code, building web pages, and writing documentation and tests.
Having someone with as much energy as William seems to me to make a big difference. That was my experience with the NumPy port as well. Things move forward when there is someone spending a lot of time paying attention and pushing. With *just* people's spare time, projects move forward more slowly.
Yep, I have exactly the same experience with sympy --- if I push things hard day and night, people will join and we grow; if it's just the spare time of other people, things move forward more slowly. Ondrej
On Tue, Feb 24, 2009 at 15:13, Charles R Harris <charlesr.harris@gmail.com> wrote:
I think at this point we would be better off trying to recruit at least one person to "own" each package. For new packages that is usually the person who committed it but we also need ownership of older packages. Someone with a personal stake in a package is likely to do more for quality assurance at this point than any amount of required review.
"Ownership" has a bad failure mode. Case in point: nominally, I am the "owner" of scipy.stats and numpy.random and completely failed to move Josef's patches along. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Tue, Feb 24, 2009 at 6:28 PM, Robert Kern <robert.kern@gmail.com> wrote:
"Ownership" has a bad failure mode. Case in point: nominally, I am the "owner" of scipy.stats and numpy.random and completely failed to move Josef's patches along.
Does Josef not now "own" certain parts of scipy.stats? /case in point :) -- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/
2009/2/24 Robert Kern <robert.kern@gmail.com>:
On Tue, Feb 24, 2009 at 15:13, Charles R Harris <charlesr.harris@gmail.com> wrote:
I think at this point we would be better off trying to recruit at least one person to "own" each package. For new packages that is usually the person who committed it but we also need ownership of older packages. Someone with a personal stake in a package is likely to do more for quality assurance at this point than any amount of required review.
"Ownership" has a bad failure mode. Case in point: nominally, I am the "owner" of scipy.stats and numpy.random and completely failed to move Josef's patches along.
It seems to me that scipy's development model is a classic open-source "scratch an itch": it bothered me that people were forever asking questions that needed spatial data structures, so I took a weekend and wrote some. I don't foresee this changing without some major change (e.g. a company suddenly hiring ten people to work full-time on scipy). So the question is how to make this model produce reliable code.

Suggestions people have made to accomplish this:

(1) Don't allow anything into SVN without tests and documentation.
(2) Make sure everything gets reviewed before it goes in.
(3) Appoint owners for parts of scipy.

Of these, I strongly approve of (1). It's really not a barrier. Writing tests is easy. Every programmer does *some* testing (well maybe not Knuth, but everybody else) to make sure the code does what it's supposed to. Writing these tests in nose-compatible form really isn't hard. Documentation is more of an obstacle, just because it's extra work. But I think it's not too much to ask.

(2) I'm not so sure of. For example, a few days ago I fixed a couple of spatial bugs. In both cases, the bug fix was a one-line change to scipy proper, plus a unit test that would have caught the bug but now passes. What would be gained by waiting until somebody else got around to looking at those fixes before committing them?

I am tempted to suggest a weaker standard: optional code review. If you want to submit a piece of code to scipy and don't have SVN access, or do but want someone else to take a look at it (as, e.g., I did for scipy.spatial as a whole), post it; people can review it, and when it's been adequately reviewed it goes in. Of course, here we return to infrastructure: as far as I know we don't have any reasonable tool for doing these reviews, or for connecting them to bug reports.

(3) I am highly dubious of. Certainly we'll have informal owners - I fixed the bugs in spatial in part because I wrote the code and was embarrassed to see it broken. I know the spatial code pretty well, so I will probably have an easier time assessing patches to it. But I am often busy - if those spatial bugs had been reported a month earlier I would not have been able to get to them any sooner. Making it my fault if patches don't get into scipy.spatial - which is, really, what we're talking about - is a recipe for driving people like me away from developing scipy. Don't do it. Anne
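To Anne's point that nose-compatible tests are cheap to write: nose collects any module-level function whose name starts with test_, so a regression test is just a plain function. A sketch; the KDTree usage is illustrative and not one of the actual fixes she mentions:

    import numpy as np
    from numpy.testing import assert_array_equal
    from scipy.spatial import KDTree

    def test_kdtree_nearest_neighbour():
        # the nearest point to the origin should be the origin itself
        points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
        tree = KDTree(points)
        dist, idx = tree.query([0.0, 0.0])
        assert dist == 0.0
        assert_array_equal(points[idx], [0.0, 0.0])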
On Tue, Feb 24, 2009 at 7:20 PM, Anne Archibald <peridot.faceted@gmail.com> wrote:
(3) I am highly dubious of. Certainly we'll have informal owners - I fixed the bugs in spatial in part because I wrote the code and was embarrassed to see it broken. I know the spatial code pretty well, so I will probably have an easier time assessing patches to it.
IMO that's all "owner" implies. I don't think anyone can seriously expect more of a largely volunteer effort. The word "stakeholder" seems popular nowadays, should we use that instead? The aversion to "owner" must be a cultural thing :) -- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/
On Tue, Feb 24, 2009 at 5:20 PM, Anne Archibald <peridot.faceted@gmail.com> wrote:
2009/2/24 Robert Kern <robert.kern@gmail.com>:
On Tue, Feb 24, 2009 at 15:13, Charles R Harris <charlesr.harris@gmail.com> wrote:
I think at this point we would be better off trying to recruit at least one person to "own" each package. For new packages that is usually the person who committed it but we also need ownership of older packages. Someone with a personal stake in a package is likely to do more for quality assurance at this point than any amount of required review.
"Ownership" has a bad failure mode. Case in point: nominally, I am the "owner" of scipy.stats and numpy.random and completely failed to move Josef's patches along.
It seems to me that scipy's development model is a classic open-source "scratch an itch": it bothered me that people were forever asking questions that needed spatial data structures, so I took a weekend and wrote some. I don't foresee this changing without some major change (e.g. a company suddenly hiring ten people to work full-time on scipy). So the question is how to make this model produce reliable code.
Suggestions people have made to accomplish this:
(1) Don't allow anything into SVN without tests and documentation. (2) Make sure everything gets reviewed before it goes in. (3) Appoint owners for parts of scipy.
Of these, I strongly approve of (1). It's really not a barrier. Writing tests is easy. Every programmer does *some* testing (well maybe not Knuth, but everybody else) to make sure the code does what it's supposed to. Writing these tests in nose-compatible form really isn't hard. Documentation is more of an obstacle, just because it's extra work. But I think it's not too much to ask.
(2) I'm not so sure of. For example, a few days ago I fixed a couple of spatial bugs. In both cases, the bug fix was a one-line change to scipy proper, plus a unit test that would have caught the bug but now passes. What would be gained by waiting until somebody else got around to looking at those fixes before committing them?
I am tempted to suggest a weaker standard: optional code review. If you want to submit a piece of code to scipy and don't have SVN access, or do but want someone else to take a look at it (as, e.g., I did for scipy.spatial as a whole), post it; people can review it and when it's been adequately reviewed it goes in. Of course, here we return to infrastructure: as far as I know we don't have any reasonable tool for doing these reviews, or for connecting them to bug reports.
(3) I am highly dubious of. Certainly we'll have informal owners - I fixed the bugs in spatial in part because I wrote the code and was embarrassed to see it broken. I know the spatial code pretty well, so I will probably have an easier time assessing patches to it. But I am often busy - if those spatial bugs had been reported a month earlier I would not have been able to get to them any sooner. Making it my fault if patches don't get into scipy.spatial - which is, really, what we're talking about - is a recipe for driving people like me away from developing scipy. Don't do it.
I don't think that's what we are "really talking about"; rather, I think we need folks who feel an informal ownership of parts of scipy. I simply pointed out where I felt responsible as an example. Your sense of "owning" scipy.spatial is another example. And I think the best way to get folks attached to orphaned bits of code that have languished untouched all these years is to let them make actual changes without jumping through umpteen legal hoops. I also think we need more developers, and the place to find them is among folks who have contributed patches. We should actively offer commit privileges to such folks. The main advantage of a DVCS in such a situation is that commit privilege becomes less important and additions can be reviewed offline and brought in easily when ready. But until we have such a system I think more folks need the ability to touch SVN. Chuck
On Tue, Feb 24, 2009 at 7:20 PM, Anne Archibald <peridot.faceted@gmail.com> wrote:
2009/2/24 Robert Kern <robert.kern@gmail.com>:
On Tue, Feb 24, 2009 at 15:13, Charles R Harris <charlesr.harris@gmail.com> wrote:
I think at this point we would be better off trying to recruit at least one person to "own" each package. For new packages that is usually the person who committed it but we also need ownership of older packages. Someone with a personal stake in a package is likely to do more for quality assurance at this point than any amount of required review.
"Ownership" has a bad failure mode. Case in point: nominally, I am the "owner" of scipy.stats and numpy.random and completely failed to move Josef's patches along.
It seems to me that scipy's development model is a classic open-source "scratch an itch": it bothered me that people were forever asking questions that needed spatial data structures, so I took a weekend and wrote some. I don't foresee this changing without some major change (e.g. a company suddenly hiring ten people to work full-time on scipy). So the question is how to make this model produce reliable code.
Suggestions people have made to accomplish this:
(1) Don't allow anything into SVN without tests and documentation. (2) Make sure everything gets reviewed before it goes in. (3) Appoint owners for parts of scipy.
I think that finding someone who feels responsible for each of the different parts of scipy is the main problem, and whatever we do to make this easier and to expand the number of active participants will be an improvement.

I don't feel like the "owner" of stats; it's more a case of adoption. I like the centralized trac timeline, since it is easy to monitor new tickets and changes to svn, and I'm doing code review ex post (after commits) to minimize new problems. This is also an incentive to increase test coverage, so that the tests complain immediately if something breaks. (My main problem with trac was monitoring old tickets, which I haven't figured out how to do efficiently.)

I think that, for packages that have a responsible and responsive "maintainer", my experience with the mailing list was pretty good. On the other hand, looking at the mailing list history, I saw many comments and threads about the problems in stats, and while some problems got fixed, many reports of problems were never followed by any action, which is also pretty frustrating for the user. A new workflow and code review might help, but if there is nobody who adopts the orphaned subpackages, it will be just another place to store comments.

I'm a huge fan of full test coverage, but writing fully verified tests is a lot of work for me, and I still have a backlog of bugfixes because I haven't had time to write sufficient tests. Also, I think that the commitment to maintain and increase test coverage should be sufficient in some cases. For example, in stats.mstats Pierre rewrote and added statistics functions for masked arrays; the test coverage is good, but there are still quite a few functions not covered and still some rough edges. Overall, though, it looks in better condition than scipy.stats did. In this case I find it more useful to have the full set of functions that Pierre wrote available immediately than to add them piecemeal as he finds time to write tests.

The documentation editor is a good example where easier access for new contributors increased the number of participants, and maybe collective writing and review of code and tests can lower the entry barrier in the same way. But for now, I think, I still need to be able to get some bug fixes into stats without a large bureaucracy, or with an expiration date on any code review. Josef
2009/2/25 <josef.pktd@gmail.com>:
A new workflow and code review might help, but if there is nobody who adopts the orphaned subpackages, it will be just another place to store comments.
[...]
But for now, I think, I still need to be able to get some bug fixes into stats without a large bureaucracy, or with an expiration date on any code review.
[...] I don't see code review as a very formal process. Review by a domain expert would be ideal, but if such a person is not available, a review by any other programmer would do. Here is the first patch I submitted for review: http://codereview.appspot.com/1105 It touches only 5 lines of code in numpy/lib/function_base.py, and functionally the patch was fine. Note how the patch progressed with the advice of Ondrej and Robert: spacing according to PEP8, and a better way of referring to the list type. Once Robert pointed that out, I also fixed another occurrence in the file. If you see this measure of improvement on a 5-line patch, imagine the positive impact it can have on a more complicated piece of code. Regards Stéfan
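The actual diff lives at the codereview link above; a hypothetical before/after in the same spirit - PEP8 spacing plus an isinstance check instead of comparing type objects - might look like:

    import numpy as np

    # Before (hypothetical): cramped spacing, brittle type comparison
    #   if type(weights)==type([]): weights=np.array(weights)

    # After review: PEP8 spacing and an isinstance check, which also
    # accepts list subclasses
    def as_weight_array(weights):
        if isinstance(weights, list):
            weights = np.array(weights)
        return weights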
josef.pktd@gmail.com wrote:
I think that finding someone who feels responsible for each of the different parts of scipy is the main problem, and whatever we do to make this easier and to expand the number of active participants will be an improvement.
It's a man-power thing again in my mind. I would love to spend more time on SciPy, but have not found the time.
I'm a huge fan of full test coverage, but writing fully verified tests is a lot of work for me, and I still have a backlog of bugfixes because I haven't had time to write sufficient tests.
Also, I think that the commitment to maintain and increase test coverage should be sufficient in some cases. For example, in stats.mstats Pierre rewrote and added statistics functions for masked arrays; the test coverage is good, but there are still quite a few functions not covered and still some rough edges. Overall, though, it looks in better condition than scipy.stats did. In this case I find it more useful to have the full set of functions that Pierre wrote available immediately than to add them piecemeal as he finds time to write tests.
Yes. This is exactly the way I feel. I'd rather have functionality written by someone who cared about it perhaps without full test coverage than no functionality because someone can't find time to write tests.
But for now, I think, I still need to be able to get some bug fixes into stats without a large bureaucracy, or with an expiration date on any code review.
+1 BTW: Thanks for all your hard work on stats and optimize, Josef. -Travis
Anne Archibald wrote:
[...] wrote some. I don't foresee this changing without some major change (e.g. a company suddenly hiring ten people to work full-time on scipy). So the question is how to make this model produce reliable code.
Suggestions people have made to accomplish this:
(1) Don't allow anything into SVN without tests and documentation. (2) Make sure everything gets reviewed before it goes in. (3) Appoint owners for parts of scipy.
Of these, I strongly approve of (1). It's really not a barrier.
As long as we don't do #2, having the rule of #1 is completely fine. Say it in a way similar to this: "Don't commit to trunk until there are tests and documentation." I would be opposed to attempts to modify the nouns with fuzzy words like "complete" or "full" or something impossible to quantify. Here's an attempt at the wording: "Don't commit new code to trunk until you are sure the code works by passing unit-tests and is documented by a doc-string that follows the established pattern."

Bug-fixes should (usually) be accompanied by a unit test, unless they are "bug-guard changes" (i.e. like the one-liner I recently made to NumPy to catch an error condition which I've never actually seen and don't know how to test for).

I definitely think review should be encouraged, but absolutely optional, pre-commit. We should then encourage each other to review post-commit.

As far as "ownership" goes: how about we just have a posted list of "interested committers" that people can refer to in order to direct code-review requests and patch submissions? More than one person can be listed for each sub-project. -Travis
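A sketch of the bar that wording sets - new code landing together with a docstring in the established numpy pattern and a passing unit test. relerr is a hypothetical example function, not something in scipy:

    import numpy as np

    def relerr(actual, desired):
        """Relative error between two arrays.

        Parameters
        ----------
        actual : array_like
            Computed values.
        desired : array_like
            Reference values; must be nonzero.

        Returns
        -------
        ndarray
            Elementwise ``abs(actual - desired) / abs(desired)``.
        """
        actual = np.asarray(actual, dtype=float)
        desired = np.asarray(desired, dtype=float)
        return np.abs(actual - desired) / np.abs(desired)

    def test_relerr():
        # exact agreement gives zero error
        assert np.all(relerr([1.0, 2.0], [1.0, 2.0]) == 0.0)
        # a 10% deviation is reported as 0.1
        assert np.allclose(relerr([1.1], [1.0]), [0.1])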
Charles R Harris wrote:
I don't think there are enough eyes at this point for a strict review policy. How many of the current packages have any maintainer? Who was maintaining the stats package before Josef got involved? How many folks besides Robert could look over the changes usefully? How many folks looked over Travis' recent addition to optimize? Who is working on the interpolation package?
I think at this point we would be better off trying to recruit at least one person to "own" each package. For new packages that is usually the person who committed it but we also need ownership of older packages. Someone with a personal stake in a package is likely to do more for quality assurance at this point than any amount of required review.
Yes, my feelings exactly. Quality goes up when people who have a personal stake or attachment to the code are engaged. How do we get more of this to happen? Formal review processes can actually have at least some negative impact in getting people engaged. Let's make a tweak here and a tweak there. Right now, I'm of the opinion that whatever makes the *workflow* of people like David, Pauli, Jarrod, Robert K, Robert C, Nathan, Matthew, Charles, Anne, Andrew, Gael, and Stefan (and other big contributors I may have missed) easier, I'm totally in favor of. If that is a DVCS and/or something different than Trac, then let's do that. It sounds like we are making steps in that direction, which is excellent.
I don't have a problem with folks complaining about missing tests, etc., but I worry that if we put too many review steps into the submission path there won't be enough people to make it work.
This is exactly the way I feel.... I don't want to imply at all that we shouldn't be bugging each other about documentation and testing. I personally welcome any reminders in that direction. I am just worried about whether or not we are really solving the real problems that make it hard to contribute by instituting policy rather than providing examples of code to model. I do see a real need to fix the SVN-Trac workflow bottleneck as well as anything that helps the release process. It's actually at the release process where I would institute any formal review process. I'm also in favor of having a regular (i.e. every 3-6 months) release process. The difficulty there again is man-power. -Travis
On Thu, Feb 26, 2009 at 1:49 AM, Travis E. Oliphant <oliphant@enthought.com> wrote:
I do see a real need to fix the SVN-Trac workflow bottleneck as well as anything that helps the release process. It's actually at the release process where I would institute any formal review process. I'm also in favor of having a regular (i.e. every 3-6 months) release process. The difficulty there again is man-power.
One thing which may help here is to have a rotation for the release manager: a different person every time. This person would have the last word on what goes in and what does not, with almost strictly enforced deadlines. In particular, we should really enforce the code freeze - although I can understand the point that reviews may make things harder, I don't think it is possible at all to make a good release without enforcing very strict timelines. There has to be no new code for some time before the release, and that time has to be more than just a day or two. C/Fortran code would be the first to be frozen, then Python, then docstrings. The exact timing can be tweaked after some experiments, of course. But if we get this right, I believe that having freeze periods can actually make the time from patch to inclusion faster.

Having a different person each time means it is not always the same person carrying the load, and it may also keep people "honest", in the sense that a release manager will later be a coder under a different release manager. cheers, David
David Cournapeau wrote:
On Thu, Feb 26, 2009 at 1:49 AM, Travis E. Oliphant <oliphant@enthought.com> wrote:
I do see a real need to fix the SVN-Trac workflow bottleneck as well as anything that helps the release process. It's actually at the release process where I would institute any formal review process. I'm also in favor of having a regular (i.e. every 3-6 months) release process. The difficulty there again is man-power.
One thing which may help here is to have a rotation for the release manager: a different person every time. This person would have the last word on what goes in and what does not, with almost strictly enforced deadlines. In particular, we should really enforce the code freeze - although I can understand the point that reviews may make things harder, I don't think it is possible at all to make a good release without enforcing very strict timelines. There has to be no new code for some time before the release, and that time has to be more than just a day or two. C/Fortran code would be the first to be frozen, then Python, then docstrings. The exact timing can be tweaked after some experiments, of course. But if we get this right, I believe that having freeze periods can actually make the time from patch to inclusion faster.
That sounds fine.
Having a different person each time means it is not always the same person carrying the load, and it may also keep people "honest", in the sense that a release manager will later be a coder under a different release manager.
This sounds great. I think the release manager gets to pick how strictly things are enforced. -Travis
Travis E. Oliphant wrote:
Charles R Harris wrote:
I don't think there are enough eyes at this point for a strict review policy. How many of the current packages have any maintainer? Who was maintaining the stats package before Josef got involved? How many folks besides Robert could look over the changes usefully? How many folks looked over Travis' recent addition to optimize? Who is working on the interpolation package?
I think at this point we would be better off trying to recruit at least one person to "own" each package. For new packages that is usually the person who committed it but we also need ownership of older packages. Someone with a personal stake in a package is likely to do more for quality assurance at this point than any amount of required review.
Yes, my feelings exactly. Quality goes up when people who have a personal stake or attachment to the code are engaged. How do we get more of this to happen? Formal review processes can actually have at least some negative impact in getting people engaged. Let's make a tweak here and a tweak there. Right now, I'm of the opinion that whatever makes the *workflow* of people like David, Pauli, Jarrod, Robert K, Robert C, Nathan, Matthew, Charles, Anne, Andrew, Gael, and Stefan (and other big contributors I may have missed) easier, I'm totally in favor of. If that is a DVCS and/or something different than Trac, then let's do that.
It sounds like we are making steps in that direction, which is excellent.
Based on the discussion (including the later comments), it appears to me that this thread has moved towards the question of what sort of development structure scipy should use with a DVCS. I viewed much of the discussion as mirroring what happens with Linux kernel development, since they adopted a DVCS starting with BitKeeper. Jonathan Corbet has an interesting article on this: http://ldn.linuxfoundation.org/book/how-participate-linux-community . Essentially there are sub-maintainer trees that feed into the testing tree (-mm), the staging tree (where patches are applied in a way that should minimize tree divergence), and hopefully Linus's tree. During that process there is informal code review, at least for bug fixes; new or major features still have a problem with code review. In some respects, the man-power restriction in Linux kernel development has been removed because code no longer has to flow through a single node, which allows a user to easily get code not only from these trees but also from other developers.

So scipy could do something similar, where the use of a DVCS would hopefully reduce the burden on people like Robert and you. I do not see a real need at this time to say you 'own' that module and you must 'control' the development of it. I would suspect that 'ownership' of scipy components will naturally develop over time and, thus, should not be forced upon anyone.

If sub-trees were created, it would permit sharing incomplete code, so that the burden of developing appropriate tests, writing documentation, and testing can be distributed to interested parties. This would also foster mentoring and getting hands dirty in a positive way.
I don't have a problem with folks complaining about missing tests, etc., but I worry that if we put too many review steps into the submission path there won't be enough people to make it work.
This is exactly the way I feel.... I don't want to imply at all that we shouldn't be bugging each other about documentation and testing. I personally welcome any reminders in that direction. I am just worried about whether or not we are really solving the real problems that make it hard to contribute by instituting policy rather than providing examples of code to model.
From this it appears that, in order to get code into scipy, you have to have all this documentation and these tests. But my real concern with strict requirements for tests and documentation is that we will get minimal tests and inferior documentation, or nothing at all. Rather, I hope that if the burden can be shifted from one person to a group of people, then perhaps we can get more extensive tests and documentation, as well as people actually testing the code on different systems in hopefully realistic situations. That way there is at least some fix in some tree for a problem, and eventually the rest will follow by the time everything is ready for mainline inclusion. Regards Bruce
On Wed, Feb 25, 2009 at 1:16 PM, Bruce Southey <bsouthey@gmail.com> wrote:
Travis E. Oliphant wrote:
Charles R Harris wrote:
I don't think there are enough eyes at this point for a strict review policy. How many of the current packages have any maintainer? Who was maintaining the stats package before Josef got involved? How many folks besides Robert could look over the changes usefully? How many folks looked over Travis' recent addition to optimize? Who is working on the interpolation package?
I think at this point we would be better off trying to recruit at least one person to "own" each package. For new packages that is usually the person who committed it but we also need ownership of older packages. Someone with a personal stake in a package is likely to do more for quality assurance at this point than any amount of required review.
Yes, my feelings exactly. Quality goes up when people who have a personal stake or attachment to the code are engaged. How do we get more of this to happen? Formal review processes can actually have at least some negative impact in getting people engaged. Let's make a tweak here and a tweak there. Right now, I'm of the opinion that whatever makes the *workflow* of people like David, Pauli, Jarrod, Robert K, Robert C, Nathan, Matthew, Charles, Anne, Andrew, Gael, and Stefan (and other big contributors I may have missed) easier, I'm totally in favor of. If that is a DVCS and/or something different than Trac, then let's do that.
It sounds like we are making steps in that direction, which is excellent.
Based on the discussion (including the later comments), it appears to me that this thread has moved towards the question of what sort of development structure scipy should use with a DVCS.
I viewed much of the discussion as mirroring what happens with Linux kernel development, since they adopted a DVCS starting with BitKeeper. Jonathan Corbet has an interesting article on this: http://ldn.linuxfoundation.org/book/how-participate-linux-community . Essentially there are sub-maintainer trees that feed into the testing tree (-mm), the staging tree (where patches are applied in a way that should minimize tree divergence), and hopefully Linus's tree. During that process there is informal code review, at least for bug fixes; new or major features still have a problem with code review. In some respects, the man-power restriction in Linux kernel development has been removed because code no longer has to flow through a single node, which allows a user to easily get code not only from these trees but also from other developers.
So scipy could do something similar, where the use of a DVCS would hopefully reduce the burden on people like Robert and you.
I do not see a real need at this time to say you 'own' that module and you must 'control' the development of it. I would suspect that 'ownership' of scipy components will naturally develop over time and, thus, should not be forced upon anyone.
If sub-trees were created, it would permit sharing incomplete code, so that the burden of developing appropriate tests, writing documentation, and testing can be distributed to interested parties. This would also foster mentoring and getting hands dirty in a positive way.
I don't have a problem with folks complaining about missing tests, etc., but I worry that if we put too many review steps into the submission path there won't be enough people to make it work.
This is exactly the way I feel.... I don't want to imply at all that we shouldn't be bugging each other about documentation and testing. I personally welcome any reminders in that direction. I am just worried about whether or not we are really solving the real problems that make it hard to contribute by instituting policy rather than providing examples of code to model.
From this it appears that, in order to get code into scipy, you have to have all this documentation and these tests. But my real concern with strict requirements for tests and documentation is that we will get minimal tests and inferior documentation, or nothing at all. Rather, I hope that if the burden can be shifted from one person to a group of people, then perhaps we can get more extensive tests and documentation, as well as people actually testing the code on different systems in hopefully realistic situations. That way there is at least some fix in some tree for a problem, and eventually the rest will follow by the time everything is ready for mainline inclusion.
Regards Bruce
R recently had a discussion on quality control for statistical functions, especially for use in the health industry, because SAS criticized open source as offering insufficient guarantees of correctness. Scipy is not in the same group, but I think a review process before commit, if it attracts more users, will make it more likely that we catch any problems. There are many good statistical tools in scipy; however, until recently I wasn't sure which I would use in a "serious" application, since there are too many possibly incorrect results.

The second point is that more eyes might catch problems with refactoring, given that the test coverage is still shaky, and might reduce the chance of dead code. Two examples for stats-related functions: the recent removal of var and mean from scipy.stats broke several functions that didn't have test coverage and so didn't show up in the tests; and the recent addition of curvefit, where the documentation didn't correspond to what was actually calculated. In both cases the review and corrections happened after the commit, since I keep an eye on any stats-related commits. Without that review we might get misleading (or incorrect) numbers and broken code - and I've seen a lot of both in stats.

But I also hope that any changes in the workflow help in spreading the work of testing and documentation and make adding new code easier and safer. Josef
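One cheap guard against the kind of breakage Josef describes is a smoke test that at least *calls* every statistic on a fixed sample, so that a removed helper (like the old stats.var/stats.mean) fails the suite immediately instead of hiding in untested code paths. A sketch, with an illustrative function list:

    import numpy as np
    from scipy import stats

    def test_stats_smoke():
        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
        for func in (stats.gmean, stats.hmean, stats.sem,
                     stats.skew, stats.kurtosis):
            # each call must run and return a finite number
            assert np.isfinite(func(x)), func.__name__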
Hi,
Scipy is not in the same group, but I think a review process before commit, if it attracts more users, will make it more likely that we catch any problems. There are many good statistical tools in scipy; however, until recently I wasn't sure which I would use in a "serious" application, since there are too many possibly incorrect results.
Actually, watching your (Josef's) fixes to stats was an eye-opener for me. My impression was, as you've said, that it was broken badly enough that you wouldn't use it - before you went through it. We can't afford to carry on shipping code that is that broken; otherwise we'll lose momentum and users. And at least - for stats - you (Josef) came along and went deep into it. Doing that is much harder without documentation and tests, making it more likely that bad code with major bugs will leave Scipy limping in the world of numerical libraries like matlab and R. I think we do - clearly - have a problem, and that we do - clearly - need a change to fix it. Matthew
josef.pktd@gmail.com wrote:
[...] and might reduce the chance of dead code. Two examples for stats-related functions: the recent removal of var and mean from scipy.stats broke several functions that didn't have test coverage and so didn't show up in the tests; and the recent addition of curvefit, where the documentation didn't correspond to what was actually calculated.
Yes, and this is a perfect example of "people who care" catching the problem even after commit, so that it gets fixed (extremely quickly) - especially with this kind of documentation problem. I don't see how a formal review process would have helped in this case, given that I probably would not have spent time on it had I had to jump through that hoop. And the parameter estimation was working fine; only the error estimate was not (at least for a day).
In both cases the review and corrections happened after the commit, since I keep an eye on any stats-related commits. Without that review we might get misleading (or incorrect) numbers and broken code - and I've seen a lot of both in stats.
This is exactly what we need. More people like Josef looking at check-ins and offering suggestions / assistance that corresponds to their area of interest / expertise. I was very grateful for the assistance (and the unit-tests based on NIST data). I don't think we need any formal rules except the natural ones. We definitely need better tools to "track which code has been reviewed", check builds on multiple platforms, merge development branches, etc., etc. We also need more cycles to spend on SciPy. -Travis
On Tue, Feb 24, 2009 at 1:59 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
I've split this off into a new thread because I felt there were two issues in Stefan's original thread.
This is in the hope that we can stimulate discussion on the workflow (as opposed to - say - which version control system to use, or which bugtracker).
I would be very interested to see if we can come to a consensus on the important discussion of whether to introduce fairly formal code review into the scipy workflow. I've appended the key piece of discussion below.
I'd summarize my position with the following points: - SciPy components should have one or more maintainers - "Maintainer" means anyone who has an interest in that particular component - Maintainers should be notified by the bugtracker when problems arise My hope is that by resolving more problems at lower levels we can partially relieve the burden on release managers and the like. I see the introduction of a new VCS/bugtracker as mainly for the benefit of these people, whose responsibilities require more scalability. So, I'm definitely not opposed to introducing these changes and experimenting with alternatives a bit. However, I think we need a distributed-responsibility system *as much if not more* than a DVCS or a new bug tracker. -- Nathan Bell wnbell@gmail.com http://graphics.cs.uiuc.edu/~wnbell/
On Tue, Feb 24, 2009 at 12:59, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
I've split this off into a new thread because I felt there were two issues in Stefan's original thread.
This is in the hope that we can stimulate discussion on the workflow (as opposed to - say - which version control system to use, or which bugtracker).
I would be very interested to see if we can come to a consensus on the important discussion of whether to introduce fairly formal code review into the scipy workflow. I've appended the key piece of discussion below.
My feeling about workflow is similar to my feeling about tools: the experimentalist in me is back in the corner with his hand raised high in the air. (He's very enthusiastic. "Ooh! Pickmepickmepickme!" There's one in every class. You know the kind.)

There are a large number of unsupported assertions, gut feelings, and common sense flying about, and I don't want to get any of it on me. These are fairly poor guides for predicting the effects of project policy decisions, especially common sense. So let me make a meta-proposal: let's do a series of one-month trials. We'll pick a workflow to try for a month. Much of the opposition to the various suggestions is coming from people who haven't tried to work in the proposed environment (at least not with this group of developers and this project; I conjecture without proof that variations between groups and projects are large factors in the differing success of policies). The policy would be strictly enforced (to the extent that it involves enforcement) for that month. Detractors from the policy will grin and bear it for the duration of the month. Because it's just a month. We'll try their idea next month.

This is far from scientific; the end result won't be measurable, per se. But it will give us experience with each of the suggestions. Perhaps what we fear about a policy won't be nearly as onerous as we think, or will even have the reverse effect. Maybe we'll generate better ideas as we play around. Maybe some ideas truly suck. At the end, we may not reach a true consensus, but I suspect that we'll all be happier with *one* of the solutions than we are going to be if we just talk about it. And our happiness is really the thing to optimize, here, objective reality be damned.

-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
2009/2/25 Robert Kern <robert.kern@gmail.com>:
There are a large number of unsupported assertions, gut feelings, and common sense flying about, and I don't want to get any of it on me. These are fairly poor guides for predicting the effects of project policy decisions, especially common sense.
[...]
So let me make a meta-proposal: Let's do a series of one-month trials.
I'd be glad to try out different work-flows, and to pick the one that suits our project best. Would you like to propose a schedule? Just give me and David a week or so to sort out the issue tracker first! Regards Stéfan
Hi, Here's a concrete workflow problem: is there a way for me to test my changes on Mac and Windows before committing? I don't have access to either kind of machine, so I just write code I hope is portable, and David Cournapeau ends up having to suffer for it. Anne
On Tue, Feb 24, 2009 at 18:49, Anne Archibald <peridot.faceted@gmail.com> wrote:
Hi,
Here's a concrete workflow problem: is there a way for me to test my changes on Mac and Windows before committing? I don't have access to either kind of machine, so I just write code I hope is portable, and David Cournapeau ends up having to suffer for it.
You can make a branch, commit there, then force a build on the appropriate buildbot: http://buildbot.scipy.org/builders -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
participants (25)
- Andrew Straw
- Anne Archibald
- Brian Granger
- Bruce Southey
- Charles R Harris
- David Cournapeau
- David Cournapeau
- Gael Varoquaux
- jason-sage@creativetrax.com
- Jonathan Guyer
- josef.pktd@gmail.com
- Matthew Brett
- Michael Abshoff
- Mike Hansen
- Nathan Bell
- Neil Martinsen-Burrell
- Ondrej Certik
- Pauli Virtanen
- Rob Clewley
- Robert Kern
- Robin
- Sturla Molden
- Stéfan van der Walt
- Travis E. Oliphant
- Yaroslav Halchenko