pull requests and code review
Can we keep pull requests open for more than 3 hours, so we actually have time to look at them? looking at https://github.com/scipy/scipy/commit/c9c2d66701583984589c88c08c478e2fc6d9f4... my first guess is that the use of inspect.getargspec breaks when the hessian is a method attached to a class, as is the case for almost all our code in statsmodels. We just fixed a similar case in scipy. There should at least be time to check whether this kind of suspicion is justified or not. Josef
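For readers following along, here is a minimal sketch of the bound-method behaviour Josef is guessing at (the function and class names are hypothetical, not taken from the commit under discussion): inspecting the argument list of a method attached to a class reports an extra 'self', so a naive comparison against a plain function's arguments can be thrown off.

    import inspect

    def loglike(params, data):
        """Hypothetical objective function (names made up for illustration)."""
        pass

    def hess_plain(params, data):
        """Hessian supplied as a plain function: same argument list as loglike."""
        pass

    class Model:
        def hess_method(self, params, data):
            """Hessian supplied as a bound method, the usual statsmodels pattern."""
            pass

    # inspect.getargspec was the call used at the time; it is gone in Python 3.11+,
    # where inspect.getfullargspec shows the same behaviour.
    print(inspect.getfullargspec(hess_plain).args)           # ['params', 'data']
    print(inspect.getfullargspec(Model().hess_method).args)  # ['self', 'params', 'data']
    # Any check that naively compares these argument lists (or their lengths)
    # will disagree for the bound method, even though it is called with the
    # same two arguments as the plain function.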
I don't think there should be any time limit on pull requests. Who is the *we* that needs time to look at them? I did take time to look at the changes. The code changes were not extensive (except for the very nice tests), and it is a welcome change. Your feedback on the use of inspect is very good. We can take a look at whether or not method calls were considered and fix it if it does break. If you are interested in continuing to review all the optimize changes, I will make sure and give you time to review in the future. This is where having a list of interested and available parties for different modules would make a great deal of sense. Thanks, -Travis On Jan 5, 2012, at 6:19 PM, josef.pktd@gmail.com wrote:
Can we keep pull requests open for more than 3 hours, so we actually have time to look at them?
looking at https://github.com/scipy/scipy/commit/c9c2d66701583984589c88c08c478e2fc6d9f4...
my first guess is that the use of inspect.getargspec breaks when the hessian is a method attached to a class, as is the case for almost all our code in statsmodels. We just fixed a similar case in scipy.
There should at least be time to check whether this kind of suspicion is justified or not.
Josef
On Thu, Jan 5, 2012 at 5:41 PM, Travis Oliphant <travis@continuum.io> wrote:
I don't think there should be any time limit on pull requests. Who is the *we* that needs time to look at them? I did take time to look at the changes. The code changes were not extensive (except for the very nice tests), and it is a welcome change.
Your feedback on the use of inspect is very good. We can take a look at whether or not method calls were considered and fix it if it does break.
If you are interested in continuing to review all the optimize changes, I will make sure and give you time to review in the future. This is where having a list of interested and available parties for different modules would make a great deal of sense.
The rule of thumb would be two days for two- or three-line fixups, a week or more for more intrusive stuff, together with a note to the list. If you need the code *right now*, then keep developing offline until the whole package is ready. You'd be surprised at the feedback even "trivial" things can elicit. Chuck
Wow that is a much different time frame than I would expect or think necessary. Where does this rule of thumb come from? It has quite a few implications that I don't think are pleasant. I would be quite dissatisfied if my pull requests took 2-3 weeks to get merged. Is that just your feeling or does that come from empirical data or a larger vote? -Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Jan 5, 2012, at 6:53 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Thu, Jan 5, 2012 at 5:41 PM, Travis Oliphant <travis@continuum.io> wrote: I don't think there should be any time limit on pull requests. Who is the *we* that needs time to look at them? I did take time to look at the changes. The code changes were not extensive (except for the very nice tests), and it is a welcome change.
Your feedback on the use of inspect is very good. We can take a look at whether or not method calls were considered and fix it if it does break.
If you are interested in continuing to review all the optimize changes, I will make sure and give you time to review in the future. This is where having a list of interested and available parties for different modules would make a great deal of sense.
The rule of thumb would be two days for two- or three-line fixups, a week or more for more intrusive stuff, together with a note to the list. If you need the code *right now*, then keep developing offline until the whole package is ready. You'd be surprised at the feedback even "trivial" things can elicit.
Chuck
On Thu, Jan 5, 2012 at 6:02 PM, Travis Oliphant <travis@continuum.io> wrote:
Wow that is a much different time frame than I would expect or think necessary.
Where does this rule of thumb come from?
Experience. Think of it as a conference meeting with low bandwidth.
It has quite a few implications that I don't think are pleasant. I would be quite dissatisfied if my pull requests took 2-3 weeks to get merged.
Join the crowd.
Is that just your feeling or does that come from empirical data or a larger vote?
You just got a complaint, that should tell you something ;) You've been out of the loop for a couple of years, you need to take time to feel your way back into things. You may feel that you still own the property but a lot of squatters have moved in while you were gone... <snip> Chuck
Understood. I will listen to the feedback --- apologies to those whose toes feel stepped on by the crazy cousin stepping back into the house. The github process is one I am still not used to in practice. Also, obviously my orientation is towards much faster review cycles, which have to happen in industry. I still think it would be useful to have someplace that people could register their interest in modules. A simple wiki page would be a start, I imagine. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Jan 5, 2012, at 7:27 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Thu, Jan 5, 2012 at 6:02 PM, Travis Oliphant <travis@continuum.io> wrote: Wow that is a much different time frame than I would expect or think necessary.
Where does this rule of thumb come from?
Experience. Think of it as a conference meeting with low bandwidth.
It has quite a few implications that I don't think are pleasant. I would be quite dissatisfied if my pull requests took 2-3 weeks to get merged.
Join the crowd.
Is that just your feeling or does that come from empirical data or a larger vote?
You just got a complaint, that should tell you something ;)
You've been out of the loop for a couple of years, you need to take time to feel your way back into things. You may feel that you still own the property but a lot of squatters have moved in while you were gone...
<snip>
Chuck
On Thu, Jan 5, 2012 at 5:48 PM, Travis Oliphant <travis@continuum.io> wrote:
Understood. I will listen to the feedback --- apologies to those whose toes feel stepped on by the crazy cousin stepping back into the house.
The github process is one I am still not used to in practice. Also, obviously my orientation is towards much faster review cycles which have to happen in industry.
Just commenting from our perspective: with IPython being the first project that switched to github, and one that as of late has had a fast-and-furious rate of PR merging, I think we've found a flow that works reasonably well, without any hard-and-fast rules:

- We try to review everything via pull requests, except truly trivial stuff, *small* documentation-only fixes, and emergency, 'my god we broke everything' fixes that may require a cleanup in a PR later but that just must go in to ensure master works at all.

- For small PRs, a single person's review may be sufficient. Anything reasonably significant, we tend to let it sit after the first review for at least a day in case someone else has an opinion. It's also rare that a large PR doesn't require any fixes. We do try to read every line of code, and it's rare to see hundreds of lines of code where every one is perfect out of the gate. So it very naturally tends to happen that larger PRs have a longer life, but that's because we really are quite picky: review means check every line, run the test suite with that branch merged, etc.; not just look at the overall idea and trust that the details are OK.

- We make liberal use of the @username feature to directly ping anyone we suspect may care or may have a useful idea or opinion.

- For really large stuff, we tend to wait a little longer, and will often ping the whole list explicitly, saying 'hey, PR #xyz is big and complicated and has feature foo that's a bit controversial, please come over and provide some feedback'.

I hope this helps the crazy cousin ease himself back into civilized society ;) We're certainly glad to have him!!!

Best,

f
On Thu, Jan 5, 2012 at 8:48 PM, Travis Oliphant <travis@continuum.io> wrote:
Understood. I will listen to the feedback --- apologies to those whose toes feel stepped on by the crazy cousin stepping back into the house.
The github process is one I am still not used to in practice. Also, obviously my orientation is towards much faster review cycles which have to happen in industry.
I still think it would be useful to have someplace that people could register their interest in modules. A simple wiki page would be a start, I imagine.
Getting a rough overview of interested people would be useful, but most of the time it is just the usual suspects for scipy, given the comment history on github. On a module level it might not be so informative. For example, if you change signal.lfilter then I would check immediately whether it affects statsmodels, but for most other changes I wouldn't be interested enough to check the details. For the new optimize.minimize I stayed out of the discussion of the details, and only kept track of the main design, since it doesn't have a direct impact on statsmodels but will be useful in the future. Josef
Travis
-- Travis Oliphant (on a mobile) 512-826-7480
On Jan 5, 2012, at 7:27 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Thu, Jan 5, 2012 at 6:02 PM, Travis Oliphant <travis@continuum.io> wrote:
Wow that is a much different time frame than I would expect or think necessary.
Where does this rule of thumb come from?
Experience. Think of it as a conference meeting with low bandwidth.
It has quite a few implications that I don't think are pleasant. I would be quite dissatisfied if my pull requests took 2-3 weeks to get merged.
Join the crowd.
Is that just your feeling or does that come from empirical data or a larger vote?
You just got a complaint, that should tell you something ;)
You've been out of the loop for a couple of years, you need to take time to feel your way back into things. You may feel that you still own the property but a lot of squatters have moved in while you were gone...
<snip>
Chuck
On Thu, Jan 5, 2012 at 8:02 PM, <josef.pktd@gmail.com> wrote:
<snip>
I do appreciate the work that various people put into this project, but this sort of illustrates the frustration that some of us mere users have with numpy and scipy. On one hand you appear to want to change the numpy governance to be more open and community driven, yet this thread reflects the opposite. Sure, I think that bugs should be fixed in the way Fernando described, but adding or removing features needs a more community-based approach than a single developer. Given the silly bugs due to not testing supported Python versions and platforms, which seem to occur regularly around release time, there is a strong need for community involvement. One of the advantages of this whole distributed approach is that different trees can be merged at specific times like release dates (somewhat as the Linux kernel does). At least this would find and force the fixing of those silly bugs, since otherwise that code would not go in at that time. The other advantage is that people could be encouraged to do code review, and perhaps become contributors, when they know that their work is appreciated - welcome aboard, Denis! Bruce
I do appreciate the work that various people put into this project, but this sort of illustrates the frustration that some of us mere users have with numpy and scipy. On one hand you appear to want to change the numpy governance to be more open and community driven, yet this thread reflects the opposite.
Actually, I feel like it already is pretty open and community driven *as witnessed* by this thread. I may have a particular opinion, but it is the opinion of the active community that holds sway and drives what happens. What I would like to do is underscore that point more broadly. But, I also want to make sure that the current crop of maintainers doesn't make it difficult for new people to come in because they have a different mindset or are too overwhelmed by how much one has to learn or understand to make a contribution.

Also, it really depends on what section of code we are talking about as to how much input is needed. The more people are depending on code, the more review is needed. That is clear, but it is difficult to tell how much code is actually being used. There is a lot of SciPy that still needs to be developed more fully, and if it takes an action-by-committee to make it happen, SciPy will not reach 1.0 very quickly.

Here, for example, I wonder how many active users of fmin_ncg are actually out there. Most optimization approaches don't use the hessian parameters at all. I rarely recommend using fmin_ncg to others. That was one of the variables in my mental model when I rapidly reviewed the original contribution.

-Travis
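For context, a minimal, self-contained sketch of the fmin_ncg interface (with the hessian argument Travis mentions) on a made-up quadratic; the numbers are purely illustrative and not taken from the pull request:

    import numpy as np
    from scipy.optimize import fmin_ncg

    # Toy quadratic f(x) = 0.5*x'Ax - b'x: the gradient is Ax - b and the
    # hessian is the constant matrix A.
    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
    b = np.array([1.0, -1.0])

    f = lambda x: 0.5 * x.dot(A).dot(x) - b.dot(x)
    fprime = lambda x: A.dot(x) - b   # gradient, required by fmin_ncg
    fhess = lambda x: A               # the hessian argument under discussion

    xopt = fmin_ncg(f, np.zeros(2), fprime=fprime, fhess=fhess, disp=False)
    print(xopt)  # should be close to np.linalg.solve(A, b)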
On Fri, Jan 6, 2012 at 10:04 PM, Travis Oliphant <travis@continuum.io> wrote:
I do appreciate the work that various people put into this project, but this sort of illustrates the frustration that some of us mere users have with numpy and scipy. On one hand you appear to want to change the numpy governance to be more open and community driven, yet this thread reflects the opposite.
Actually, I feel like it already is pretty open and community driven *as witnessed* by this thread. I may have a particular opinion, but it is the opinion of the active community that holds sway and drives what happens. What I would like to do is underscore that point more broadly. But, I also want to make sure that the current crop of maintainers doesn't make it difficult for new people to come in because they have a different mindset or are too overwhelmed by how much one has to learn or understand to make a contribution.
I think that's where the move to github has made it much easier. It's much easier to make comments on a specific line, or general comments, when reviewing a pull request. Sometimes it's relatively easy: just make sure test coverage is high and everything works as advertised. Sometimes it's pointing to the existing pattern, e.g. for cython code or how to wrap lapack. Other times it's difficult to make changes that are compatible with existing usage.

The recent example is the improvements to gaussian_kde. Ralf made improvements that would be good if it were in a new package, but it was a lengthy process to get it into a form that doesn't break what I considered to be the existing standard usage (recommended on the mailing lists and on stackoverflow). It can be a little bit painful and time consuming at times, but we finally managed what looks like a good enhancement.

Other times pull requests or suggestions just never get to the stage where they are ready to be merged, either because the proposer doesn't bring them into a mergeable form, or no "maintainer" is interested enough to do the work himself, or it doesn't make enough sense. My impression is that in most cases the pull request and code review system on github works pretty well for this.

Welcoming new users with a different mindset doesn't mean that code that is not sufficiently tested, or that can be expected to make maintenance much more difficult in the future, has to be merged into scipy.

(I'm just a reviewer, not a committer anymore, and I'm very glad about the role that especially Ralf and Warren take in getting improvements into the actual code.)

Josef
Also, it really depends on what section of code we are talking about as to how much input is needed. The more people are depending on code, the more review is needed. That is clear, but it is difficult to tell how much code is actually being used. There is a lot of SciPy that still needs to be developed more fully, and if it takes an action-by-committee to make it happen, SciPy will not reach 1.0 very quickly.
Here, for example, I wonder how many active users of fmin_ncg are actually out there. Most optimization approaches don't use the hessian parameters at all. I rarely recommend using fmin_ncg to others. That was one of the variables in my mental model when I rapidly reviewed the original contribution.
Overall I find it difficult to tell which code is in use (for example for statsmodels). For scipy it's a bit easier to tell from what the big downstream packages use and complain about when something breaks, from when they propose improvements, or just from keeping track of the discussion.

As for the optimizers, Skipper spent some effort on wrapping all the unconstrained solvers, and now also many constrained solvers, so that they can be used by the estimators with a simple method="ncg" or something like this. Skipper also spent a lot of time coding gradients and hessians. (Performance with analytical hessians is much better, but in some cases we just use a numerical hessian.) I think all wrapped optimizers are covered in the test suite. Which ones are commonly used is not so clear; for discrete models the default is actually our home-made (Skipper's) Newton method.
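A rough sketch of the interface Josef describes, with made-up data; it assumes a reasonably current statsmodels, where the discrete models' fit accepts a method keyword:

    import numpy as np
    import statsmodels.api as sm

    # Made-up data, just to exercise the solvers.
    rng = np.random.RandomState(0)
    X = sm.add_constant(rng.randn(500, 2))
    y = (X.dot([0.25, 1.0, -1.0]) + rng.randn(500) > 0).astype(int)

    model = sm.Logit(y, X)
    res_newton = model.fit(disp=0)             # default: the home-made Newton solver
    res_ncg = model.fit(method="ncg", disp=0)  # wrapped scipy Newton-CG, using the
                                               # model's analytical score and hessian
    print(res_newton.params)
    print(res_ncg.params)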
-Travis
On Thu, Jan 5, 2012 at 6:48 PM, Travis Oliphant <travis@continuum.io> wrote:
Understood. I will listen to the feedback --- apologies to those whose toes feel stepped on by the crazy cousin stepping back into the house.
The github process is one I am still not used to in practice. Also, obviously my orientation is towards much faster review cycles which have to happen in industry.
I still think it would be useful to have someplace that people could register their interest in modules. A simple wiki page would be a start, I imagine.
Welcome back, BTW. Chuck
On Thu, Jan 5, 2012 at 7:41 PM, Travis Oliphant <travis@continuum.io> wrote:
I don't think there should be any time limit on pull requests. Who is the *we* that needs time to look at them? I did take time to look at the changes. The code changes were not extensive (except for the very nice tests), and it is a welcome change.
We is whoever is interested and might be affected by the changes. I'm looking at all stats pull requests and most or all pull requests that will have a direct impact on statsmodels. It takes some time until I see the notification that a pull request has been opened, and some time to see whether it might be a change that affects statsmodels. As soon as a pull request is closed, github doesn't send out any notifications anymore, so the discussion is closed except for developers that are already participating in the pull request. At least that is my impression of github from past experience.
Your feedback on the use of inspect is very good. We can take a look at whether or not method calls were considered and fix it if it does break.
Relying on inspect is very fragile, and it depends a lot on the details of the implementation, so I'm always sceptical when it's used. In this case it compares the hessian signature with the signature of the main function; if they agree, then everything is fine. But I'm not sure it really works in all our (statsmodels) use cases.
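For illustration only (this is not the code in the scipy commit), a sketch of the kind of signature comparison Josef describes, plus one way to make it tolerate a hessian that is a bound method:

    import inspect

    def _call_args(func):
        # getargspec in the Python 2 era; getfullargspec on current Python.
        args = list(inspect.getfullargspec(func).args)
        if inspect.ismethod(func) and args:
            args = args[1:]   # drop the bound 'self' so a method attached to a
                              # class compares the same way as a plain function
        return args

    def signatures_agree(f, fhess):
        """True if the hessian takes the same arguments as the main function."""
        return _call_args(f) == _call_args(fhess)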
If you are interested in continuing to review all the optimize changes, I will make sure and give you time to review in the future. This is where having a list of interested and available parties for different modules would make a great deal of sense.
Since statsmodels is a very heavy user of scipy.optimize, I'm keeping almost complete track of any changes. Josef
Thanks,
-Travis
On Jan 5, 2012, at 6:19 PM, josef.pktd@gmail.com wrote:
Can we keep pull requests open for more than 3 hours, so we actually have time to look at them?
looking at https://github.com/scipy/scipy/commit/c9c2d66701583984589c88c08c478e2fc6d9f4...
my first guess is that the use of inspect.getargspec breaks when the hessian is a method attached to a class, as is the case for almost all our code in statsmodels. We just fixed a similar case in scipy.
There should at least be time to check whether this kind of suspicion is justified or not.
Josef
Fair enough. API users willing to review are golden. I am happy to suppress my impatience to give users time to see changes. It makes me wonder if one could set up a system that allows downstream users to voluntarily register to receive notifications on changes to functions or classes that affect them. Functions that are heavily used would have more time for people to review the changes. -- Travis Oliphant (on a mobile) 512-826-7480 On Jan 5, 2012, at 6:55 PM, josef.pktd@gmail.com wrote:
On Thu, Jan 5, 2012 at 7:41 PM, Travis Oliphant <travis@continuum.io> wrote:
I don't think there should be any time limit on pull requests. Who is the *we* that needs time to look at them? I did take time to look at the changes. The code changes were not extensive (except for the very nice tests), and it is a welcome change.
We is whoever is interested and might be affected by the changes. I'm looking at all stats pull requests and most or all pull requests that will have a direct impact on statsmodels.
It takes some time until I see the notification that a pull request has been opened, and some time to see whether it might be a change that affects statsmodels.
As soon as a pull request is closed, github doesn't send out any notifications anymore, so the discussion is closed except for developers that are already participating in the pull request. At least that is my impression of github from past experience.
Your feedback on the use of inspect is very good. We can take a look at whether or not method calls were considered and fix it if it does break.
Relying on inspect is very fragile, and it depends a lot on the details of the implementation, so I'm always sceptical when it's used. In this case it compares the hessian signature with the signature of the main function; if they agree, then everything is fine. But I'm not sure it really works in all our (statsmodels) use cases.
If you are interested in continuing to review all the optimize changes, I will make sure and give you time to review in the future. This is where having a list of interested and available parties for different modules would make a great deal of sense.
Since statsmodels is a very heavy user of scipy.optimize, I'm keeping almost complete track of any changes.
Josef
Thanks,
-Travis
On Jan 5, 2012, at 6:19 PM, josef.pktd@gmail.com wrote:
Can we keep pull requests open for more than 3 hours, so we actually have time to look at them?
looking at https://github.com/scipy/scipy/commit/c9c2d66701583984589c88c08c478e2fc6d9f4...
my first guess is that the use of inspect.getargspec breaks when the hessian is a method attached to a class, as is the case for almost all our code in statsmodels. We just fixed a similar case in scipy.
There should at least be time to check whether this kind of suspicion is justified or not.
Josef
On Thu, Jan 5, 2012 at 10:04 PM, Denis Laxalde <denis.laxalde@mcgill.ca> wrote:
josef.pktd@gmail.com wrote:
statsmodels is a very heavy user of scipy.optimize
In this respect, is there a set of examples or a particular test suite that I could use for further validation?
If you mean which parts of statsmodels can be used to make sure any changes in scipy.optimize look okay from our perspective: discrete makes particularly heavy use of optimize. tsa does as well, but to a lesser extent as far as test coverage of different solvers goes. Our censored regression models will have good coverage of the solvers, but they're not in upstream master yet (though they should be soon). Skipper
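One concrete way to run those suites against a candidate scipy build; a sketch that assumes a recent statsmodels is installed alongside it (statsmodels used nose at the time, pytest works for current versions):

    import pytest

    # Run the statsmodels test packages that lean most heavily on scipy.optimize.
    pytest.main([
        "--pyargs",
        "statsmodels.discrete.tests",  # heaviest user of the solvers
        "statsmodels.tsa.tests",       # uses them too, with thinner solver coverage
    ])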
On Thu, Jan 5, 2012 at 10:14 PM, Skipper Seabold <jsseabold@gmail.com> wrote:
On Thu, Jan 5, 2012 at 10:04 PM, Denis Laxalde <denis.laxalde@mcgill.ca> wrote:
josef.pktd@gmail.com wrote:
statsmodels is a very heavy user of scipy.optimize
In this respect, is there a set of examples or a particular test suite that I could use for further validation?
If you mean which parts of statsmodels can be used to make sure any changes in scipy.optimize look okay from our perspective: discrete makes particularly heavy use of optimize. tsa does as well, but to a lesser extent as far as test coverage of different solvers goes. Our censored regression models will have good coverage of the solvers, but they're not in upstream master yet (though they should be soon).
Since at least Skipper is usually reasonably up to date with scipy master, there is still the check later on, before a scipy release, of whether any changes in scipy break tested code in statsmodels. I have to rely on reading code, since I'm currently not able to compile scipy. Josef
Skipper
Skipper Seabold wrote:
statsmodels is a very heavy user of scipy.optimize
In this respect, is there a set of examples or a particular test suite that I could use for further validation?
If you mean which parts of statsmodels can be used to make sure any changes in scipy.optimize look okay from our perspective: discrete makes particularly heavy use of optimize. tsa does as well, but to a lesser extent as far as test coverage of different solvers goes. Our censored regression models will have good coverage of the solvers, but they're not in upstream master yet (though they should be soon).
Thanks, I'll include these in my regular tests. -- Denis
participants (7)
- Bruce Southey
- Charles R Harris
- Denis Laxalde
- Fernando Perez
- josef.pktd@gmail.com
- Skipper Seabold
- Travis Oliphant