Re: [Pandas-dev] Thoughts on adopting a 1-PR-1-commit policy?
hey Jan, I'm adding the mailing list back. Several comments inline On Sun, Jan 17, 2016 at 1:09 AM, Jan Schulz <jasc@gmx.net> wrote:
Hi,
Just a different opinion: I like having commits do one logical thing and not squash multiple "logical complete"things together (this means that commit is a logical step, not the PR and that the commits should be clean and not contain "fixup", "typo" style commits). During the categorical work, I found that a few times I regretted that I couldn't go back to look up the specific change in a commit and look up what and why that commit was done because it was all mixed up with the rest of the squashed commits in that PR.
I'm not suggesting that you should have to squash your commits inside the PR. This only concerns how *patches are applied to pandas's master branch". Ideally, no squashing occurs inside the developer branch (so the "story", so to speak, about the patch is preserved), but what the Apache patch tool does is - Turns a multi-commit PR into a single-commit patch - Puts the individual commit hashes in the commit message; so you can always visit the original commits - Puts the description from the PR into the commit message - Cherry-picks instead of merging, so you can observe evolution of pandas/master in a clear and linear way
My feeling was always that squashes are performed because the rebases are so hard because of multiple PRs fixing stuff in the same files at the same time. If this refactorings come through (both better separation of backend-frontend specific code and the suggested split up of frame.py, etc), I think this is not so much a problem anymore.
I'm not sure where the commit history with merges is a problem: during balme (in github, never used git itself), I don't see any merges?!
Here is something that would be very hard: create release notes given the commit history.
So my suggestion would go in a different direction: better commits in the PRs with proper commit messages (not only the headline but also explanations in the message.
Jan -- Jan Schulz mail: jasc@gmx.net web: http://www.katzien.de
On 17 January 2016 at 07:29, Wes McKinney <wesmckinn@gmail.com> wrote:
The script performs the squashing automatically, so users can squash manually if they wish or let us do it.
On Sat, Jan 16, 2016 at 8:11 PM, Jeff Reback <jeffreback@gmail.com> wrote:
seems find to use the cherry picking script
though I don't think should relax users from squashing
On Jan 16, 2016, at 6:46 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
Copying the mailing list. Indeed makes rebasing unnecessary if there are no cherry-pick conflicts.
On Sat, Jan 16, 2016 at 3:34 PM, Tom Augspurger <tom.augspurger88@gmail.com> wrote:
Rebasing can be tough for new contributors, so for that alone I'd say let's try it.
-Tom
On Jan 16, 2016, at 5:20 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
I've grown very fond of the PR cherry-picking style used in many Apache projects.
Here's an example of a very large commit to Apache Spark that was performed in this fashion:
https://github.com/apache/spark/commit/2fe0a1aaeebbf7f60bd4130847d738c29f1e3...
If you compare pandas's commit history with a project like this, you'll see it is much easier to follow because there is one commit for each patch to the project, rather than a merge commit plus 1 or more merged commits (depending on whether the person merging the PR did an interactive rebase).
The script to do this is not too complex, and is even less complex for pandas because we do not use JIRA:
https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
I've been using a pared down version of the script in Ibis:
https://github.com/cloudera/ibis/blob/master/dev/merge-pr.py
Here is an example of what a merge commit with multiple subcommits looks like using this tool:
https://github.com/cloudera/ibis/commit/eafabe060dcaaea0a6076342eaa374929b91...
It's pretty easy to use: run the script and enter the PR # you are merging. It automatically squashes and closes the merged PR.
Let me know if this is something that would interest the team. I know there are varying opinions on the GitHub Green Button =)
- Wes _______________________________________________ Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
ok this is now implemented, ./merge-pr.py will merge things via cherry-picking. Though still think that most users should have a clean PR going in. This will be more useful for bigger patches where you do want to preserve the history. On Sun, Jan 17, 2016 at 8:34 AM, Wes McKinney <wesmckinn@gmail.com> wrote:
hey Jan,
I'm adding the mailing list back. Several comments inline
On Sun, Jan 17, 2016 at 1:09 AM, Jan Schulz <jasc@gmx.net> wrote:
Hi,
Just a different opinion: I like having commits do one logical thing and not squash multiple "logical complete"things together (this means that commit is a logical step, not the PR and that the commits should be clean and not contain "fixup", "typo" style commits). During the categorical work, I found that a few times I regretted that I couldn't go back to look up the specific change in a commit and look up what and why that commit was done because it was all mixed up with the rest of the squashed commits in that PR.
I'm not suggesting that you should have to squash your commits inside the PR. This only concerns how *patches are applied to pandas's master branch". Ideally, no squashing occurs inside the developer branch (so the "story", so to speak, about the patch is preserved), but what the Apache patch tool does is
- Turns a multi-commit PR into a single-commit patch - Puts the individual commit hashes in the commit message; so you can always visit the original commits - Puts the description from the PR into the commit message - Cherry-picks instead of merging, so you can observe evolution of pandas/master in a clear and linear way
My feeling was always that squashes are performed because the rebases are so hard because of multiple PRs fixing stuff in the same files at the same time. If this refactorings come through (both better separation of backend-frontend specific code and the suggested split up of frame.py, etc), I think this is not so much a problem anymore.
I'm not sure where the commit history with merges is a problem: during balme (in github, never used git itself), I don't see any merges?!
Here is something that would be very hard: create release notes given the commit history.
So my suggestion would go in a different direction: better commits in the PRs with proper commit messages (not only the headline but also explanations in the message.
Jan -- Jan Schulz mail: jasc@gmx.net web: http://www.katzien.de
On 17 January 2016 at 07:29, Wes McKinney <wesmckinn@gmail.com> wrote:
The script performs the squashing automatically, so users can squash manually if they wish or let us do it.
On Sat, Jan 16, 2016 at 8:11 PM, Jeff Reback <jeffreback@gmail.com> wrote:
seems find to use the cherry picking script
though I don't think should relax users from squashing
On Jan 16, 2016, at 6:46 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
Copying the mailing list. Indeed makes rebasing unnecessary if there are no cherry-pick conflicts.
On Sat, Jan 16, 2016 at 3:34 PM, Tom Augspurger <tom.augspurger88@gmail.com> wrote:
Rebasing can be tough for new contributors, so for that alone I'd say let's try it.
-Tom
> On Jan 16, 2016, at 5:20 PM, Wes McKinney <wesmckinn@gmail.com> wrote: > > I've grown very fond of the PR cherry-picking style used in many > Apache projects. > > Here's an example of a very large commit to Apache Spark that was > performed in this fashion: > > https://github.com/apache/spark/commit/2fe0a1aaeebbf7f60bd4130847d738c29f1e3... > > If you compare pandas's commit history with a project like this, > you'll see it is much easier to follow because there is one commit for > each patch to the project, rather than a merge commit plus 1 or more > merged commits (depending on whether the person merging the PR did an > interactive rebase). > > The script to do this is not too complex, and is even less complex for > pandas because we do not use JIRA: > > https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py > > I've been using a pared down version of the script in Ibis: > > https://github.com/cloudera/ibis/blob/master/dev/merge-pr.py > > Here is an example of what a merge commit with multiple subcommits > looks like using this tool: > > https://github.com/cloudera/ibis/commit/eafabe060dcaaea0a6076342eaa374929b91... > > It's pretty easy to use: run the script and enter the PR # you are > merging. It automatically squashes and closes the merged PR. > > Let me know if this is something that would interest the team. I know > there are varying opinions on the GitHub Green Button =) > > - Wes > _______________________________________________ > Pandas-dev mailing list > Pandas-dev@python.org > https://mail.python.org/mailman/listinfo/pandas-dev
Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
Hi, On 17 January 2016 at 14:34, Wes McKinney <wesmckinn@gmail.com> wrote:
I'm adding the mailing list back. Several comments inline
Oups, sorry!
I'm not suggesting that you should have to squash your commits inside the PR. This only concerns how *patches are applied to pandas's master branch". Ideally, no squashing occurs inside the developer branch (so the "story", so to speak, about the patch is preserved), but what the Apache patch tool does is
I would still argue against this: the master branch is what is used in blame and figuring out why something was done in that way is much harder if you always have to get back to some commits in obscure branches which might even be removed from the repo. IMO, all this squashing is an incentive not to write good commit messages as these are in the end more or less discarded as they are all mixed up :-( At least that what happened with me: I tried to write "unit of change" commits (`rebase -i` all "typo" and "fixup" commits + good commits messages) but then these got squashed and I stopped writing such messages because it felt that this was simply wasted and not appreciated. The result was that some decisions which I took are not explained in the commits and were lost when the topic were revisited half a year latter.
Here is something that would be very hard: create release notes given the commit history.
I don't think this gets any easier as long as some manual things are done. There will still be simple commits which correct a typo and which should not show up in the release notes. Whatever technical solution is used for this, some discipline has to be taken to make that successfull. If release notes generation is what is this all about, then there could be several solutions: * only generate release notes from merge commits (remove headline, only use message) * only generate release notes from commits which include a tag * only generate release notes from merge commits which include a tag Jan
On Sun, Jan 17, 2016 at 9:04 AM, Jan Schulz <jasc@gmx.net> wrote:
Hi,
On 17 January 2016 at 14:34, Wes McKinney <wesmckinn@gmail.com> wrote:
I'm adding the mailing list back. Several comments inline
Oups, sorry!
I'm not suggesting that you should have to squash your commits inside the PR. This only concerns how *patches are applied to pandas's master branch". Ideally, no squashing occurs inside the developer branch (so the "story", so to speak, about the patch is preserved), but what the Apache patch tool does is
I would still argue against this: the master branch is what is used in blame and figuring out why something was done in that way is much harder if you always have to get back to some commits in obscure branches which might even be removed from the repo.
These issues should be addressed during the code review process. It is worse, in my opinion, to have a mix of intermediate (possibly broken) and verified commits in master as opposed to atomic, verified commits. Additionally, lack of "atomicity" with patches has more issues: - Difficult to revert patches - Difficult to port patches into maintenance branches Requiring patches to be atomic is common practice in large software teams because otherwise codebase maintenance is a nightmare with > 5-10 developers working in parallel.
IMO, all this squashing is an incentive not to write good commit messages as these are in the end more or less discarded as they are all mixed up :-(
Squash or no squash, the only way to have good commit messages is to expect a certain level of professionalism from pandas contributors. If the commit / PR description is inadequate, this is the responsibility of the code reviewer to address with the developer proposing the patch.
At least that what happened with me: I tried to write "unit of change" commits (`rebase -i` all "typo" and "fixup" commits + good commits messages) but then these got squashed and I stopped writing such messages because it felt that this was simply wasted and not appreciated.
The result was that some decisions which I took are not explained in the commits and were lost when the topic were revisited half a year latter.
Here is something that would be very hard: create release notes given the commit history.
I don't think this gets any easier as long as some manual things are done. There will still be simple commits which correct a typo and which should not show up in the release notes. Whatever technical solution is used for this, some discipline has to be taken to make that successfull.
If release notes generation is what is this all about, then there could be several solutions:
* only generate release notes from merge commits (remove headline, only use message) * only generate release notes from commits which include a tag * only generate release notes from merge commits which include a tag
No, the release notes aren't a major factor, but one of many issues caused by a non-atomic, non-linear commit history.
Jan _______________________________________________ Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
On Jan 17, 2016 10:37, "Wes McKinney" <wesmckinn@gmail.com> wrote:
[...]
Requiring patches to be atomic is common practice in large software teams because otherwise codebase maintenance is a nightmare with > 5-10 developers working in parallel.
A few thoughts: For what it's worth, possibly the largest/most parallel software collaboration in the world is the Linux kernel, and they mandate that complex patches must *not* be squashed, but must instead be broken up into a series of self-contained incremental patches (as Jan is advocating). BTW I think you'll find that if you consistently merge using --no-ff (which is what the green button does), then "git log --first-parent" will give you *exactly* the same linear squashed history that you are hoping for, as in the diffs will be byte-for-byte identical. This approach keeps all the history in the repository, and discards the distracting parts at access time rather than commit time. In numpy we mandate that all PRs go via the green button for this reason. I suspect that the projects you're thinking of do what they do because of a combination of (a) not being very large in the grand scheme of things so that the linearization itself doesn't become a bottleneck the way it would for a project like the kernel, (b) not understanding git terribly well, (c) having to assume an even lower level of git knowledge in individual contributors. (Versus the kernel, where they have the "luxury" of imposing arbitrarily high standards and then abusing anyone who doesn't meet them until they figure it out or quit.) Note that it isn't a great idea to assume that the individual commits that you squashed will still be findable later, even given their id. It's actually a good idea, generally, to delete no longer relevant branches from a personal fork, to avoid getting lost among hundreds of similarly named branches. I actually need to get more in the habit of doing this :-). And even if one disagrees about it being a good idea, people do it and there's no way to stop them. But when this happens, if the commit was cherrypicked into the main repo and the branch is gone from the personal fork, then github will eventually garbage collect the original commits, and trying to look up those commit hashes, maybe years later, will give you a 404. Anyway, both processes can obviously work, and what works for you is what works for you. I'm not an absolutist :-). But I thought it might be helpful to at least be aware of some of these points while making the decision. Cheers, -n
Hi, On 17 January 2016 at 19:37, Wes McKinney <wesmckinn@gmail.com> wrote:
These issues should be addressed during the code review process. It is worse, in my opinion, to have a mix of intermediate (possibly broken) and verified commits in master as opposed to atomic, verified commits.
Additionally, lack of "atomicity" with patches has more issues:
I agree, but there is also `rebase -i` to make these commits be atomic... E.g. https://github.com/pydata/pandas/pull/11582 The 4 commits work on their own and AFAIR each commit tested green. So: IMO important changes (especially everything which changes behaviour) should get one commit (or PR/Cherry pick as per your proposal) per behaviour change and this behaviour change should be explained in the commit message. If the above PR would have been squashed, then 4 messages would be appended to each other and one couldn't separate which description would belong to which... Jan
Yeah, the point of this issue is not "there shall be no more than 1 commit per PR" but rather that smaller patches (i.e., most patches) should not degrade the signal-to-noise ratio of our commit history. Further, we should avoid merging commits that don't stand on their own. Lastly, merge commits generally only serve to degrade the SnR ratio. Let's look at a sample of yesterday's commits: https://www.dropbox.com/s/mp5yfp76h6h8z3y/commit-log-20160116.png?dl=0 No mistakes were made here, except that our current process (which Jeff has been following diligently) is resulting in a commit history that is less useful than it could be. My preference is: - Use the patch tool for smaller patches and large patches that haven't been split out into a series of incremental, standalone patches - For large patches that make sense as multiple incremental commits, none of which breaks the build, merge with --ff-only (rebasing as needed). I expect this to be rare. I really like to avoid "edge-case driven development" -- we are bound to have patches where this guidance doesn't feel right, and we definitely don't have to dogmatically follow it. - Wes On Sun, Jan 17, 2016 at 12:53 PM, Jan Schulz <jasc@gmx.net> wrote:
Hi,
On 17 January 2016 at 19:37, Wes McKinney <wesmckinn@gmail.com> wrote:
These issues should be addressed during the code review process. It is worse, in my opinion, to have a mix of intermediate (possibly broken) and verified commits in master as opposed to atomic, verified commits.
Additionally, lack of "atomicity" with patches has more issues:
I agree, but there is also `rebase -i` to make these commits be atomic...
E.g. https://github.com/pydata/pandas/pull/11582
The 4 commits work on their own and AFAIR each commit tested green.
So: IMO important changes (especially everything which changes behaviour) should get one commit (or PR/Cherry pick as per your proposal) per behaviour change and this behaviour change should be explained in the commit message.
If the above PR would have been squashed, then 4 messages would be appended to each other and one couldn't separate which description would belong to which...
Jan
On Sun, Jan 17, 2016 at 3:34 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
Yeah, the point of this issue is not "there shall be no more than 1 commit per PR" but rather that smaller patches (i.e., most patches) should not degrade the signal-to-noise ratio of our commit history. Further, we should avoid merging commits that don't stand on their own. Lastly, merge commits generally only serve to degrade the SnR ratio.
Let's look at a sample of yesterday's commits:
https://www.dropbox.com/s/mp5yfp76h6h8z3y/commit-log-20160116.png?dl=0
No mistakes were made here, except that our current process (which Jeff has been following diligently) is resulting in a commit history that is less useful than it could be.
My preference is:
- Use the patch tool for smaller patches and large patches that haven't been split out into a series of incremental, standalone patches - For large patches that make sense as multiple incremental commits, none of which breaks the build, merge with --ff-only (rebasing as needed). I expect this to be rare.
I really like to avoid "edge-case driven development" -- we are bound to have patches where this guidance doesn't feel right, and we definitely don't have to dogmatically follow it.
For the record -- I spent some time reviewing the major category dtype pull requests that were merged in 2014, and given the sprawling nature of those changes and the huge amount of collaboration that took place, I agree it would have been preferable to fast-forward merge the incremental commits instead of squashing them into a couple of monolithic commits. https://github.com/pydata/pandas/commit/0f62d3fc62f317538044ed3d349bfb89fb7e... https://github.com/pydata/pandas/commit/ea0a13c172761348d08285a19ebf731cdabb...
- Wes
On Sun, Jan 17, 2016 at 12:53 PM, Jan Schulz <jasc@gmx.net> wrote:
Hi,
On 17 January 2016 at 19:37, Wes McKinney <wesmckinn@gmail.com> wrote:
These issues should be addressed during the code review process. It is worse, in my opinion, to have a mix of intermediate (possibly broken) and verified commits in master as opposed to atomic, verified commits.
Additionally, lack of "atomicity" with patches has more issues:
I agree, but there is also `rebase -i` to make these commits be atomic...
E.g. https://github.com/pydata/pandas/pull/11582
The 4 commits work on their own and AFAIR each commit tested green.
So: IMO important changes (especially everything which changes behaviour) should get one commit (or PR/Cherry pick as per your proposal) per behaviour change and this behaviour change should be explained in the commit message.
If the above PR would have been squashed, then 4 messages would be appended to each other and one couldn't separate which description would belong to which...
Jan
Triggered by a recent tweet of Wes ( https://twitter.com/wesmckinn/status/738111494852775936), I thought it would maybe a good time to evaluate the change in merging practices in pandas. I am certainly positive on the change in general (my only concern is that we shouldn't do additional changes in the middle of using the merge script, so you can still look at the 'Files changed' on github to see what actually changed). But, the main thing I was wondering: in the meantime github added the feature to squash on merge through the green merge button ( https://github.com/blog/2141-squash-your-commits). Are there still advantages of using our own custom script over the green button? - As far as I see, the main difference is that the github button does not include the text of the first comment in the squashed commit (but certainly for smaller PRs, this is not really a problem IMO, and the commit message gets a link to the PR) - On the plus side, it is easier (certainly for quickly merging a small PR, using the merge script is some overhead IMO), and you get the visual indication on github that the PR is 'merged' instead of 'closed' (but actually merged). Jeff, would you be OK with starting to use the green button again (with the squash on merge option, but you already disabled the merge commit option), at least for the smaller PRs? Regards, Joris 2016-01-18 23:59 GMT+01:00 Wes McKinney <wesmckinn@gmail.com>:
On Sun, Jan 17, 2016 at 3:34 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
Yeah, the point of this issue is not "there shall be no more than 1 commit per PR" but rather that smaller patches (i.e., most patches) should not degrade the signal-to-noise ratio of our commit history. Further, we should avoid merging commits that don't stand on their own. Lastly, merge commits generally only serve to degrade the SnR ratio.
Let's look at a sample of yesterday's commits:
https://www.dropbox.com/s/mp5yfp76h6h8z3y/commit-log-20160116.png?dl=0
No mistakes were made here, except that our current process (which Jeff has been following diligently) is resulting in a commit history that is less useful than it could be.
My preference is:
- Use the patch tool for smaller patches and large patches that haven't been split out into a series of incremental, standalone patches - For large patches that make sense as multiple incremental commits, none of which breaks the build, merge with --ff-only (rebasing as needed). I expect this to be rare.
I really like to avoid "edge-case driven development" -- we are bound to have patches where this guidance doesn't feel right, and we definitely don't have to dogmatically follow it.
For the record -- I spent some time reviewing the major category dtype pull requests that were merged in 2014, and given the sprawling nature of those changes and the huge amount of collaboration that took place, I agree it would have been preferable to fast-forward merge the incremental commits instead of squashing them into a couple of monolithic commits.
https://github.com/pydata/pandas/commit/0f62d3fc62f317538044ed3d349bfb89fb7e...
https://github.com/pydata/pandas/commit/ea0a13c172761348d08285a19ebf731cdabb...
- Wes
This IS currently enabled (merge squashing only). Though I haven't clicked the button so not exactly sure what it actually outputs. So I suppose it gives basically the same output then it would be ok. Using the script, I generally edit the commit to remove the - [ ] closes #xxxx - [ ] tests added / passed - [ ] passes ``git diff upstream/master | flake8 --diff`` - [ ] whatsnew entry which is what our PR template gives (and unless the user removes it it will propogate to the log) I *mostly* make them look like this, which is concise, yet retains all of the pertinent info. commit 6edf4471d1e55ce6b587a7e37dd540d787716413 Author: Mike Graham <mikegraham@gmail.com> Date: Mon Jun 6 08:10:10 2016 -0400 DOC: Fix wording/grammar for rolling's win_type argument. Author: Mike Graham <mikegraham@gmail.com> Closes #13376 from mikegraham/master and squashes the following commits: ec0c88e [Mike Graham] DOC: Fix wording/grammar for rolling's win_type argument. So all for using the button if it can capture a similar workflow (just haven't tried it out). The biggest reason NOT to use the button is that it gives you a chance to do a final check / test on your local machine with the actual merged content. More that a few times I have found issues and backed out of the process. Though for simpler changes (like a doc-typo that's not necessary). Finally merging locally DOES allow me to edit the whatsnew for consistency and such (or other files). Generally these are trivial things like spacing / doc-style and such, where a back-and-forth is not necessary. Jeff On Mon, Jun 6, 2016 at 8:56 AM, Joris Van den Bossche < jorisvandenbossche@gmail.com> wrote:
Triggered by a recent tweet of Wes ( https://twitter.com/wesmckinn/status/738111494852775936), I thought it would maybe a good time to evaluate the change in merging practices in pandas. I am certainly positive on the change in general (my only concern is that we shouldn't do additional changes in the middle of using the merge script, so you can still look at the 'Files changed' on github to see what actually changed).
But, the main thing I was wondering: in the meantime github added the feature to squash on merge through the green merge button ( https://github.com/blog/2141-squash-your-commits). Are there still advantages of using our own custom script over the green button?
- As far as I see, the main difference is that the github button does not include the text of the first comment in the squashed commit (but certainly for smaller PRs, this is not really a problem IMO, and the commit message gets a link to the PR) - On the plus side, it is easier (certainly for quickly merging a small PR, using the merge script is some overhead IMO), and you get the visual indication on github that the PR is 'merged' instead of 'closed' (but actually merged).
Jeff, would you be OK with starting to use the green button again (with the squash on merge option, but you already disabled the merge commit option), at least for the smaller PRs?
Regards, Joris
2016-01-18 23:59 GMT+01:00 Wes McKinney <wesmckinn@gmail.com>:
On Sun, Jan 17, 2016 at 3:34 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
Yeah, the point of this issue is not "there shall be no more than 1 commit per PR" but rather that smaller patches (i.e., most patches) should not degrade the signal-to-noise ratio of our commit history. Further, we should avoid merging commits that don't stand on their own. Lastly, merge commits generally only serve to degrade the SnR ratio.
Let's look at a sample of yesterday's commits:
https://www.dropbox.com/s/mp5yfp76h6h8z3y/commit-log-20160116.png?dl=0
No mistakes were made here, except that our current process (which Jeff has been following diligently) is resulting in a commit history that is less useful than it could be.
My preference is:
- Use the patch tool for smaller patches and large patches that haven't been split out into a series of incremental, standalone patches - For large patches that make sense as multiple incremental commits, none of which breaks the build, merge with --ff-only (rebasing as needed). I expect this to be rare.
I really like to avoid "edge-case driven development" -- we are bound to have patches where this guidance doesn't feel right, and we definitely don't have to dogmatically follow it.
For the record -- I spent some time reviewing the major category dtype pull requests that were merged in 2014, and given the sprawling nature of those changes and the huge amount of collaboration that took place, I agree it would have been preferable to fast-forward merge the incremental commits instead of squashing them into a couple of monolithic commits.
https://github.com/pydata/pandas/commit/0f62d3fc62f317538044ed3d349bfb89fb7e...
https://github.com/pydata/pandas/commit/ea0a13c172761348d08285a19ebf731cdabb...
- Wes
_______________________________________________ Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
I tried out the button yesterday to see what it does with a small patch: https://github.com/pydata/pandas/pull/13369 (and the actual commit: https://github.com/pydata/pandas/commit/e90d411714e7deac73e3e6b763ba9dccd354... ). So you can see I also edited the commit message (as the PR title didn't reflect the actual change anymore). So in general the button can capture a similar workflow (certainly for small patches). One difference is that you do not get links to the individual commits (only a list of the messages), but I am not sure this is essential (you can also see those in the PR). Joris 2016-06-06 15:25 GMT+02:00 Jeff Reback <jeffreback@gmail.com>:
This IS currently enabled (merge squashing only). Though I haven't clicked the button so not exactly sure what it actually outputs. So I suppose it gives basically the same output then it would be ok.
Using the script, I generally edit the commit to remove the
- [ ] closes #xxxx - [ ] tests added / passed - [ ] passes ``git diff upstream/master | flake8 --diff`` - [ ] whatsnew entry
which is what our PR template gives (and unless the user removes it it will propogate to the log)
I *mostly* make them look like this, which is concise, yet retains all of the pertinent info.
commit 6edf4471d1e55ce6b587a7e37dd540d787716413 Author: Mike Graham <mikegraham@gmail.com> Date: Mon Jun 6 08:10:10 2016 -0400
DOC: Fix wording/grammar for rolling's win_type argument.
Author: Mike Graham <mikegraham@gmail.com>
Closes #13376 from mikegraham/master and squashes the following commits:
ec0c88e [Mike Graham] DOC: Fix wording/grammar for rolling's win_type argument.
So all for using the button if it can capture a similar workflow (just haven't tried it out).
The biggest reason NOT to use the button is that it gives you a chance to do a final check / test on your local machine with the actual merged content. More that a few times I have found issues and backed out of the process. Though for simpler changes (like a doc-typo that's not necessary).
Finally merging locally DOES allow me to edit the whatsnew for consistency and such (or other files). Generally these are trivial things like spacing / doc-style and such, where a back-and-forth is not necessary.
Jeff
On Mon, Jun 6, 2016 at 8:56 AM, Joris Van den Bossche < jorisvandenbossche@gmail.com> wrote:
Triggered by a recent tweet of Wes ( https://twitter.com/wesmckinn/status/738111494852775936), I thought it would maybe a good time to evaluate the change in merging practices in pandas. I am certainly positive on the change in general (my only concern is that we shouldn't do additional changes in the middle of using the merge script, so you can still look at the 'Files changed' on github to see what actually changed).
But, the main thing I was wondering: in the meantime github added the feature to squash on merge through the green merge button ( https://github.com/blog/2141-squash-your-commits). Are there still advantages of using our own custom script over the green button?
- As far as I see, the main difference is that the github button does not include the text of the first comment in the squashed commit (but certainly for smaller PRs, this is not really a problem IMO, and the commit message gets a link to the PR) - On the plus side, it is easier (certainly for quickly merging a small PR, using the merge script is some overhead IMO), and you get the visual indication on github that the PR is 'merged' instead of 'closed' (but actually merged).
Jeff, would you be OK with starting to use the green button again (with the squash on merge option, but you already disabled the merge commit option), at least for the smaller PRs?
Regards, Joris
2016-01-18 23:59 GMT+01:00 Wes McKinney <wesmckinn@gmail.com>:
On Sun, Jan 17, 2016 at 3:34 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
Yeah, the point of this issue is not "there shall be no more than 1 commit per PR" but rather that smaller patches (i.e., most patches) should not degrade the signal-to-noise ratio of our commit history. Further, we should avoid merging commits that don't stand on their own. Lastly, merge commits generally only serve to degrade the SnR ratio.
Let's look at a sample of yesterday's commits:
https://www.dropbox.com/s/mp5yfp76h6h8z3y/commit-log-20160116.png?dl=0
No mistakes were made here, except that our current process (which Jeff has been following diligently) is resulting in a commit history that is less useful than it could be.
My preference is:
- Use the patch tool for smaller patches and large patches that haven't been split out into a series of incremental, standalone patches - For large patches that make sense as multiple incremental commits, none of which breaks the build, merge with --ff-only (rebasing as needed). I expect this to be rare.
I really like to avoid "edge-case driven development" -- we are bound to have patches where this guidance doesn't feel right, and we definitely don't have to dogmatically follow it.
For the record -- I spent some time reviewing the major category dtype pull requests that were merged in 2014, and given the sprawling nature of those changes and the huge amount of collaboration that took place, I agree it would have been preferable to fast-forward merge the incremental commits instead of squashing them into a couple of monolithic commits.
https://github.com/pydata/pandas/commit/0f62d3fc62f317538044ed3d349bfb89fb7e...
https://github.com/pydata/pandas/commit/ea0a13c172761348d08285a19ebf731cdabb...
- Wes
_______________________________________________ Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
I c, ok then that seems reasonable for small ones. Occasionally I have to fix a PR, so i pull it down, fix and then merge. It ends up looking very similar. Again these are usually very small / limited changes, but author just doesn't fix things. I usually have to manually add the issue reference in that case as the PR header text is not included. But that's why I review the commit message anyhow. So ok by me. Jeff On Mon, Jun 6, 2016 at 9:46 AM, Joris Van den Bossche < jorisvandenbossche@gmail.com> wrote:
I tried out the button yesterday to see what it does with a small patch: https://github.com/pydata/pandas/pull/13369 (and the actual commit: https://github.com/pydata/pandas/commit/e90d411714e7deac73e3e6b763ba9dccd354... ).
So you can see I also edited the commit message (as the PR title didn't reflect the actual change anymore). So in general the button can capture a similar workflow (certainly for small patches). One difference is that you do not get links to the individual commits (only a list of the messages), but I am not sure this is essential (you can also see those in the PR).
Joris
2016-06-06 15:25 GMT+02:00 Jeff Reback <jeffreback@gmail.com>:
This IS currently enabled (merge squashing only). Though I haven't clicked the button so not exactly sure what it actually outputs. So I suppose it gives basically the same output then it would be ok.
Using the script, I generally edit the commit to remove the
- [ ] closes #xxxx - [ ] tests added / passed - [ ] passes ``git diff upstream/master | flake8 --diff`` - [ ] whatsnew entry
which is what our PR template gives (and unless the user removes it it will propogate to the log)
I *mostly* make them look like this, which is concise, yet retains all of the pertinent info.
commit 6edf4471d1e55ce6b587a7e37dd540d787716413 Author: Mike Graham <mikegraham@gmail.com> Date: Mon Jun 6 08:10:10 2016 -0400
DOC: Fix wording/grammar for rolling's win_type argument.
Author: Mike Graham <mikegraham@gmail.com>
Closes #13376 from mikegraham/master and squashes the following commits:
ec0c88e [Mike Graham] DOC: Fix wording/grammar for rolling's win_type argument.
So all for using the button if it can capture a similar workflow (just haven't tried it out).
The biggest reason NOT to use the button is that it gives you a chance to do a final check / test on your local machine with the actual merged content. More that a few times I have found issues and backed out of the process. Though for simpler changes (like a doc-typo that's not necessary).
Finally merging locally DOES allow me to edit the whatsnew for consistency and such (or other files). Generally these are trivial things like spacing / doc-style and such, where a back-and-forth is not necessary.
Jeff
On Mon, Jun 6, 2016 at 8:56 AM, Joris Van den Bossche < jorisvandenbossche@gmail.com> wrote:
Triggered by a recent tweet of Wes ( https://twitter.com/wesmckinn/status/738111494852775936), I thought it would maybe a good time to evaluate the change in merging practices in pandas. I am certainly positive on the change in general (my only concern is that we shouldn't do additional changes in the middle of using the merge script, so you can still look at the 'Files changed' on github to see what actually changed).
But, the main thing I was wondering: in the meantime github added the feature to squash on merge through the green merge button ( https://github.com/blog/2141-squash-your-commits). Are there still advantages of using our own custom script over the green button?
- As far as I see, the main difference is that the github button does not include the text of the first comment in the squashed commit (but certainly for smaller PRs, this is not really a problem IMO, and the commit message gets a link to the PR) - On the plus side, it is easier (certainly for quickly merging a small PR, using the merge script is some overhead IMO), and you get the visual indication on github that the PR is 'merged' instead of 'closed' (but actually merged).
Jeff, would you be OK with starting to use the green button again (with the squash on merge option, but you already disabled the merge commit option), at least for the smaller PRs?
Regards, Joris
2016-01-18 23:59 GMT+01:00 Wes McKinney <wesmckinn@gmail.com>:
On Sun, Jan 17, 2016 at 3:34 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
Yeah, the point of this issue is not "there shall be no more than 1 commit per PR" but rather that smaller patches (i.e., most patches) should not degrade the signal-to-noise ratio of our commit history. Further, we should avoid merging commits that don't stand on their own. Lastly, merge commits generally only serve to degrade the SnR ratio.
Let's look at a sample of yesterday's commits:
https://www.dropbox.com/s/mp5yfp76h6h8z3y/commit-log-20160116.png?dl=0
No mistakes were made here, except that our current process (which Jeff has been following diligently) is resulting in a commit history that is less useful than it could be.
My preference is:
- Use the patch tool for smaller patches and large patches that haven't been split out into a series of incremental, standalone patches - For large patches that make sense as multiple incremental commits, none of which breaks the build, merge with --ff-only (rebasing as needed). I expect this to be rare.
I really like to avoid "edge-case driven development" -- we are bound to have patches where this guidance doesn't feel right, and we definitely don't have to dogmatically follow it.
For the record -- I spent some time reviewing the major category dtype pull requests that were merged in 2014, and given the sprawling nature of those changes and the huge amount of collaboration that took place, I agree it would have been preferable to fast-forward merge the incremental commits instead of squashing them into a couple of monolithic commits.
https://github.com/pydata/pandas/commit/0f62d3fc62f317538044ed3d349bfb89fb7e...
https://github.com/pydata/pandas/commit/ea0a13c172761348d08285a19ebf731cdabb...
- Wes
_______________________________________________ Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
_______________________________________________ Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
+1 -- For small patches I think the squash-merge-cherry-pick button is fine. For larger issues (or where there is some useful discussion of the motivation for the patch in the PR description), it might be easier / better to use the patch tool. On Mon, Jun 6, 2016 at 6:58 AM, Jeff Reback <jeffreback@gmail.com> wrote:
I c, ok then that seems reasonable for small ones.
Occasionally I have to fix a PR, so i pull it down, fix and then merge. It ends up looking very similar. Again these are usually very small / limited changes, but author just doesn't fix things.
I usually have to manually add the issue reference in that case as the PR header text is not included.
But that's why I review the commit message anyhow.
So ok by me.
Jeff
On Mon, Jun 6, 2016 at 9:46 AM, Joris Van den Bossche <jorisvandenbossche@gmail.com> wrote:
I tried out the button yesterday to see what it does with a small patch: https://github.com/pydata/pandas/pull/13369 (and the actual commit: https://github.com/pydata/pandas/commit/e90d411714e7deac73e3e6b763ba9dccd354...).
So you can see I also edited the commit message (as the PR title didn't reflect the actual change anymore). So in general the button can capture a similar workflow (certainly for small patches). One difference is that you do not get links to the individual commits (only a list of the messages), but I am not sure this is essential (you can also see those in the PR).
Joris
2016-06-06 15:25 GMT+02:00 Jeff Reback <jeffreback@gmail.com>:
This IS currently enabled (merge squashing only). Though I haven't clicked the button so not exactly sure what it actually outputs. So I suppose it gives basically the same output then it would be ok.
Using the script, I generally edit the commit to remove the
- [ ] closes #xxxx - [ ] tests added / passed - [ ] passes ``git diff upstream/master | flake8 --diff`` - [ ] whatsnew entry
which is what our PR template gives (and unless the user removes it it will propogate to the log)
I *mostly* make them look like this, which is concise, yet retains all of the pertinent info.
commit 6edf4471d1e55ce6b587a7e37dd540d787716413 Author: Mike Graham <mikegraham@gmail.com> Date: Mon Jun 6 08:10:10 2016 -0400
DOC: Fix wording/grammar for rolling's win_type argument.
Author: Mike Graham <mikegraham@gmail.com>
Closes #13376 from mikegraham/master and squashes the following commits:
ec0c88e [Mike Graham] DOC: Fix wording/grammar for rolling's win_type argument.
So all for using the button if it can capture a similar workflow (just haven't tried it out).
The biggest reason NOT to use the button is that it gives you a chance to do a final check / test on your local machine with the actual merged content. More that a few times I have found issues and backed out of the process. Though for simpler changes (like a doc-typo that's not necessary).
Finally merging locally DOES allow me to edit the whatsnew for consistency and such (or other files). Generally these are trivial things like spacing / doc-style and such, where a back-and-forth is not necessary.
Jeff
On Mon, Jun 6, 2016 at 8:56 AM, Joris Van den Bossche <jorisvandenbossche@gmail.com> wrote:
Triggered by a recent tweet of Wes (https://twitter.com/wesmckinn/status/738111494852775936), I thought it would maybe a good time to evaluate the change in merging practices in pandas. I am certainly positive on the change in general (my only concern is that we shouldn't do additional changes in the middle of using the merge script, so you can still look at the 'Files changed' on github to see what actually changed).
But, the main thing I was wondering: in the meantime github added the feature to squash on merge through the green merge button (https://github.com/blog/2141-squash-your-commits). Are there still advantages of using our own custom script over the green button?
- As far as I see, the main difference is that the github button does not include the text of the first comment in the squashed commit (but certainly for smaller PRs, this is not really a problem IMO, and the commit message gets a link to the PR) - On the plus side, it is easier (certainly for quickly merging a small PR, using the merge script is some overhead IMO), and you get the visual indication on github that the PR is 'merged' instead of 'closed' (but actually merged).
Jeff, would you be OK with starting to use the green button again (with the squash on merge option, but you already disabled the merge commit option), at least for the smaller PRs?
Regards, Joris
2016-01-18 23:59 GMT+01:00 Wes McKinney <wesmckinn@gmail.com>:
On Sun, Jan 17, 2016 at 3:34 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
Yeah, the point of this issue is not "there shall be no more than 1 commit per PR" but rather that smaller patches (i.e., most patches) should not degrade the signal-to-noise ratio of our commit history. Further, we should avoid merging commits that don't stand on their own. Lastly, merge commits generally only serve to degrade the SnR ratio.
Let's look at a sample of yesterday's commits:
https://www.dropbox.com/s/mp5yfp76h6h8z3y/commit-log-20160116.png?dl=0
No mistakes were made here, except that our current process (which Jeff has been following diligently) is resulting in a commit history that is less useful than it could be.
My preference is:
- Use the patch tool for smaller patches and large patches that haven't been split out into a series of incremental, standalone patches - For large patches that make sense as multiple incremental commits, none of which breaks the build, merge with --ff-only (rebasing as needed). I expect this to be rare.
I really like to avoid "edge-case driven development" -- we are bound to have patches where this guidance doesn't feel right, and we definitely don't have to dogmatically follow it.
For the record -- I spent some time reviewing the major category dtype pull requests that were merged in 2014, and given the sprawling nature of those changes and the huge amount of collaboration that took place, I agree it would have been preferable to fast-forward merge the incremental commits instead of squashing them into a couple of monolithic commits.
https://github.com/pydata/pandas/commit/0f62d3fc62f317538044ed3d349bfb89fb7e...
https://github.com/pydata/pandas/commit/ea0a13c172761348d08285a19ebf731cdabb...
- Wes
_______________________________________________ Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
_______________________________________________ Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
_______________________________________________ Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
participants (5)
-
Jan Schulz -
Jeff Reback -
Joris Van den Bossche -
Nathaniel Smith -
Wes McKinney