proposal: new commit guidelines for backportable bugfixes

hi, I have been doing a lot of backporting for the last few bugfix releases and noticed that our current approach committing to master and cherrypicking is not so good for the git history. When cherry picking a bugfix from master to a maintenance branch both branches contain a commit with the same content and git knows of no relation between them. This causes unnecessary merge conflicts when cherry picking two changes that modify the same file. The git version (1.9.1) I am using is not smart enough too figure out the changesets in both leaf commits is the same. Additionally the output of `git log maintenance/1.9.x..master` becomes very large as all already backported issues appear again in master. [0]
To help with this I want to propose new best practices for pull requests of bugfixes suitable for backporting. Instead of basing the bugfix on the head commit of the master, base them on the merge base between master and the latest maintenance branch. This allows merging the PR into both master and the maintenance branch without pulling in any extra changes from either branches. Then both branches contain the same commit and gits automerging can work better and git log will only show you the commits that are only really on one branch or the other.
In practice this is very simple. You can still develop your bugfix on master but before you push it you just run:
git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^
In most bugfix PRs this should work without conflict as they should be relatively small. If you get a merge conflict during this operation, just do git rebase --abort and do a normal pull request, in that case the backporter should worry about the conflict.
Does this sound like a reasonable procedure? Cheers, Julian
[0] git cherry is supposed to help with that, but it never really worked properly for me

On Fri, Jul 18, 2014 at 8:13 AM, Julian Taylor < jtaylor.debian@googlemail.com> wrote:
hi, I have been doing a lot of backporting for the last few bugfix releases and noticed that our current approach committing to master and cherrypicking is not so good for the git history. When cherry picking a bugfix from master to a maintenance branch both branches contain a commit with the same content and git knows of no relation between them. This causes unnecessary merge conflicts when cherry picking two changes that modify the same file. The git version (1.9.1) I am using is not smart enough too figure out the changesets in both leaf commits is the same. Additionally the output of `git log maintenance/1.9.x..master` becomes very large as all already backported issues appear again in master. [0]
To help with this I want to propose new best practices for pull requests of bugfixes suitable for backporting. Instead of basing the bugfix on the head commit of the master, base them on the merge base between master and the latest maintenance branch. This allows merging the PR into both master and the maintenance branch without pulling in any extra changes from either branches. Then both branches contain the same commit and gits automerging can work better and git log will only show you the commits that are only really on one branch or the other.
In practice this is very simple. You can still develop your bugfix on master but before you push it you just run:
git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^
In most bugfix PRs this should work without conflict as they should be relatively small. If you get a merge conflict during this operation, just do git rebase --abort and do a normal pull request, in that case the backporter should worry about the conflict.
Does this sound like a reasonable procedure? Cheers, Julian
[0] git cherry is supposed to help with that, but it never really worked properly for me
Arrived here promptly. This looks OK to me, but with the understanding that a number of folks won't know what is going on. It should be documented in doc/source/dev/gitwash/development_workflow.rst and perhaps a command alias in .git/config would help, something like npyrebase, or hopefully something better ;)
Chuck

On Fri, Jul 18, 2014 at 5:38 PM, Charles R Harris charlesr.harris@gmail.com wrote:
On Fri, Jul 18, 2014 at 8:13 AM, Julian Taylor jtaylor.debian@googlemail.com wrote:
hi, I have been doing a lot of backporting for the last few bugfix releases and noticed that our current approach committing to master and cherrypicking is not so good for the git history. When cherry picking a bugfix from master to a maintenance branch both branches contain a commit with the same content and git knows of no relation between them. This causes unnecessary merge conflicts when cherry picking two changes that modify the same file. The git version (1.9.1) I am using is not smart enough too figure out the changesets in both leaf commits is the same. Additionally the output of `git log maintenance/1.9.x..master` becomes very large as all already backported issues appear again in master. [0]
To help with this I want to propose new best practices for pull requests of bugfixes suitable for backporting. Instead of basing the bugfix on the head commit of the master, base them on the merge base between master and the latest maintenance branch. This allows merging the PR into both master and the maintenance branch without pulling in any extra changes from either branches. Then both branches contain the same commit and gits automerging can work better and git log will only show you the commits that are only really on one branch or the other.
In practice this is very simple. You can still develop your bugfix on master but before you push it you just run:
git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^
In most bugfix PRs this should work without conflict as they should be relatively small. If you get a merge conflict during this operation, just do git rebase --abort and do a normal pull request, in that case the backporter should worry about the conflict.
Does this sound like a reasonable procedure? Cheers, Julian
[0] git cherry is supposed to help with that, but it never really worked properly for me
Arrived here promptly. This looks OK to me, but with the understanding that a number of folks won't know what is going on. It should be documented in doc/source/dev/gitwash/development_workflow.rst and perhaps a command alias in .git/config would help, something like npyrebase, or hopefully something better ;)
Chuck
Yes of course I would document it when its ok for everyone. I do not want that this inconveniences contributors, maybe we can just ask for it if extra changes are required for the PR anyway. I would just like that the people who merge PR's (which are currently just a handful) try to use this method when the PR is applicable for a maintenance branch. We can add a small tool that does what does the rebase, merges to both master and the branch and closes the PR, something like: tools/merge-backport-pr #pr-number <maintenance-branch>

On 18 Jul 2014 15:36, "Julian Taylor" jtaylor.debian@googlemail.com wrote:
git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^
As a potential refinement, this might be simpler if we define a branch that points to this commit.
-n

On Fri, Jul 18, 2014 at 6:23 PM, Nathaniel Smith njs@pobox.com wrote:
On 18 Jul 2014 15:36, "Julian Taylor" jtaylor.debian@googlemail.com wrote:
git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^
As a potential refinement, this might be simpler if we define a branch that points to this commit.
we could do that, though the merge base changes to the last commit that was merged in that way. The old merge base is still valid but much older. I applied this method to some of my bugfixes so the current merge base of master and 1.9 is a commit from yesterday not anymore the diverging point of master and 1.9. But I don't know if the newer merge base makes any difference to git.

18.07.2014 19:35, Julian Taylor kirjoitti:
On Fri, Jul 18, 2014 at 6:23 PM, Nathaniel Smith njs@pobox.com wrote:
On 18 Jul 2014 15:36, "Julian Taylor" jtaylor.debian@googlemail.com wrote:
git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^
As a potential refinement, this might be simpler if we define a branch that points to this commit.
we could do that, though the merge base changes to the last commit that was merged in that way. The old merge base is still valid but much older. I applied this method to some of my bugfixes so the current merge base of master and 1.9 is a commit from yesterday not anymore the diverging point of master and 1.9. But I don't know if the newer merge base makes any difference to git.
Will the merge base actually ever change if you don't merge the branches to each other?
***
The other well-known alternative to bugfixes is to first commit it in the earliest maintenance branch where you want to have it, and then merge that branch forward to the newer maintenance branches, and finally into master.
Pauli

On 18.07.2014 19:47, Pauli Virtanen wrote:
18.07.2014 19:35, Julian Taylor kirjoitti:
On Fri, Jul 18, 2014 at 6:23 PM, Nathaniel Smith njs@pobox.com wrote:
On 18 Jul 2014 15:36, "Julian Taylor" jtaylor.debian@googlemail.com wrote:
git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^
As a potential refinement, this might be simpler if we define a branch that points to this commit.
we could do that, though the merge base changes to the last commit that was merged in that way. The old merge base is still valid but much older. I applied this method to some of my bugfixes so the current merge base of master and 1.9 is a commit from yesterday not anymore the diverging point of master and 1.9. But I don't know if the newer merge base makes any difference to git.
Will the merge base actually ever change if you don't merge the branches to each other
we want to merge them into each other so a change of merge base is unavoidable.
The other well-known alternative to bugfixes is to first commit it in the earliest maintenance branch where you want to have it, and then merge that branch forward to the newer maintenance branches, and finally into master.
wouldn't that still require basing bugfixes onto the point before the master and maintenance branch diverged? otherwise a merge from maintenance to master would include the commits that are only part of the maintenance branch (release commits, regression fixes etc.)
basing bugfixes on maintenance does allow cherry picking into master as you don't care too much about backward mergeability here, but you still lose a good git log and git branch --contains to check which bugfix is in which branch.

18.07.2014 23:53, Julian Taylor kirjoitti:
On 18.07.2014 19:47, Pauli Virtanen wrote:
[clip]
The other well-known alternative to bugfixes is to first commit it in the earliest maintenance branch where you want to have it, and then merge that branch forward to the newer maintenance branches, and finally into master.
wouldn't that still require basing bugfixes onto the point before the master and maintenance branch diverged? otherwise a merge from maintenance to master would include the commits that are only part of the maintenance branch (release commits, regression fixes etc.)
If I understand correctly, the idea is to manually revert the changes that don't belong in, which needs to be only done once for each, as the merge logic should deal with it in all subsequent merges.
I think there are in practice not so many commits that you want to have only in the release branch. Version number bumping is one (and easily addressed by a follow-up commit in master that bumps it again) --- what else?
The bugfix-in-release-and-forward-port-to-master seems to be the recommended practice for Mercurial:
http://mercurial.selenic.com/wiki/StandardBranching
https://docs.python.org/devguide/committing.html
I think there are also git guides that recommend using it.
The option of basing commits on last merge base is probably not really feasible with Mercurial (I haven't seen git guides that propose it either).
basing bugfixes on maintenance does allow cherry picking into master as you don't care too much about backward mergeability here, but you still lose a good git log and git branch --contains to check which bugfix is in which branch.
I don't disagree with this. Cherry picking is OK, but only as long as the number of commits is not too large and you use a tool (e.g. my git-cherry-tree) that tries to check which patches are in and which not.
Pauli

On Fri, Jul 18, 2014 at 11:44 PM, Pauli Virtanen pav@iki.fi wrote:
18.07.2014 23:53, Julian Taylor kirjoitti:
On 18.07.2014 19:47, Pauli Virtanen wrote:
[clip]
The other well-known alternative to bugfixes is to first commit it in the earliest maintenance branch where you want to have it, and then merge that branch forward to the newer maintenance branches, and finally into master.
wouldn't that still require basing bugfixes onto the point before the master and maintenance branch diverged? otherwise a merge from maintenance to master would include the commits that are only part of the maintenance branch (release commits, regression fixes etc.)
If I understand correctly, the idea is to manually revert the changes that don't belong in, which needs to be only done once for each, as the merge logic should deal with it in all subsequent merges.
I think there are in practice not so many commits that you want to have only in the release branch. Version number bumping is one (and easily addressed by a follow-up commit in master that bumps it again) --- what else?
Presumably all the commits that we miss on the first pass and end up backporting the hard way later :-)

19.07.2014 01:49, Nathaniel Smith kirjoitti:
On Fri, Jul 18, 2014 at 11:44 PM, Pauli Virtanen pav@iki.fi wrote:
18.07.2014 23:53, Julian Taylor kirjoitti:
On 18.07.2014 19:47, Pauli Virtanen wrote:
[clip]
The other well-known alternative to bugfixes is to first commit it in the earliest maintenance branch where you want to have it, and then merge that branch forward to the newer maintenance branches, and finally into master.
wouldn't that still require basing bugfixes onto the point before the master and maintenance branch diverged? otherwise a merge from maintenance to master would include the commits that are only part of the maintenance branch (release commits, regression fixes etc.)
If I understand correctly, the idea is to manually revert the changes that don't belong in, which needs to be only done once for each, as the merge logic should deal with it in all subsequent merges.
I think there are in practice not so many commits that you want to have only in the release branch. Version number bumping is one (and easily addressed by a follow-up commit in master that bumps it again) --- what else?
Presumably all the commits that we miss on the first pass and end up backporting the hard way later :-)
If those are just cherry-picked, they will generate merge conflicts the next time things are merged back (or, the merge will be smart enough to note the patch was already applied some time ago). This is then probably not really a big problem.
Pauli

19.07.2014 02:10, Pauli Virtanen kirjoitti:
19.07.2014 01:49, Nathaniel Smith kirjoitti:
On Fri, Jul 18, 2014 at 11:44 PM, Pauli Virtanen pav@iki.fi wrote:
[clip]
Presumably all the commits that we miss on the first pass and end up backporting the hard way later :-)
If those are just cherry-picked, they will generate merge conflicts the next time things are merged back (or, the merge will be smart enough to note the patch was already applied some time ago). This is then probably not really a big problem.
NB. this is a bit playing devil's advocate --- I'm not advocating porting bugfixes from merge branches, as using the merge base should also work fine.

On Sat, Jul 19, 2014 at 12:44 AM, Pauli Virtanen pav@iki.fi wrote:
18.07.2014 23:53, Julian Taylor kirjoitti:
On 18.07.2014 19:47, Pauli Virtanen wrote:
[clip]
The other well-known alternative to bugfixes is to first commit it in the earliest maintenance branch where you want to have it, and then merge that branch forward to the newer maintenance branches, and finally into master.
wouldn't that still require basing bugfixes onto the point before the master and maintenance branch diverged? otherwise a merge from maintenance to master would include the commits that are only part of the maintenance branch (release commits, regression fixes etc.)
If I understand correctly, the idea is to manually revert the changes that don't belong in, which needs to be only done once for each, as the merge logic should deal with it in all subsequent merges.
I think there are in practice not so many commits that you want to have only in the release branch. Version number bumping is one (and easily addressed by a follow-up commit in master that bumps it again) --- what else?
The bugfix-in-release-and-forward-port-to-master seems to be the recommended practice for Mercurial:
http://mercurial.selenic.com/wiki/StandardBranching https://docs.python.org/devguide/committing.html
I think there are also git guides that recommend using it.
The option of basing commits on last merge base is probably not really feasible with Mercurial (I haven't seen git guides that propose it either).
basing bugfixes on maintenance does allow cherry picking into master as you don't care too much about backward mergeability here, but you still lose a good git log and git branch --contains to check which bugfix is in which branch.
I don't disagree with this. Cherry picking is OK, but only as long as the number of commits is not too large
This should be the case most of the time I think. It looks like we've started backporting more and more though, even things like minor doc fixes. The maintenance overhead would be much lower if we would stick to only backporting important bug fixes.
Any strategy chosen is fine with me, but I would like to see considered how this affects the number of PRs and the complexity for occasional contributors. Those contributors can't really judge what's backportable and don't want to deal with rebasing. So the new strategy would be something like: 1. bugfix PR sent to master by contributor 2. maintainer decides it's backportable, so after review he doesn't merge PR but rebases it and sends a second PR. First one, with review content, is closed not merged. 3. merge PR into maintenance branch. 4. send third PR to merge back or forward port the fix to master, and merge that. (or some variation with merge bases which is even more involved)
Compare to what we did a while ago for numpy and still do for scipy: 1. all PRs are sent to master 2. hit green button after review 3. bugfix is cherry-picked and pushed directly to the maintenance branch
The downside of the second strategy is indeed the occasional extra merge conflict, but having 3x less PRs, 2x less merge commits and a less confusing process for occasional contributors could well be worth dealing with that merge conflict.
Cheers, Ralf
and you use a tool (e.g. my
git-cherry-tree) that tries to check which patches are in and which not.
Pauli
NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

19.07.2014 11:04, Ralf Gommers kirjoitti: [clip]
- bugfix PR sent to master by contributor
- maintainer decides it's backportable, so after review he doesn't merge
PR but rebases it and sends a second PR. First one, with review content, is closed not merged. 3. merge PR into maintenance branch. 4. send third PR to merge back or forward port the fix to master, and merge that. (or some variation with merge bases which is even more involved)
The maintainer can just rebase on merge base, and then merge and push it via git as usual, without having to deal with Github. If the pull request happens to be already based on an OK merge base, it can be merged via Github directly to master.
The only thing maintainer gains from sending additional pull request via Github is that the code gets run by Travis-CI. However, the tests will also run automatically after pushing the merge commits, so test failures can be caught (although after the fact). This is also the case for directly pushed cherry-picked commits.

On Sat, Jul 19, 2014 at 12:29 PM, Pauli Virtanen pav@iki.fi wrote:
19.07.2014 11:04, Ralf Gommers kirjoitti: [clip]
- bugfix PR sent to master by contributor
- maintainer decides it's backportable, so after review he doesn't
merge
PR but rebases it and sends a second PR. First one, with review content,
is
closed not merged. 3. merge PR into maintenance branch. 4. send third PR to merge back or forward port the fix to master, and merge that. (or some variation with merge bases which is even more involved)
The maintainer can just rebase on merge base, and then merge and push it via git as usual, without having to deal with Github.
I agree, but note that that's not what's happening in the numpy repo at the moment and that Julian (and maybe Chuck as well?) is explicitly against any direct pushes. So the 3x more PRs between what the process used to be and what Julian proposes is not unrealistic.
Maybe still worth it, but it's a trade-off (example: I used to use "gitk --all", but it's a spaghetti now).
Ralf
If the pull request happens to be already based on an OK merge base, it can be merged via Github directly to master.
The only thing maintainer gains from sending additional pull request via Github is that the code gets run by Travis-CI. However, the tests will also run automatically after pushing the merge commits, so test failures can be caught (although after the fact). This is also the case for directly pushed cherry-picked commits.
-- Pauli Virtanen
NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On 19.07.2014 13:04, Ralf Gommers wrote:
On Sat, Jul 19, 2014 at 12:29 PM, Pauli Virtanen <pav@iki.fi mailto:pav@iki.fi> wrote:
19.07.2014 11:04, Ralf Gommers kirjoitti: [clip] > 1. bugfix PR sent to master by contributor > 2. maintainer decides it's backportable, so after review he doesn't merge > PR but rebases it and sends a second PR. First one, with review content, is > closed not merged. > 3. merge PR into maintenance branch. > 4. send third PR to merge back or forward port the fix to master, and > merge that. > (or some variation with merge bases which is even more involved) The maintainer can just rebase on merge base, and then merge and push it via git as usual, without having to deal with Github.
I agree, but note that that's not what's happening in the numpy repo at the moment and that Julian (and maybe Chuck as well?) is explicitly against any direct pushes. So the 3x more PRs between what the process used to be and what Julian proposes is not unrealistic.
It is what is happening at the numpy repo. We are never directly pushing unreviewed changes, we always have at least one PR. We only directly push changes that have been approved to be applied two more than one branch. With the method I propose there are not any more PRs. You have the main PR targeted to master and the bugfix PR targeted to the maintenance branch, it was the same before except the bugfix PR was a cherry pick instead of a merge. When directly pushing the second merge we even cut one PR from the process. E.g. I pushed Pauls PR #4882 directly to 1.9 without asking him to create a new PR but as far as git is concerned there is no difference, it as still two merges.
We could always ask for a new PR for the branch merge to see travis results before the merge. E.g. #4877 and #4891 same branch two PRs two merges. I don't think that should be currently required as master and 1.9 are almost identical and there is little value in seeing travis results for the second merge before doing the merge. But when the branches diverge more the two PRs should probably be preferred to avoid having broken commits on the branches that make bisecting harder.

On Sat, Jul 19, 2014 at 1:26 PM, Julian Taylor < jtaylor.debian@googlemail.com> wrote:
On 19.07.2014 13:04, Ralf Gommers wrote:
On Sat, Jul 19, 2014 at 12:29 PM, Pauli Virtanen <pav@iki.fi mailto:pav@iki.fi> wrote:
19.07.2014 11:04, Ralf Gommers kirjoitti: [clip] > 1. bugfix PR sent to master by contributor > 2. maintainer decides it's backportable, so after review he doesn't merge > PR but rebases it and sends a second PR. First one, with review content, is > closed not merged. > 3. merge PR into maintenance branch. > 4. send third PR to merge back or forward port the fix to master, and > merge that. > (or some variation with merge bases which is even more involved) The maintainer can just rebase on merge base, and then merge and
push it
via git as usual, without having to deal with Github.
I agree, but note that that's not what's happening in the numpy repo at the moment and that Julian (and maybe Chuck as well?) is explicitly against any direct pushes. So the 3x more PRs between what the process used to be and what Julian proposes is not unrealistic.
It is what is happening at the numpy repo. We are never directly pushing unreviewed changes, we always have at least one PR. We only directly push changes that have been approved to be applied two more than one branch.
OK never mind then. I was pretty sure you said you were against this, and I see a lot of PRs for simple backports in 1.8.x and 1.9.x. If you now say it's fine (or even preferred) to push directly, my worry about multiple PRs isn't relevant anymore.
Ralf
With the method I propose there are not any more PRs. You have the main PR targeted to master
and the bugfix PR targeted to the maintenance
branch, it was the same before except the bugfix PR was a cherry pick instead of a merge. When directly pushing the second merge we even cut one PR from the process. E.g. I pushed Pauls PR #4882 directly to 1.9 without asking him to create a new PR but as far as git is concerned there is no difference, it as still two merges.
We could always ask for a new PR for the branch merge to see travis results before the merge. E.g. #4877 and #4891 same branch two PRs two merges. I don't think that should be currently required as master and 1.9 are almost identical and there is little value in seeing travis results for the second merge before doing the merge. But when the branches diverge more the two PRs should probably be preferred to avoid having broken commits on the branches that make bisecting harder. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On 19.07.2014 14:09, Ralf Gommers wrote:
On Sat, Jul 19, 2014 at 1:26 PM, Julian Taylor <jtaylor.debian@googlemail.com mailto:jtaylor.debian@googlemail.com> wrote:
On 19.07.2014 13:04, Ralf Gommers wrote: > > > > On Sat, Jul 19, 2014 at 12:29 PM, Pauli Virtanen <pav@iki.fi <mailto:pav@iki.fi> > <mailto:pav@iki.fi <mailto:pav@iki.fi>>> wrote: > > 19.07.2014 11:04, Ralf Gommers kirjoitti: > [clip] > > 1. bugfix PR sent to master by contributor > > 2. maintainer decides it's backportable, so after review he > doesn't merge > > PR but rebases it and sends a second PR. First one, with review > content, is > > closed not merged. > > 3. merge PR into maintenance branch. > > 4. send third PR to merge back or forward port the fix to > master, and > > merge that. > > (or some variation with merge bases which is even more involved) > > The maintainer can just rebase on merge base, and then merge and push it > via git as usual, without having to deal with Github. > > > I agree, but note that that's not what's happening in the numpy repo at > the moment and that Julian (and maybe Chuck as well?) is explicitly > against any direct pushes. So the 3x more PRs between what the process > used to be and what Julian proposes is not unrealistic. > It is what is happening at the numpy repo. We are never directly pushing unreviewed changes, we always have at least one PR. We only directly push changes that have been approved to be applied two more than one branch.
OK never mind then. I was pretty sure you said you were against this, and I see a lot of PRs for simple backports in 1.8.x and 1.9.x. If you now say it's fine (or even preferred) to push directly, my worry about multiple PRs isn't relevant anymore.
thats not what I'm saying. I'm strongly against pushing unreviewed changes. There must *always* be at least one PR. Pushing this PR to multiple branches without another PR is fine with me if it makes sense in the situation (== the merge is trivial enough to not need *another* review)

Julian Taylor jtaylor.debian@googlemail.com wrote:
git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^
That's the problem with Git, it solves one problem an creates another. Personally I have no idea what that command might do.
Sturla
participants (6)
-
Charles R Harris
-
Julian Taylor
-
Nathaniel Smith
-
Pauli Virtanen
-
Ralf Gommers
-
Sturla Molden