Mercurial workflow question...

Scenario: I'm working on a change that I want to actively test on a bunch of Snakebite hosts. Getting the change working is going to be an iterative process -- lots of small commits trying to attack the problem one little bit at a time.
Eventually I'll get to a point where I'm happy with the change. So, it's time to do all the necessary cruft that needs to be done before making the change public. Updating docs, tweaking style, Misc/NEWS, etc. That'll involve at least a few more commits. Most changes will also need to be merged to other branches, too, so that needs to be taken care of. (And it's a given that I would have been pulling and merging from hg.p.o/cpython during the whole process.)
Then, finally, it's time to push.
Now, if I understand how Mercurial works correctly, using the above workflow will result in all those little intermediate hacky commits being forever preserved in the global/public cpython repo. I will have polluted the history of all affected files with all my changes.
That just doesn't "feel" right. But, it appears as though it's an intrinsic side-effect of how Mercurial works. With git, you have a bit more flexibility to affect how your final public commits look via merge fast-forwarding. Subversion gives you the ultimate control of how your final commit looks (albeit at the expense of having to do the merging in a much more manual fashion).
As I understand it, even if I contain all my intermediate commits in a server-side cloned repo, that doesn't really change anything; all commits will eventually be reflected in cpython via the final `hg push`.
So, my first question is this: is this actually a problem? Is the value I'm placing on "pristine" log histories misplaced in the DVCS world? Do we, as a collective, care?
I can think of two alternate approaches I could use:
- Use a common NFS mount for each source tree on every Snakebite box (and coerce each build to be done in a separate area). Get everything perfect and then do a single commit of all changes. The thing I don't like about this approach is that I can't commit/rollback/tweak/bisect intermediate commits as I go along -- some changes are complex and take a few attempts to get right.
- Use a completely separate clone to house all the intermediate commits, then generate a diff once the final commit is ready, then apply that diff to the main cpython repo, then push that. This approach is fine, but it seems counter-intuitive to the whole concept of DVCS.
Thoughts?
Trent.

On Fri, Dec 14, 2012 at 12:02 PM, Larry Hastings larry@hastings.org wrote:
On 12/13/2012 05:21 PM, Trent Nelson wrote:
Thoughts?
% hg help rebase
And also the histedit extension (analogous to "git rebase -i").
Both Git and Hg recognise there is a difference between interim commits and ones you want to publish, and both provide tools to revise a series of commits into a simpler set for publication to an official repo. The difference is that in Git this is allowed by default for all branches (which can create fun and games if someone upstream of you edits the history of the branch you used as a base for your own work), while Hg makes a distinction between different phases (secret -> draft -> public) and disallows operations that rewrite history if they would affect public changesets.
So the challenge with Mercurial over Git is ensuring the relevant branches stay in "draft" mode locally even though you want to push them to a server-side clone for distribution to the build servers. I know one way to do that would be to ask that the relevant clone be switched to non-publishing mode (see http://mercurial.selenic.com/wiki/Phases#Publishing_Repository). I don't know if there's another way to do it without altering the config on the server.
General intro to phases: http://www.logilab.org/blogentry/88203
Cheers, Nick.
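
As a rough sketch of the non-publishing setup described above: the phases documentation gives `publish = False` in the `[phases]` section as the switch that keeps pushed changesets in the draft phase. The two helper functions below are hypothetical names introduced only for illustration, and the repository path passed to them is a placeholder.

```shell
# Hypothetical helper: mark a server-side clone as non-publishing, so that
# changesets pushed to it stay in the "draft" phase instead of becoming "public".
# $1 is the path to the clone on the server (a placeholder, not a real path).
make_non_publishing() {
    repo="$1"
    # Append the [phases] section to the repository-level config file.
    printf '[phases]\npublish = False\n' >> "$repo/.hg/hgrc"
}

# Hypothetical helper: list every changeset that is not yet public, together
# with its phase, to confirm that pushed work can still be rewritten.
show_unpublished() {
    hg -R "$1" phase -r "not public()"
}
```

Running `show_unpublished` on the local clone after a push to such a server should still list the work-in-progress changesets as draft.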

On Dec 14, 2012, at 12:36 PM, Nick Coghlan wrote:
Both Git and Hg recognise there is a difference between interim commits and ones you want to publish and provide tools to revise a series of commits into a simpler set for publication to an official repo.
One of the things I love about Bazaar is that it has a concept of "main line of development" that usually makes all this hand-wringing a non-issue. When I merge my development branch, with all its interim commits, into trunk, all those revisions go with it. But it never matters because when you view history (and bisect, etc.) on trunk, you see the merge as one commit.
Sure, you can descend into the right-hand side if you want to see all those sub-commits, and the graphical tools allow you to expand them fairly easily, but usually you just ignore them.
Nothing's completely for free of course, and having a main line of development does mean you have to be careful about merge directionality, but that's generally something you ingrain in your workflow once, and then forget about it. The bottom line is that Bazaar users rarely feel the need to rebase, even though you can if you want to.
Cheers, -Barry

On Thu, 13 Dec 2012 20:21:24 -0500, Trent Nelson trent@snakebite.org wrote:
- Use a completely separate clone to house all the intermediate commits, then generate a diff once the final commit is ready, then apply that diff to the main cpython repo, then push that. This approach is fine, but it seems counter-intuitive to the whole concept of DVCS.
Perhaps. But that's exactly what I did with the email package changes for 3.3.
You seem to have a tension between "all those dirty little commits" and "clean history" and the fact that a dvcs is designed to preserve all those commits...if you don't want those intermediate commits in the official repo, then why is a diff/patch a bad way to achieve that? If you keep your pulls up to date in your feature repo, the diff/patch process is simple and smooth.
The repo I worked on the email features in is still available, too, if anyone is crazy enough to want to know about those intermediate steps...
--David
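
The diff/patch route described above could look roughly like the following. This is only a sketch under stated assumptions: `squash_via_patch` is a made-up helper name, the feature clone has already pulled and merged the same mainline revision that the clean clone ends up on, and the feature clone's working directory is updated to the finished head.

```shell
# Hypothetical helper: carry the net result of a feature clone into a clean
# clone as one changeset.
#   $1 = path to the feature clone (holds all the intermediate commits)
#   $2 = path to a clean clone of the main repository
#   $3 = commit message for the single resulting changeset
squash_via_patch() {
    feature="$1"
    clean="$2"
    message="$3"
    patch_file="$(mktemp)"

    # Bring the clean clone up to date and remember which revision it is on.
    hg -R "$clean" pull --update &&
    base="$(hg -R "$clean" log -r . --template '{node}')" &&
    # Diff the feature clone's checked-out revision against that mainline
    # revision (assumes the feature clone already contains it).
    hg -R "$feature" diff -r "$base" -r . > "$patch_file" &&
    # Apply the patch without committing, then record it as one changeset.
    hg -R "$clean" import --no-commit "$patch_file" &&
    hg -R "$clean" commit -m "$message"
}
```

The final `hg push` from the clean clone is left as a manual step, so the result can be reviewed first.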

On Thu, Dec 13, 2012 at 6:48 PM, R. David Murray rdmurray@bitdance.com wrote:
On Thu, 13 Dec 2012 20:21:24 -0500, Trent Nelson trent@snakebite.org wrote:
- Use a completely separate clone to house all the intermediate commits, then generate a diff once the final commit is ready, then apply that diff to the main cpython repo, then push that. This approach is fine, but it seems counter-intuitive to the whole concept of DVCS.
Perhaps. But that's exactly what I did with the email package changes for 3.3.
You seem to have a tension between "all those dirty little commits" and "clean history" and the fact that a dvcs is designed to preserve all those commits...if you don't want those intermediate commits in the official repo, then why is a diff/patch a bad way to achieve that?
Right. And you usually have to do this beforehand anyways to upload your changes to the tracker for review.
Also, for the record (not that anyone has said anything to the contrary), our dev guide says, "You should collapse changesets of a single feature or bugfix before pushing the result to the main repository. The reason is that we don’t want the history to be full of intermediate commits recording the private history of the person working on a patch. If you are using the rebase extension, consider adding the --collapse option to hg rebase. The collapse extension is another choice."
(from http://docs.python.org/devguide/committing.html#working-with-mercurial )
--Chris
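
For reference, the `--collapse` option mentioned in the dev guide quote might be used along these lines. This is a sketch, not the dev guide's own recipe: `collapse_onto_mainline` is a hypothetical helper, and it assumes the local feature changesets are still draft and form a linear series with no merges in between.

```shell
# Hypothetical helper: fold a linear series of local draft changesets into one
# changeset on top of the latest mainline.
#   $1 = oldest local changeset belonging to the feature
#   $2 = commit message for the collapsed changeset
collapse_onto_mainline() {
    first_rev="$1"
    message="$2"
    # Fetch new mainline changesets first.
    hg pull &&
    # Enable the rebase extension for this one invocation, then move $first_rev
    # and its descendants onto the head of "default" as a single changeset.
    hg --config extensions.rebase= rebase \
        --source "$first_rev" --dest default \
        --collapse --message "$message" &&
    # Show what a subsequent "hg push" would send.
    hg outgoing
}
```

Checking `hg outgoing` before pushing is a cheap way to confirm that only the collapsed changeset would be published.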

On Dec 13, 2012, at 7:00 PM, Chris Jerdonek chris.jerdonek@gmail.com wrote:
On Thu, Dec 13, 2012 at 6:48 PM, R. David Murray rdmurray@bitdance.com wrote:
On Thu, 13 Dec 2012 20:21:24 -0500, Trent Nelson trent@snakebite.org wrote:
- Use a completely separate clone to house all the intermediate commits, then generate a diff once the final commit is ready, then apply that diff to the main cpython repo, then push that. This approach is fine, but it seems counter-intuitive to the whole concept of DVCS.
Perhaps. But that's exactly what I did with the email package changes for 3.3.
You seem to have a tension between "all those dirty little commits" and "clean history" and the fact that a dvcs is designed to preserve all those commits...if you don't want those intermediate commits in the official repo, then why is a diff/patch a bad way to achieve that?
Right. And you usually have to do this beforehand anyways to upload your changes to the tracker for review.
Also, for the record (not that anyone has said anything to the contrary), our dev guide says, "You should collapse changesets of a single feature or bugfix before pushing the result to the main repository. The reason is that we don’t want the history to be full of intermediate commits recording the private history of the person working on a patch. If you are using the rebase extension, consider adding the --collapse option to hg rebase. The collapse extension is another choice."
(from http://docs.python.org/devguide/committing.html#working-with-mercurial )
Does hg's ability to "make merges easier than svn" depend on having all the intermediate commits? I thought the theory was that the smaller changesets provided extra information that made it possible to merge two expansive groups of changes.
Raymond

Possibly. A collapsed changeset is more likely to have larger hunks of changes e.g. two changesets that each modified adjacent pieces of code get collapsed down to a single change hunk - which would make the merge machinery have to work harder to detect moved hunks, etc.
In practice, so long as each collapsed changeset is for a single change I haven't seen this be a major issue. However, I'm personally a "create a new named branch for each task, keep all intermediate history" kind of guy (and I get to set the rules for my team ;) so I don't see collapsed changesets very often.
Tim Delaney
On 17 December 2012 16:17, Raymond Hettinger raymond.hettinger@gmail.com wrote:
Does hg's ability to "make merges easier than svn" depend on having all the intermediate commits? I thought the theory was that the smaller changesets provided extra information that made it possible to merge two expansive groups of changes.
Raymond

Raymond Hettinger writes:
Does hg's ability to "make merges easier than svn" depend on having all the intermediate commits? I thought the theory was that the smaller changesets provided extra information that made it possible to merge two expansive groups of changes.
Tim Delaney's explanation is correct as far as it goes.
But I would give a pretty firm "No" as the answer to your question.
The big difference between svn (and CVS) and hg (and git and bzr) at the time of migrating the Python repository was that svn didn't track merges, only branches. So in svn you get a 3-way merge with the branch point as the base version. This meant that you could not track progress of the mainline while working on a branch. svn tends to report the merge of recent mainline changes back into the mainline as conflicts when merging your branch into the mainline[1][2], all too often resulting in a big mess.
hg, because it records merges as well as branches, can use the most recent common version (typically the mainline parent of the most recent "catch-up" merge) as the base version. This means that (1) there are somewhat fewer divergences because your branch already contains most changes to the mainline, and (2) you don't get "spurious" conflicts. On the other hand, more frequent intermediate committing is mostly helpful in bisection, and so the usefulness depends on very disciplined committing (only commit build- and test-able code).
Summary: only the frequency of intermediate merge commits really matters. Because in hg it's possible to have frequent "catch-up" merges from mainline, you get smaller merges with fewer conflicts both at "catch-up" time and at merge-to-mainline time.
Footnotes: [1] Not the whole story, but OK for this purpose. Technical details available on request.
[2] I have paid almost no attention to svn since Python migrated to hg, so perhaps svn has improved merge support in the meantime. But that doesn't really matter since svn is merely being used to help explain why commit granularity doesn't matter much to hg's merge capabilities.
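
One way to see the base version discussed above is to ask Mercurial for the greatest common ancestor of two heads with the `ancestor()` revset. The helper name `show_merge_base` is made up for this sketch, and the two arguments can be any revision identifiers (branch names, hashes, bookmarks).

```shell
# Hypothetical helper: print the changeset Mercurial would use as the common
# ancestor of two revisions, i.e. the base of a three-way merge between them.
#   $1, $2 = the two revisions to compare (for example "default" and a
#            feature head)
show_merge_base() {
    hg log -r "ancestor($1, $2)"
}
```

Running it before and after a "catch-up" merge from the mainline should show the base moving forward to the mainline changeset that was just merged in, which is why the eventual merge back tends to be small.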

In article 20121214024824.3BCCC2500B2@webabinitio.net, "R. David Murray" rdmurray@bitdance.com wrote:
On Thu, 13 Dec 2012 20:21:24 -0500, Trent Nelson trent@snakebite.org wrote:
- Use a completely separate clone to house all the intermediate commits, then generate a diff once the final commit is ready, then apply that diff to the main cpython repo, then push that. This approach is fine, but it seems counter-intuitive to the whole concept of DVCS.
Perhaps. But that's exactly what I did with the email package changes for 3.3.
You seem to have a tension between "all those dirty little commits" and "clean history" and the fact that a dvcs is designed to preserve all those commits...if you don't want those intermediate commits in the official repo, then why is a diff/patch a bad way to achieve that? If you keep your pulls up to date in your feature repo, the diff/patch process is simple and smooth.
Also, if you prefer to go the patch route, hg provides the mq extension (inspired by quilt) to simplify managing patches, including version controlling the patches. I find it much easier to deal that way with maintenance changes that may have a non-trivial gestation period.
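
A minimal sketch of an mq-based cycle follows. The function names are hypothetical and the patch name is up to the caller; only the `q*` commands themselves come from the mq extension, which is enabled here per invocation rather than in a config file.

```shell
# Hypothetical wrapper: run hg with the mq extension enabled for this call.
hgq() {
    hg --config extensions.mq= "$@"
}

# Start a new patch on top of the current revision.
#   $1 = patch name, $2 = patch description
# After editing files, "hgq qrefresh" folds the working changes into the patch.
start_patch() {
    hgq qnew -m "$2" "$1"
}

# Unapply all patches, update to the latest mainline, then reapply them.
sync_with_mainline() {
    hgq qpop --all &&
    hg pull --update &&
    hgq qpush --all
}

# Turn every applied patch into a regular changeset, ready to push.
finish_patches() {
    hgq qfinish --applied
}
```

Because the patches stay mutable until `qfinish`, the intermediate iterations never become permanent changesets.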

R. David Murray writes:
those commits...if you don't want those intermediate commits in the official repo, then why is a diff/patch a bad way to achieve that?
Because a decent VCS provides TOOWTDI. And sometimes there are different degrees of "intermediate", or perhaps you even want to slice, dice, and mince the patches at the hunk level. Presenting the logic of the change often is best done in pieces but in an ahistorical way, but debugging often benefits from the context of an exact sequential history.
That said, diff/patch across repos is not per se evil, and may be easier for users to visualize than the results of the DAG transformations (such as rebase) provided by existing dVCSes.

Le Thu, 13 Dec 2012 21:48:23 -0500, "R. David Murray" rdmurray@bitdance.com a écrit :
On Thu, 13 Dec 2012 20:21:24 -0500, Trent Nelson trent@snakebite.org wrote:
- Use a completely separate clone to house all the intermediate commits, then generate a diff once the final commit is ready, then apply that diff to the main cpython repo, then push that. This approach is fine, but it seems counter-intuitive to the whole concept of DVCS.
Perhaps. But that's exactly what I did with the email package changes for 3.3.
You seem to have a tension between "all those dirty little commits" and "clean history" and the fact that a dvcs is designed to preserve all those commits...if you don't want those intermediate commits in the official repo, then why is a diff/patch a bad way to achieve that? If you keep your pulls up to date in your feature repo, the diff/patch process is simple and smooth.
+1. We definitely don't want tons of small incremental commits in the official repo. "One changeset == one issue" should be the ideal horizon when committing changes.
Regards
Antoine.