Using only patches for pulling changes in hg.python.org

Hello, If you follow python-checkins, you have probably notice and got annoyed this morning my +100 checkin mails in distutils2. I was lagging a bit on getting the GSOC students work pulled in, and, with the DVCS effect, you get 2/3 weeks of work in hg.python.org in a minute. :) Once CPython itself is in mercurial, we will probably have the same problem when people are pulling contributions. If you use a "hg pull" command it will get all commits from the third party, even if some if those commits are unnecessary noise, like "I have removed this file. OOps I am putting the file back in..". And it's not so easy to edit the incoming changelog once they are commited. It's not easy either to use "hg incoming" because most of the time, the third party clone has many unrelated changes. I think we should work with queues and patches everywhere to solve this. The idea is to have contributors handling hg patches in bug.python.org, one patch per feature. They can use mq for that, and the benefit will be to have a very clean history in all repositories. A good thing about hg patches is that unlike simple diffs, the contributor name and comment appears in the final changelog. I would like to propose a policy for hg.python.org, based on mercurial queues + bugs.python.org, and I would like to contribute a small guide about it in python.org/dev. Regards Tarek -- Tarek Ziadé | http://ziade.org

On Sun, 4 Jul 2010 12:47:03 +0200 Tarek Ziadé <ziade.tarek@gmail.com> wrote:
I would like to propose a policy for hg.python.org, based on mercurial queues + bugs.python.org, and I would like to contribute a small guide about it in python.org/dev.
Sounds good. We can probably make mq optional, since regular diffs would work as well (except that they wouldn't contain the original committer name, but that isn't different from what we have today). Regards Antoine.

[Antoine Pitrou]
[Tarek Ziadé]
I would like to propose a policy for hg.python.org, based on mercurial queues + bugs.python.org, and I would like to contribute a small guide about it in python.org/dev.
Sounds good. We can probably make mq optional, since regular diffs would work as well (except that they wouldn't contain the original committer name, but that isn't different from what we have today).
Agreed. The policy can just require patches and thus let people choose their local workflow (many commits in a named branch/bookmark/pbranch and then diff, MQ for moar power, or just edit things to get a diff without using a fancy command (like now)). The policy should say something about authorship attribution. Mercurial-made diffs contain the user name in a special comment which is used by hg import, plain diffs can be applied with patch and then committed with “hg commit --user "Bill <bill@example.org>"“, and if a patch is edited before commit, then use the current style (core dev as committer, original patch author in the commit message). First-class authorship acknowledgment is a nice feature of DVCSes. The policy should also allow pulling from another repo if it contains changesets that aren’t crufty. In that case, a pusher (new name for a svn committer) can just pull from Bill and push to the main repo, adding extra export-to-patch import-from-patch steps is unnecessary. Cheers

On 2010-07-04 12:47, Tarek Ziadé wrote:
Once CPython itself is in mercurial, we will probably have the same problem when people are pulling contributions. If you use a "hg pull" command it will get all commits from the third party, even if some if those commits are unnecessary noise, like "I have removed this file. OOps I am putting the file back in..".
And it's not so easy to edit the incoming changelog once they are commited. It's not easy either to use "hg incoming" because most of the time, the third party clone has many unrelated changes. I think we should work with queues and patches everywhere to solve this.
The idea is to have contributors handling hg patches in bug.python.org, one patch per feature. They can use mq for that, and the benefit will be to have a very clean history in all repositories. A good thing about hg patches is that unlike simple diffs, the contributor name and comment appears in the final changelog.
Hmm, I don't think I agree on what you're saying. First, a changeset is a changeset is a changeset. If you exchange them as patches or in some other way (by pulling or pushing or whatever) shouldn't really matter. This is one of the things DVCS is good at, you can move csets around different clones in many ways, and all clones are created equal. Second, a noisy history is never good. So yes, pulling some kind of messy history and pushing it to a central repo as-is is not a good idea. People should polish their changesets so that each changeset can stand on its own. So yes, somewhere between it being a messy history of actual development and it going into a central repo, it should be cleaned up. Ideally, the original author should do that, but if he's not in a position to do so, the committer should do it. Third, if the result of cleaning up is a single cset, it should probably be rebased before getting pushed to a central repo. If it's two or three csets, rebase it. On the other hand, if it's 10 csets, actually doing an explicit merge makes sense. The idea is not to clutter up the history with a merge every other cset, but if the merge is hard/non-trivial it can make sense to leave it explicit. Fourth, one-patch-per-issue is too restrictive. Small commits are useful because they're way easier to review. Concatenate several small commits leading up to a single issue fix into a single patch and it gets much harder to read. Easy reviews are important, because a lot of valuable time is spent reviewing. The simple example is a chain like refactor-refactor-fix (which is IME quite common). Ideally each stage keeps the test suite passing and is internally consistent, but moving towards a common goal (to fix the issue). So, I find your proposed policy somewhat vague and also not that attractive. Cleaning up the history is certainly a good thing, but I don't think we have to mandate a way for things to get into the repo. Mandating the use of issues as a reference for each fix or enhancement could be useful, but seems unnecessary. Cheers, Dirkjan

On Sun, 04 Jul 2010 15:46:53 +0200 Dirkjan Ochtman <dirkjan@ochtman.nl> wrote:
Fourth, one-patch-per-issue is too restrictive. Small commits are useful because they're way easier to review. Concatenate several small commits leading up to a single issue fix into a single patch and it gets much harder to read.
I don't agree with that. The commits obviously won't be independent because they will be motivated by each other (or even dependent on each other), therefore you have to remember what the other commits do when reviewing one of them. What's more, when reading "hg log" months or years later, it is hard to make sense of a single commit because you don't really know what issue it was meant to contribute to fix. I know that's how Mercurial devs do things, but I don't really like it. Regards Antoine.

Am 04.07.2010 17:26, schrieb Antoine Pitrou:
On Sun, 04 Jul 2010 15:46:53 +0200 Dirkjan Ochtman <dirkjan@ochtman.nl> wrote:
Fourth, one-patch-per-issue is too restrictive. Small commits are useful because they're way easier to review. Concatenate several small commits leading up to a single issue fix into a single patch and it gets much harder to read.
I don't agree with that. The commits obviously won't be independent because they will be motivated by each other (or even dependent on each other), therefore you have to remember what the other commits do when reviewing one of them. What's more, when reading "hg log" months or years later, it is hard to make sense of a single commit because you don't really know what issue it was meant to contribute to fix.
I know that's how Mercurial devs do things, but I don't really like it.
I think the best of both worlds is to encourage contributors to send more complicated patches in a series of easy-to-review steps, but when committing to Python, make one changeset out of them. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

On Sun, Jul 4, 2010 at 6:56 PM, Georg Brandl <g.brandl@gmx.net> wrote:
Am 04.07.2010 17:26, schrieb Antoine Pitrou:
On Sun, 04 Jul 2010 15:46:53 +0200 Dirkjan Ochtman <dirkjan@ochtman.nl> wrote:
Fourth, one-patch-per-issue is too restrictive. Small commits are useful because they're way easier to review. Concatenate several small commits leading up to a single issue fix into a single patch and it gets much harder to read.
I don't agree with that. The commits obviously won't be independent because they will be motivated by each other (or even dependent on each other), therefore you have to remember what the other commits do when reviewing one of them. What's more, when reading "hg log" months or years later, it is hard to make sense of a single commit because you don't really know what issue it was meant to contribute to fix.
I know that's how Mercurial devs do things, but I don't really like it.
I think the best of both worlds is to encourage contributors to send more complicated patches in a series of easy-to-review steps, but when committing to Python, make one changeset out of them.
Exactly, so one bugfix or one feature comes in a single changeset that contains ideally the code change + the doc change + the tests. Like Thomas has suggested, I'll start a "how to contribute" wiki page with the best practices, and give the url here so everyone can contribute/correct it. Tarek -- Tarek Ziadé | http://ziade.org

Georg Brandl writes:
Am 04.07.2010 17:26, schrieb Antoine Pitrou:
On Sun, 04 Jul 2010 15:46:53 +0200 Dirkjan Ochtman <dirkjan@ochtman.nl> wrote:
Fourth, one-patch-per-issue is too restrictive. Small commits are useful because they're way easier to review. Concatenate several small commits leading up to a single issue fix into a single patch and it gets much harder to read.
I don't agree with that. The commits obviously won't be independent because they will be motivated by each other (or even dependent on each other), therefore you have to remember what the other commits do when reviewing one of them. What's more, when reading "hg log" months or years later, it is hard to make sense of a single commit because you don't really know what issue it was meant to contribute to fix.
I know that's how Mercurial devs do things, but I don't really like it.
I think the best of both worlds is to encourage contributors to send more complicated patches in a series of easy-to-review steps, but when committing to Python, make one changeset out of them.
I don't see how this addresses Antoine's problem of connecting commits to issues at all. Some ways to address it are (1) require issue numbers in log messages, if there is an applicable issue (for non-committers, there should be a patch issue on the tracker, right?) and (2) require that the commits addressing a single issue be done on a single separate branch, then merged (which doesn't connect issues to commits, but does connect a series of commits). I don't really see why commits should take place in a lump, either. That makes bisecting less accurate, for one thing. Nor does it help with review; the review is already done by the time the commit takes place, no? OTOH, people who have a specific interest and want to review ex post are often going to want the bite-size patches, just as the original reviewer did, no?

Am 06.07.2010 07:16, schrieb Stephen J. Turnbull:
Georg Brandl writes:
Am 04.07.2010 17:26, schrieb Antoine Pitrou:
On Sun, 04 Jul 2010 15:46:53 +0200 Dirkjan Ochtman <dirkjan@ochtman.nl> wrote:
Fourth, one-patch-per-issue is too restrictive. Small commits are useful because they're way easier to review. Concatenate several small commits leading up to a single issue fix into a single patch and it gets much harder to read.
I don't agree with that. The commits obviously won't be independent because they will be motivated by each other (or even dependent on each other), therefore you have to remember what the other commits do when reviewing one of them. What's more, when reading "hg log" months or years later, it is hard to make sense of a single commit because you don't really know what issue it was meant to contribute to fix.
I know that's how Mercurial devs do things, but I don't really like it.
I think the best of both worlds is to encourage contributors to send more complicated patches in a series of easy-to-review steps, but when committing to Python, make one changeset out of them.
I don't see how this addresses Antoine's problem of connecting commits to issues at all.
I wasn't addressing Antoine's original problem, rather his reply to Dirkjan. Georg

Georg Brandl writes:
Am 06.07.2010 07:16, schrieb Stephen J. Turnbull:
Georg Brandl writes:
Am 04.07.2010 17:26, schrieb Antoine Pitrou:
On Sun, 04 Jul 2010 15:46:53 +0200 Dirkjan Ochtman <dirkjan@ochtman.nl> wrote:
Fourth, one-patch-per-issue is too restrictive. Small commits are useful because they're way easier to review. Concatenate several small commits leading up to a single issue fix into a single patch and it gets much harder to read.
I don't agree with that. The commits obviously won't be independent because they will be motivated by each other (or even dependent on each other), therefore you have to remember what the other commits do when reviewing one of them. What's more, when reading "hg log" months or years later, it is hard to make sense of a single commit because you don't really know what issue it was meant to contribute to fix.
I know that's how Mercurial devs do things, but I don't really like it.
I think the best of both worlds is to encourage contributors to send more complicated patches in a series of easy-to-review steps, but when committing to Python, make one changeset out of them.
I don't see how this addresses Antoine's problem of connecting commits to issues at all.
I wasn't addressing Antoine's original problem, rather his reply to Dirkjan.
Huh? Are you referring to something other than the part of his post that you quoted? Antoine writes "you have to remember what the other commits do when reviewing them" and "it is hard to make sense of a single commit [in a series] because you don't know what issue it was meant to fix". I admit I'm not really sure what his issue is. It seems to me that connecting commits is what a feature branch (in conjunction with rebase) is designed to achieve. If you don't like rebase, you can either work fast enough that your whole sequence is done before the mainline moves on significantly, or you can refrain from updating until done (and have a potentially messy merge), or you can use MQ (which is really just a way of rebasing without the shame ;-). I'll have to test it, but AFAIK in all of the above strategies, as long as you don't push to the public repo until done, the logs of the commits on the feature branch should all be adjacent in the natural order of hg log. That seems to me to be the optimal strategy, in combination with reading long parts of history in a graphical DAG browser. Of course, that assumes that random pieces of the fix aren't dispersed among commits. In that case the logs will still be hard to read and understand, as will the diffs. People who like to commit early and often should indeed be encouraged to edit their feature branches to make each individual commit make sense to reviewers. (MQ helps to address this, as does Bazaar's loom feature or StGit.) Feature branches don't automatically organize commits in an intelligible way, that requires an intelligence driving the process. But they do make it possible. Once you have feature branches, then there's a question of the external issue. Here reviewers should pay attention to the log message, and make sure it describes the problem well, and includes cross references to any documentation (tracker issue or ML thread). But that's no different from the current process. I think that in many cases the process of coming up with coherent changesets that are reviewable will indeed result in a single commit to address the whole issue. But there will also be multicommit patterns that make sense, such as "refactor API and update current clients -> use new feature in a few places". The thing to remember is that DVCSes not only record a frozen view of history accurately, but can also be used to flexibly reorganize the presentation of that history "as it should have happened". I think of these workflows as opportunities to *improve* the quality of information presented by the history. But they aren't mandated by adopting hg. Contributors and reviewers who are satisfied with the current process should continue to refine a set of changes to a single commit. hg is certainly flexible enough to allow that, with several different workflows. And Antoine's worries (AIUI) are not unfounded. Eg, we should not allow people to be lazy and submit a feature branch with changes randomly assigned to different commits and log messages like "Lunch time! commit progress to date." But that's a social problem; I think that conventions will quickly evolve *from* the one patch per issue workflow to a *well-organized* feature branch per issue (as appropriate) because python-dev reviewers will demand it.
participants (6)
-
Antoine Pitrou
-
Dirkjan Ochtman
-
Georg Brandl
-
Stephen J. Turnbull
-
Tarek Ziadé
-
Éric Araujo