Re: [Python-ideas] Using only patches for pulling changes in hg.python.org

On Sun, Jul 4, 2010 at 3:23 PM, Dirkjan Ochtman <dirkjan@ochtman.nl> wrote: ...
Hmm, I don't think I agree on what you're saying.
First, a changeset is a changeset is a changeset. If you exchange them as patches or in some other way (by pulling or pushing or whatever) shouldn't really matter. This is one of the things DVCS is good at, you can move csets around different clones in many ways, and all clones are created equal.
As far as I have experienced, there's a back and forth game between the contributor and the commiter, leading to clone hell, unless the contributor uses mq, then apply it in his clone right before the contributor pulls the changes in. Using patches makes changes separated from the history for the time being, until they are merged. And are easy to read, review and understand.
Second, a noisy history is never good. So yes, pulling some kind of messy history and pushing it to a central repo as-is is not a good idea. People should polish their changesets so that each changeset can stand on its own.
When you work on a feature, how do you polish a changeset without polluting your history or doing many clones ? (true question, I've been looking for that)
So yes, somewhere between it being a messy history of actual development and it going into a central repo, it should be cleaned up. Ideally, the original author should do that, but if he's not in a position to do so, the committer should do it.
Do you have an easy way to perform this cleanup ? Could you propose a process here ? I am bit skeptical that contributors will do this, whereas a patch policy makes sure we don't have to deal with this, and avoid asking people to have a high mercurial-fu. I am also skeptical that contributors are willing to digg into a clone to get what they want and/or check that it's fine to pull. It seems to me that patches are the universal format to propose a change and are easy to produce and consume. Contributors can use any process they want to create it, even without using mercurial.
Third, if the result of cleaning up is a single cset, it should probably be rebased before getting pushed to a central repo. If it's two or three csets, rebase it. On the other hand, if it's 10 csets, actually doing an explicit merge makes sense. The idea is not to clutter up the history with a merge every other cset, but if the merge is hard/non-trivial it can make sense to leave it explicit.
Fourth, one-patch-per-issue is too restrictive. Small commits are useful because they're way easier to review. Concatenate several small commits leading up to a single issue fix into a single patch and it gets much harder to read. Easy reviews are important, because a lot of valuable time is spent reviewing. The simple example is a chain like refactor-refactor-fix (which is IME quite common). Ideally each stage keeps the test suite passing and is internally consistent, but moving towards a common goal (to fix the issue).
So, I find your proposed policy somewhat vague and also not that attractive. Cleaning up the history is certainly a good thing, but I don't think we have to mandate a way for things to get into the repo. Mandating the use of issues as a reference for each fix or enhancement could be useful, but seems unnecessary.
Yes, it's vague, I don't have a clear idea yet and I am not that experienced in hg latest features, so I am probably doing some steps wrong or ignore some shortcuts. But I have the strong feeling that without patches, we are heading for extra work for all parties unless we have a strong tutorial on how to contribute with hg.python.org, and that is proven to be very simple. side note: I am replying to the gname emails but I don't know if this works with the mailman thread as well.. Tarek

On 07/04/2010 03:53 PM, Tarek Ziadé wrote:
On Sun, Jul 4, 2010 at 3:23 PM, Dirkjan Ochtman <dirkjan@ochtman.nl> wrote: ...
Hmm, I don't think I agree on what you're saying.
First, a changeset is a changeset is a changeset. If you exchange them as patches or in some other way (by pulling or pushing or whatever) shouldn't really matter. This is one of the things DVCS is good at, you can move csets around different clones in many ways, and all clones are created equal.
As far as I have experienced, there's a back and forth game between the contributor and the commiter, leading to clone hell, unless the contributor uses mq, then apply it in his clone right before the contributor pulls the changes in.
Using patches makes changes separated from the history for the time being, until they are merged. And are easy to read, review and understand.
Second, a noisy history is never good. So yes, pulling some kind of messy history and pushing it to a central repo as-is is not a good idea. People should polish their changesets so that each changeset can stand on its own.
When you work on a feature, how do you polish a changeset without polluting your history or doing many clones ? (true question, I've been looking for that)
mq is a good method. If a changeset that only exists locally has to be changed, you can convert it to a patch, make some changes, and re-commit. If the changesets are relatively clean in the first place, you can rebase/transplant/strip your way around too big a mess.
So yes, somewhere between it being a messy history of actual development and it going into a central repo, it should be cleaned up. Ideally, the original author should do that, but if he's not in a position to do so, the committer should do it.
Do you have an easy way to perform this cleanup ? Could you propose a process here ?
I am bit skeptical that contributors will do this, whereas a patch policy makes sure we don't have to deal with this, and avoid asking people to have a high mercurial-fu. I am also skeptical that contributors are willing to digg into a clone to get what they want and/or check that it's fine to pull.
It seems to me that patches are the universal format to propose a change and are easy to produce and consume. Contributors can use any process they want to create it, even without using mercurial.
There's no reason to force those of us capable of producing clean hg branches back into the world of patches. I can see why you'd want to be able to say "no, this repo is a mess. Submit something presentable, like a patch." Some "how to contribute" document might recommend using mq, but it shouldn't be a requirement - pulling comes naturally with DVCS. Python should use it. Accept patches - sure - not everyone uses mercurial. Require patches - please don't!
Third, if the result of cleaning up is a single cset, it should probably be rebased before getting pushed to a central repo. If it's two or three csets, rebase it. On the other hand, if it's 10 csets, actually doing an explicit merge makes sense. The idea is not to clutter up the history with a merge every other cset, but if the merge is hard/non-trivial it can make sense to leave it explicit.
Fourth, one-patch-per-issue is too restrictive. Small commits are useful because they're way easier to review. Concatenate several small commits leading up to a single issue fix into a single patch and it gets much harder to read. Easy reviews are important, because a lot of valuable time is spent reviewing. The simple example is a chain like refactor-refactor-fix (which is IME quite common). Ideally each stage keeps the test suite passing and is internally consistent, but moving towards a common goal (to fix the issue).
So, I find your proposed policy somewhat vague and also not that attractive. Cleaning up the history is certainly a good thing, but I don't think we have to mandate a way for things to get into the repo. Mandating the use of issues as a reference for each fix or enhancement could be useful, but seems unnecessary.
Yes, it's vague, I don't have a clear idea yet and I am not that experienced in hg latest features, so I am probably doing some steps wrong or ignore some shortcuts.
But I have the strong feeling that without patches, we are heading for extra work for all parties unless we have a strong tutorial on how to contribute with hg.python.org, and that is proven to be very simple.
side note: I am replying to the gname emails but I don't know if this works with the mailman thread as well..
Tarek
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas

Thomas Jollans wrote:
On 07/04/2010 03:53 PM, Tarek Ziadé wrote:
Do you have an easy way to perform this cleanup ? Could you propose a process here ?
I am bit skeptical that contributors will do this, whereas a patch policy makes sure we don't have to deal with this, and avoid asking people to have a high mercurial-fu. I am also skeptical that contributors are willing to digg into a clone to get what they want and/or check that it's fine to pull.
It seems to me that patches are the universal format to propose a change and are easy to produce and consume. Contributors can use any process they want to create it, even without using mercurial.
There's no reason to force those of us capable of producing clean hg branches back into the world of patches. I can see why you'd want to be able to say "no, this repo is a mess. Submit something presentable, like a patch." Some "how to contribute" document might recommend using mq, but it shouldn't be a requirement - pulling comes naturally with DVCS. Python should use it.
Accept patches - sure - not everyone uses mercurial. Require patches - please don't!
I'm with Tarek here: the only way for core developers to be able to review checkins on the checkins list is by looking at the patches that go in. Having to look at 10+ checkin emails for a single "patch" will break this review process - no-one will be able to follow what a particular pulled set of changes will do in the end, compared to what we had in the repo before the pull. As a result, the review process will no longer be possible. As example, see Tarek's pull/push of the distutils2 work. Those checkin email will just rush by and not get a second or third review. OTOH, I don't think that requiring to open a ticket on the tracker for everything is needed either. Aside 1: Isn't it interesting that the more we actually think about moving to Mercurial, the more we find that the existing Subversion model of working is actually a very workable model for a large open source project ?! Aside 2: This thread should really be moved to python-dev where it belongs. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 05 2010)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
2010-07-19: EuroPython 2010, Birmingham, UK 13 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

On 07/05/2010 09:30 AM, M.-A. Lemburg wrote:
Thomas Jollans wrote:
On 07/04/2010 03:53 PM, Tarek Ziadé wrote:
Do you have an easy way to perform this cleanup ? Could you propose a process here ?
I am bit skeptical that contributors will do this, whereas a patch policy makes sure we don't have to deal with this, and avoid asking people to have a high mercurial-fu. I am also skeptical that contributors are willing to digg into a clone to get what they want and/or check that it's fine to pull.
It seems to me that patches are the universal format to propose a change and are easy to produce and consume. Contributors can use any process they want to create it, even without using mercurial.
There's no reason to force those of us capable of producing clean hg branches back into the world of patches. I can see why you'd want to be able to say "no, this repo is a mess. Submit something presentable, like a patch." Some "how to contribute" document might recommend using mq, but it shouldn't be a requirement - pulling comes naturally with DVCS. Python should use it.
Accept patches - sure - not everyone uses mercurial. Require patches - please don't!
I'm with Tarek here: the only way for core developers to be able to review checkins on the checkins list is by looking at the patches that go in.
Having to look at 10+ checkin emails for a single "patch" will break this review process - no-one will be able to follow what a particular pulled set of changes will do in the end, compared to what we had in the repo before the pull. As a result, the review process will no longer be possible.
If the problem is the amount of changesets per "patch", then it has to be the responsibility of the person committing - be that a core dev or an external contributor - to make sure it's only a single changeset. OTOH, I don't think being that strict about it is a good idea - in many cases, having a handful of changesets is, IMHO, better, with Mercurial. Either way, if there is some sort of policy stating how changes should look, I for one would be happy to publish a branch on bitbucket or my own hgweb instance in that format. Permitting text patches is a must, but requiring text patches when we have actual distributed branching is quite the anachronism.
As example, see Tarek's pull/push of the distutils2 work. Those checkin email will just rush by and not get a second or third review.
If the problem is the amount of emails per "patch" then, for god's sake, change the script that writes the emails to send a mail per push, instead of a mail per commit ! DVCSs allow one to have small, atomic (but, of course, inter-dependent) commits, and push them later. I myself feel that this property should be valued, not feared.
OTOH, I don't think that requiring to open a ticket on the tracker for everything is needed either.
Aside 1: Isn't it interesting that the more we actually think about moving to Mercurial, the more we find that the existing Subversion model of working is actually a very workable model for a large open source project ?!
It's all a question of how changes are reviewed and synchronised. Of course, the Python subversion model works, no question. The specific approach "turn every commit into an email for proof reading" appears to work well with it. It may not work as well with hg. It may work better if you s/commit/push/ instead of s/commit/changeset/. Other projects work in a more distributed fashion, with developers' private repositories, changes being reviewed before pulling/merging. Linux is, of course, a prominent example. If this approach is for Python, I don't know. I doubt it, at least for the time being. But a suitable workflow will surely be found. Ah well, we'll see what happens. Thomas

Thomas Jollans wrote:
On 07/05/2010 09:30 AM, M.-A. Lemburg wrote:
Thomas Jollans wrote:
On 07/04/2010 03:53 PM, Tarek Ziadé wrote:
Do you have an easy way to perform this cleanup ? Could you propose a process here ?
I am bit skeptical that contributors will do this, whereas a patch policy makes sure we don't have to deal with this, and avoid asking people to have a high mercurial-fu. I am also skeptical that contributors are willing to digg into a clone to get what they want and/or check that it's fine to pull.
It seems to me that patches are the universal format to propose a change and are easy to produce and consume. Contributors can use any process they want to create it, even without using mercurial.
There's no reason to force those of us capable of producing clean hg branches back into the world of patches. I can see why you'd want to be able to say "no, this repo is a mess. Submit something presentable, like a patch." Some "how to contribute" document might recommend using mq, but it shouldn't be a requirement - pulling comes naturally with DVCS. Python should use it.
Accept patches - sure - not everyone uses mercurial. Require patches - please don't!
I'm with Tarek here: the only way for core developers to be able to review checkins on the checkins list is by looking at the patches that go in.
Having to look at 10+ checkin emails for a single "patch" will break this review process - no-one will be able to follow what a particular pulled set of changes will do in the end, compared to what we had in the repo before the pull. As a result, the review process will no longer be possible.
If the problem is the amount of changesets per "patch", then it has to be the responsibility of the person committing - be that a core dev or an external contributor - to make sure it's only a single changeset. OTOH, I don't think being that strict about it is a good idea - in many cases, having a handful of changesets is, IMHO, better, with Mercurial.
Either way, if there is some sort of policy stating how changes should look, I for one would be happy to publish a branch on bitbucket or my own hgweb instance in that format. Permitting text patches is a must, but requiring text patches when we have actual distributed branching is quite the anachronism.
You need those patches anyway, since that's how we review things on the issue tracker. The point I wanted to make was that (at least some of) the core devs do monitor the checkins list for new code and/or changes to existing code going in. This would not longer reasonably work, if you start pushing revisions of patches down the list as well. The history of those patches is not all that interesting to Python developers. It's the final outcome, that makes the difference.
As example, see Tarek's pull/push of the distutils2 work. Those checkin email will just rush by and not get a second or third review.
If the problem is the amount of emails per "patch" then, for god's sake, change the script that writes the emails to send a mail per push, instead of a mail per commit !
DVCSs allow one to have small, atomic (but, of course, inter-dependent) commits, and push them later. I myself feel that this property should be valued, not feared.
This is not a matter of receiving the patch in 10+ emails, or lumping everything into one email. I simply don't see any benefit in having to follow the path of development of a patch. Much to the contrary: it only adds noise that distracts from the important bits. The discussion of a patch is recorded on the issue tracker anyway and in a form that is more easily comprehensible than a set of checkin messages.
OTOH, I don't think that requiring to open a ticket on the tracker for everything is needed either.
Aside 1: Isn't it interesting that the more we actually think about moving to Mercurial, the more we find that the existing Subversion model of working is actually a very workable model for a large open source project ?!
It's all a question of how changes are reviewed and synchronised. Of course, the Python subversion model works, no question. The specific approach "turn every commit into an email for proof reading" appears to work well with it. It may not work as well with hg. It may work better if you s/commit/push/ instead of s/commit/changeset/. Other projects work in a more distributed fashion, with developers' private repositories, changes being reviewed before pulling/merging. Linux is, of course, a prominent example. If this approach is for Python, I don't know. I doubt it, at least for the time being. But a suitable workflow will surely be found.
Ah well, we'll see what happens.
Certainly. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 05 2010)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
2010-07-19: EuroPython 2010, Birmingham, UK 13 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

On 5 July 2010 11:11, M.-A. Lemburg <mal@egenix.com> wrote:
The point I wanted to make was that (at least some of) the core devs do monitor the checkins list for new code and/or changes to existing code going in. This would not longer reasonably work, if you start pushing revisions of patches down the list as well.
I agree entirely that commits should be made up of "completed" patches, not of "work in progress" (patch 2 fixing a badly named variable in patch 1, etc). But there may be merit in breaking a large patch into a series of self-contained, incremental changes - which *can* be reviewed independently, but which make sense as a group. For example, one patch that introduces set literals, a second which updates the standard library code to use them. As a more radical possibility, a patch could be broken up into 3, one with the code changes, one with the tests and one with the documentation. That may be less acceptable, although it does allow for the possibility of someone with little C experience to contribute by reviewing the docs and tests without having to worry about the code. Ultimately, it's for the core devs to decide, though. Paul.

On 7/5/2010 8:20 AM, Paul Moore wrote:
On 5 July 2010 11:11, M.-A. Lemburg<mal@egenix.com> wrote:
The point I wanted to make was that (at least some of) the core devs do monitor the checkins list for new code and/or changes to existing code going in. This would not longer reasonably work, if you start pushing revisions of patches down the list as well.
I agree entirely that commits should be made up of "completed" patches, not of "work in progress" (patch 2 fixing a badly named variable in patch 1, etc).
But there may be merit in breaking a large patch into a series of self-contained, incremental changes - which *can* be reviewed independently, but which make sense as a group. For example, one patch that introduces set literals, a second which updates the standard library code to use them.
Devs have occasionally asked a submitter of a large patch to split it into reviewable pieces. But that should be a special-case decision of a commiter reviewer.
As a more radical possibility, a patch could be broken up into 3, one with the code changes, one with the tests and one with the documentation. That may be less acceptable, although it does allow for the possibility of someone with little C experience to contribute by reviewing the docs and tests without having to worry about the code.
I do not see that as being so useful. Patches have section for each file and I have no trouble not reading a file section. Part of review is checking that doc and code changes match. Also, test and code patch have to be applied together. -- Terry Jan Reedy

On 2010-07-05 12:11, M.-A. Lemburg wrote:
The point I wanted to make was that (at least some of) the core devs do monitor the checkins list for new code and/or changes to existing code going in. This would not longer reasonably work, if you start pushing revisions of patches down the list as well.
That was not what I meant at all. You don't send different patch revisions, or incremental improvements to a single change into a single repository. You send in chunks of changes that can stand on their own (for example in the test suite), instead of a single large patch that's much harder to review, which contains everything needed to fix a single issue. Cheers, Dirkjan

On Mon, Jul 5, 2010 at 00:30, M.-A. Lemburg <mal@egenix.com> wrote: [SNIP]
Aside 1: Isn't it interesting that the more we actually think about moving to Mercurial, the more we find that the existing Subversion model of working is actually a very workable model for a large open source project ?!
Not really. The current system works and is understood without retraining. The switch to hg has never been about tweaking the workflow of committers, but that of contributors.

On Tue, Jul 6, 2010 at 6:11 AM, Brett Cannon <brett@python.org> wrote:
On Mon, Jul 5, 2010 at 00:30, M.-A. Lemburg <mal@egenix.com> wrote: [SNIP]
Aside 1: Isn't it interesting that the more we actually think about moving to Mercurial, the more we find that the existing Subversion model of working is actually a very workable model for a large open source project ?!
Not really. The current system works and is understood without retraining. The switch to hg has never been about tweaking the workflow of committers, but that of contributors.
Although, as with the CVS to SVN transmissions, the workflows of committers will likely change over time as we become more adept at exploiting the more powerful tool. I liked Joel Spolsky's observation that in moving from a centralised VCS to a distributed VCS, the key idea to wrap your head around is the shift from managing file (and repository) revisions to coherent changesets. I suspect that's something that can only happen properly by using a DVCS for a while. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, 6 Jul 2010 06:41:19 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
Although, as with the CVS to SVN transmissions, the workflows of committers will likely change over time as we become more adept at exploiting the more powerful tool.
I liked Joel Spolsky's observation that in moving from a centralised VCS to a distributed VCS, the key idea to wrap your head around is the shift from managing file (and repository) revisions to coherent changesets.
I suspect Spolsky has skipped on SVN then, because SVN already allows for coherent changesets (that's how we use it most of the time anyway). Regards Antoine.

On Tue, Jul 6, 2010 at 10:15 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 6 Jul 2010 06:41:19 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
Although, as with the CVS to SVN transmissions, the workflows of committers will likely change over time as we become more adept at exploiting the more powerful tool.
I liked Joel Spolsky's observation that in moving from a centralised VCS to a distributed VCS, the key idea to wrap your head around is the shift from managing file (and repository) revisions to coherent changesets.
I suspect Spolsky has skipped on SVN then, because SVN already allows for coherent changesets (that's how we use it most of the time anyway).
No it doesn't. It has atomic commits (as do many other centralised version control systems), but it still only manages file revisions. The mental conversion Spolsky was talking about was specifically from SVN to Hg, the same one we're looking at. A DVCS isn't written in terms of file revisions the way SVN is, it's written in terms of a directed acyclic graph of changesets. If anyone wants to see what he actually wrote, rather than my hacked up paraphrase of it, it's the last programming article he did for Joel on Software: http://www.joelonsoftware.com/items/2010/03/17.html Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan writes:
The mental conversion Spolsky was talking about was specifically from SVN to Hg, the same one we're looking at. A DVCS isn't written in terms of file revisions the way SVN is, it's written in terms of a directed acyclic graph of changesets.
Sure. But I think Antoine's right. So is the Python workflow. At any given time, you've got dozens of patches in active development in people's workspaces and on the tracker. As they get baked, you pull in a coherent set and commit it. Here's what Joel says: In Subversion, you might think, "bring my version up to date with the main version" or "go back to the previous version." In Mercurial, you think, "get me Jacob's change set" or "let's just forget that change set." While it's certainly true that to work with Python's Subversion repo you need to translate to terms of a fairly linear progression of versions, I don't see people thinking that way about the workflow. I think people do expect commits to the svn repo to be coherent, and by and large they are. I personally expect this migration to make a big difference to the core committers, because it gives them that much more flexibility. Casual committers and pull-only tester types may have some trouble adjusting, but I really don't think it will be that bad.

On Mon, Jul 5, 2010 at 3:11 PM, Brett Cannon <brett@python.org> wrote:
The switch to hg has never been about tweaking the workflow of committers, but that of contributors.
I've always thought of it as tweaking the workflow of collaboration. As an individual contributor and non-committer, the server switch isn't going to impact my workflow much. I use a DVCS locally to manage my work and then I submit a patch on the bug tracker. After the server switch, I'll do the same. A DVCS server will help a lot when I'm collaborating on a patch with others. As a concrete example, a few months ago I wrote a patch to speed up math.factorial (issue8692). Alexander Belopolsky and Mark Dickinson found a few corner-case flaws, suggested code-cleanup improvements, and some algorithmic alternatives. We went back and forth with several variations of patches before settling on a final patch. When Python is natively hosted in Mercurial, then the tools can explicitly track the relationship between all of the experimental patches. When just pushing patch files around, it's pretty hard to see that factorial-precompute-partials.patch is based on factorial-no-recursion.patch if you haven't been following the issue closely. It's also hard to examine the incremental changes between the two, which makes it hard to review an updated patch after reviewing the original. All of that would be a lot easier if I had started my patch as a clone of py3k on bitbucket. At the end of the process, the final committer can consolidate the changes into a single patch to keep the core repository clean. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
participants (11)
-
Antoine Pitrou
-
Brett Cannon
-
Daniel Stutzbach
-
Dirkjan Ochtman
-
M.-A. Lemburg
-
Nick Coghlan
-
Paul Moore
-
Stephen J. Turnbull
-
Tarek Ziadé
-
Terry Reedy
-
Thomas Jollans