Switching to a distributed version control system
I think that it would be a suitable point in time to switch to a new version control system right after the 1.1 release.
The first question to ask is of course why we should switch at all.
While the distributed version control systems allow a workflow where people maintain their own repositories and there is a designated role of integrator, I don't think we need such a workflow at this point in time. It may very well be the model to use in the future, when we have a production usable system, but right now this feature has no direct appeal.
The compelling reason to switch is in my opinion the superior support for branches that the DVCS's provide. Creating a branch is a very cheap operation and merging it to the trunk or whatever branch is far superior to what SVN provides. I think this feature would change the way we are working and improve our productivity by a significant factor.
There are a few other arguments in favour of a switch. People working through GPRS and off-line would have an easier time handling branches and updates. It would be possible to do sprints without a working internet access.
There are, in my opinion, 3 viable choices of DVCS for PyPy:
- git
- hg (mercurial)
- bzr
I think they would all be an improvement over SVN and they all have their strengths and weaknesses. In favour of bzr and hg is the fact that they are written in Python, with core parts in C. Git is all C. Git currently requires a cygwin environment to run on Windows, hg and bzr appear to have native windows versions. Git is the fastest of the lot with hg in second place. Bzr is still a fair bit slower, though this is being worked on.
Hg is really good at keeping the repositories small, with git in second place.
Speaking for bzr is the fact that we have Michael Hudson in the PyPy community, and he seems to be a guru on bzr by now.
Hg seems to be a little more tedious in its command set than the other two. Git used to be rather obscure, but is these days very straightforward to use. Git and bzr have very good visualization tools for showing the splitting and merging of branches. Git seems to be best at showing exactly what changed between 2 versions of the code (even 2 versions that are not on the same subtree).
The strongest argument in favour of git seems to me to be the rebase feature, which allows one to make a branch for a new feature, work on the branch and then update the base of the branch to branch off at a later point in time. I haven't identified this feature in hg and bzr, but then I haven't read all the documentation in detail.
The one feature of svn that we would miss is the inclusion of foreign version controlled trees, like we do with the pylib tree. We would have to do this in a different way than before, since none of the systems have this feature. I'm not sure it makes sense to have the close svn coupling between the projects any more, in any case.
The effort of learning any of the systems seems to be quite insignificant. Getting up to the level of svn is a matter of 15 minutes and learning the whole range of commands in a tool is not a big effort.
There are of course the hooks that send mail and blurb in IRC, but all 3 systems seem to have at least as powerful hooks as svn.
Jacob
2008/9/12 Jacob Hallén <jacob@openend.se>:
I think that it would be a suitable point in time to switch to a new version control system right after the 1.1 release.
The first question to ask is of course why we should switch at all.
While the distributed version control systems allow a workflow where people maintain their own repositories and there is a designated role of integrator, I don't think we need such a workflow at this point in time. It may very well be the model to use in the future, when we have a production usable system, but right now this feature has no direct appeal.
I would imagine there would still be a 'trunk' branch somewhere that would be where releases get made from and so on.
The compelling reason to switch is in my opinion the superior support for branches that the DVCS's provide. Creating a branch is a very cheap operation and merging it to the trunk or whatever branch is far superior to what SVN provides. I think this feature would change the way we are working and improve our productivity by a significant factor.
There are a few other arguments in favour of a switch. People working through GPRS and off-line would have an easier time handling branches and updates. It would be possible to do sprints without a working internet access.
There are, in my opinion, 3 viable choices of DVCS for PyPy:
- git
- hg (mercurial)
- bzr
I'd agree with this list, afaik, there's nothing else out there to take seriously at present. Darcs with its cherry picking and lack of enforcing a linear history would be interesting, but I'd be a bit wary of using it seriously, tbh.
I think they would all be an improvement over SVN and they all have their strengths and weaknesses. In favour of bzr and hg is the fact that they are written in Python, with core parts in C. Git is all C. Git currently requires a cygwin environment to run on Windows, hg and bzr appear to have native windows versions. Git is the fastest of the lot with hg in second place. Bzr is still a fair bit slower, though this is being worked on.
I can say here that I work all day every day with bzr on a codebase (Launchpad) that's actually a pretty similar size to pypy -- 5000 files in 300 directories, a few tens of thousands of revisions -- and performance is mostly fine.
Hg is really good at keeping the repositories small, with git in second place.
It's worth mentioning how very bad svn is here: svn keeps a separate copy of every file in the working tree! I think that a copy of the tree containing the entire history with any of bzr, hg or git will only be a little larger than a svn checkout.
Speaking for bzr is the fact that we have Michael Hudson in the PyPy community, and he seems to be a guru on bzr by now.
Right, I'd be more than happy to "consult" on a bzr migration. Another thing for bzr is that people can put branches up on Launchpad, even if the main branches lived on codespeak and required ssh access to commit to. I don't want to pimp bzr too hard, as I actually have very little experience with the other systems.
Hg seems to be a little more tedious in its command set than the other two. Git used to be rather obscure, but is these days very straightforward to use. Git and bzr have very good visualization tools for showing the splitting and merging of branches. Git seems to be best at showing exactly what changed between 2 versions of the code (even 2 versions that are not on the same subtree).
I'm not sure what you mean by this last point?
The strongest argument in favour of git seems to me to be the rebase feature, which allows one to make a branch for a new feature, work on the branch and then update the base of the branch to branch off at a later point in time. I haven't identified this feature in hg and bzr, but then I haven't read all the documentation in detail.
This point is somewhat controversial in DVCS circles, and is viewed as a non-feature in the Bazaar camp at least. The issue is that if you make a branch, then publish it, then rebase it onto a later version of trunk, anyone who has based work on the earlier work is left high and dry.
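To make the concern about rebasing a published branch concrete, here is a toy Python sketch. It assumes, as a simplification, that a commit's identity is a hash of its parent's identity plus its change; `commit_id` and `build` are hypothetical helpers written for this illustration and are not part of git, hg or bzr.

```python
import hashlib


def commit_id(parent_id, change):
    """Toy commit id: a short hash of the parent id plus the change description."""
    return hashlib.sha1(f"{parent_id}:{change}".encode()).hexdigest()[:8]


def build(base_id, changes):
    """Apply changes in order on top of base_id and return the resulting commit ids."""
    ids = []
    parent = base_id
    for change in changes:
        parent = commit_id(parent, change)
        ids.append(parent)
    return ids


# Trunk when the feature branch was created, and trunk some time later.
old_trunk = build("root", ["trunk-1"])[-1]
new_trunk = build(old_trunk, ["trunk-2", "trunk-3"])[-1]

feature_changes = ["feature-a", "feature-b"]
published = build(old_trunk, feature_changes)  # what others may already have pulled
rebased = build(new_trunk, feature_changes)    # same changes replayed on a newer base

# A collaborator based work on the tip of the published branch.
downstream_parent = published[-1]

print("published:", published)
print("rebased:  ", rebased)
print("downstream parent still present after rebase:", downstream_parent in rebased)
```

In this model the same two changes get new identities once they sit on a different base, so the collaborator's parent commit is no longer part of the rebased branch. Real tools differ in detail, but this is the shape of the "left high and dry" problem.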
The one feature of svn that we would miss is the inclusion of foreign version controlled trees, like we do with the pylib tree. We would have to do this in a different way than before, since none of the systems have this feature. I'm not sure it makes sense to have the close svn coupling between the projects any more, in any case.
Right. bzr has some experimental support for 'nested trees', which would replace this functionality, but realistically it's not going to be ready for prime time any time soon.
The effort of learning any of the systems seems to be quite insignificant. Getting up to the level of svn is a matter of 15 minutes and learning the whole range of commands in a tool is not a big effort.
And I would say that bzr (and I suspect this applies to the others) is actually a lot nicer to use than svn. It seems to have a much more 'humane' interface, compared to svn's, which always makes me think of barbed wire and pointy metal things for some reason.
There are of course the hooks that send mail and blurb in IRC, but all 3 systems seem to have at least as powerful hooks as svn.
Right. There is also the issue of how to do a conversion. codespeak's svn repository is large and old and kind of scary in some quarters, and the variety of ways that merges have been done in pypy's past means that an import that preserves the merge information seems pretty unlikely for any tool. I'll try to get a bzr-svn import of pypy going again, though. Cheers, mwh
On Sep 11, 2008, at 7:55 PM, Michael Hudson wrote:
2008/9/12 Jacob Hallén <jacob@openend.se>:
There are, in my opinion, 3 viable choices of DVCS for PyPy:
- git
- hg (mercurial)
- bzr
I'd agree with this list, afaik, there's nothing else out there to take seriously at present. Darcs with its cherry picking and lack of enforcing a linear history would be interesting, but I'd be a bit wary of using it seriously, tbh.
I think we could remove git from the discussion, as its lack of Windows support (and no Mac packaging besides ports) makes it a very high barrier to entry for current and future Windows developers. Wasn't cfbolz also somewhat involved in bzr in the past?
2008/9/12 Jacob Hallén <jacob@openend.se>:
The one feature of svn that we would miss is the inclusion of foreign version controlled trees, like we do with the pylib tree. We would have to do this in a different way than before, since none of the systems have this feature. I'm not sure it makes sense to have the close svn coupling between the projects any more, in any case.
Mercurial has the "forest" extension, which allows nested Mercurial repositories. http://www.selenic.com/mercurial/wiki/index.cgi/ForestExtension
It is an external extension, but it is used by high profile projects like Sun's OpenJDK.
-- Seo Sanghyeon
Hi there,
I'm a Mercurial committer who also listens in on #pypy sometimes. I remember we had a discussion about this subject with Armin and Michael a few weeks ago, where it was stated that the conversion process would likely be scary and that SVN was working alright for this project...
Jacob Hallén wrote:
to be a guru on bzr by now. Hg seems to be a little more tedious in its command set than the other two. Git used to be rather obscure, but is these
I don't know why hg seems tedious. hg's commands have been designed to be rather a lot like SVN where this is possible.
days very straightforward to use. Git and bzr have very good visualization tools for showing the splitting and merging of branches. Git seems to be best at showing exactly what changed between 2 versions of the code (even 2 versions that are not on the same subtree).
hg also has fairly good visualization tools. We have a command-line graph in the form of the glog extension, and in the upcoming release we will feature a canvas-based graph in the included web interface: http://hg.xavamedia.nl/mercurial/crew/graph/ In addition, there is a TortoiseHG tool for Windows which also provides some integration/GUI on Linux (PyGTK-based).
The strongest argument in favour of git seems to me to be the rebase feature, which allows one to make a branch for a new feature, work on the branch and then update the base of the branch to branch off at a later point in time. I haven't identified this feature in hg and bzr, but then I haven't read all the documentation in detail.
In the next release, hg will come with a rebase extension. hg also comes with the very powerful mq extension, which allows developers to keep versioned patch queues around, which can be detached from the repo and distributed separately (and which allows for easily updating, splitting and merging of in-development patches/changesets).
The one feature of svn that we would miss is the inclusion of foreign version controlled trees, like we do with the pylib tree. We would have to do this in a different way than before, since none of the systems have this feature. I'm not sure it makes sense to have the close svn coupling between the projects any more, in any case.
Mercurial has the forest extension, which does this for OpenJDK, and NetBeans, I think. In fact, I think that of the DVCS's, Mercurial is the one used by the most large projects: OpenJDK, NetBeans and Mozilla all use hg.
There are of course the hooks that send mail and blurb in IRC, but all 3 systems seem to have at least as powerful hooks as svn.
In addition, hg has a very easy-to-use extension model, allowing Python developers to easily extend the functionality available in the tool.
In fairness, Mercurial's support for branches is a little different from what most people are used to; either people can use separate clones for branches (which use hardlinks, so they aren't too expensive), or changesets get committed to a branch name, but since history in hg is immutable (I think this is largely true for any DVCS, but hg can be a little more strict about it), this means branches cannot currently be deleted. Branches that have been merged back into another branch can be kept out of command output, though.
Anyway, I'd like to help out with the conversion and infrastructure, should PyPy choose hg.
Cheers, Dirkjan
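As a side note on why hardlinked clones "aren't too expensive": a hard link is a second directory entry pointing at the same data on disk, so no file contents are copied. The Python sketch below only demonstrates that filesystem behaviour with the standard library; the file names are made up, and it is not a description of how Mercurial itself lays out or links its repository files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.dat")  # hypothetical file in the source clone
    linked = os.path.join(tmp, "clone.dat")       # hypothetical file in the new clone

    with open(original, "wb") as f:
        f.write(b"history data" * 1000)

    # Create a second name for the same file; the contents are not duplicated.
    os.link(original, linked)

    a, b = os.stat(original), os.stat(linked)
    same_file = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
    link_count = a.st_nlink

    print("same underlying file:", same_file)
    print("link count:", link_count)
```

Both names refer to one file, so the extra clone costs directory entries rather than a second copy of the history. This assumes a filesystem that supports hard links.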
Hi,
On Fri, Sep 12, 2008 at 12:13:04AM +0200, Jacob Hallén wrote:
I think that it would be a suitable point in time to switch to a new version control system right after the 1.1 release.
I haven't read this thread in detail, but I should quickly mention that subversion works quite well enough from my point of view.
An obstacle to switching is the amount of integration effort that was done, mainly on codespeak, between the repository and various other facilities. It is ok to discuss alternatives, but we need to keep in mind that someone will have to stand up and redo all that work on codespeak.
A bientot, Armin.
On Sat, 13 Sep 2008 16:29:36 +0200, Armin Rigo <arigo@tunes.org> wrote:
Hi,
On Fri, Sep 12, 2008 at 12:13:04AM +0200, Jacob Hallén wrote:
I think that it would be a suitable point in time to switch to a new version control system right after the 1.1 release.
I haven't read this thread in detail, but I should quickly mention that subversion works quite well enough from my point of view.
An obstacle to switching is the amount of integration effort that was done, mainly on codespeak, between the repository and various other facilities. It is ok to discuss alternatives, but we need to keep in mind that someone will have to stand up and redo all that work on codespeak.
In order to try to simplify discussions about changing tools, Twisted has tried to start building up a centralized list of the important features we use (still something of a work in progress): http://twistedmatrix.com/trac/wiki/WorkflowRequirements If someone writes up something similar for PyPy then it might be easier to talk about what tools can be replaced, what the consequences are, etc. Jean-Paul
On Fri, Sep 12, 2008 at 00:13, Jacob Hallén <jacob@openend.se> wrote:
There are, in my opinion, 3 viable choices of DVCS for PyPy:
- git
- hg (mercurial)
- bzr
Hi there :)
I think they would all be an improvement over SVN and they all have their strengths and weaknesses. In favour of bzr and hg is the fact that they are written in Python, with core parts in C. Git is all C. Git currently requires a cygwin environment to run on Windows, hg and bzr appear to have native windows versions. Git is the fastest of the lot with hg in second place. Bzr is still a fair bit slower, though this is being worked on. Hg is really good at keeping the repositories small, with git in second place.
Excellent quick guide: http://www.infoq.com/articles/dvcs-guide
The strongest argument in favour of git seems to me to be the rebase feature, which allows one to make a branch for a new feature, work on the branch and then update the base of the branch to branch off at a later point in time. I haven't identified this feature in hg and bzr, but then I haven't read all the documentation in detail.
Don't miss the "extensions" (these are official ones, included in core):
- Bisect http://www.selenic.com/mercurial/wiki/index.cgi/BisectExtension
- Graph log (shows a graph of the tree, "ascii art") http://www.selenic.com/mercurial/wiki/index.cgi/GraphlogExtension
- Hgk (same as above, but with tcl/tk, a copy of git's one) http://www.selenic.com/mercurial/wiki/index.cgi/HgkExtension
- Mq (wonderful Hg version of "quilt", must have!) http://www.selenic.com/mercurial/wiki/index.cgi/MqExtension
- Rebase (warning: available only from the next stable version) http://www.selenic.com/mercurial/wiki/index.cgi/RebaseExtension
- Transplant ("transplant" patches from another branch or repository) http://www.selenic.com/mercurial/wiki/index.cgi/TransplantExtension
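Bundled extensions like these are switched on by listing them in the `[extensions]` section of a Mercurial configuration file. The small Python sketch below renders such a section; `render_hgrc` is a hypothetical helper written for this note, and the exact extension names and their availability depend on the Mercurial version, so treat the list as an assumption to check against the wiki pages above.

```python
import configparser
import io


def render_hgrc(extensions):
    """Return the text of an hgrc [extensions] section enabling the given names."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the names exactly as written
    # An empty value means "load the bundled extension with this name".
    parser["extensions"] = {name: "" for name in extensions}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


hgrc_text = render_hgrc(["graphlog", "hgk", "mq", "rebase", "transplant"])
print(hgrc_text)
```

The printed text could be pasted into a user's hgrc file. Bisect is left out of the list because it may already be a built-in command in a given release, which is again something to verify for the version in use.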
The one feature of svn that we would miss is the inclusion of foreign version controlled trees, like we do with the pylib tree.
Do you mean externals? The features provided by externals in Subversion are not currently available in Mercurial; NestedRepositories is a proposal to integrate this extension into Mercurial: http://www.selenic.com/mercurial/wiki/index.cgi/NestedRepositories
-- Seb
On Sat, Sep 13, 2008 at 9:32 PM, Sebastien Douche <sdouche@gmail.com> wrote:
On Fri, Sep 12, 2008 at 00:13, Jacob Hallén <jacob@openend.se> wrote:
There are, in my opinion, 3 viable choices of DVCS for PyPy:
- git
- hg (mercurial)
- bzr
Hi there :)
I think they would all be an improvement over SVN and they all have their strengths and weaknesses. In favour of bzr and hg is the fact that they are written in Python, with core parts in C. Git is all C. Git currently requires a cygwin environment to run on Windows, hg and bzr appear to have native windows versions. Git is the fastest of the lot with hg in second place. Bzr is still a fair bit slower, though this is being worked on. Hg is really good at keeping the repositories small, with git in second place.
Excellent quick guide: http://www.infoq.com/articles/dvcs-guide
The strongest argument in favour of git seems to me to be the rebase feature, which allows one to make a branch for a new feature, work on the branch and then update the base of the branch to branch off at a later point in time. I haven't identified this feature in hg and bzr, but then I haven't read all the documentation in detail.
Don't miss the "extensions" (these are official ones, included in core):
- Bisect http://www.selenic.com/mercurial/wiki/index.cgi/BisectExtension
- Graph log (shows a graph of the tree, "ascii art") http://www.selenic.com/mercurial/wiki/index.cgi/GraphlogExtension
- Hgk (same as above, but with tcl/tk, a copy of git's one) http://www.selenic.com/mercurial/wiki/index.cgi/HgkExtension
- Mq (wonderful Hg version of "quilt", must have!) http://www.selenic.com/mercurial/wiki/index.cgi/MqExtension
- Rebase (warning: available only from the next stable version) http://www.selenic.com/mercurial/wiki/index.cgi/RebaseExtension
- Transplant ("transplant" patches from another branch or repository) http://www.selenic.com/mercurial/wiki/index.cgi/TransplantExtension
The one feature of svn that we would miss is the inclusion of foreign version controlled trees, like we do with the pylib tree.
Do you mean externals? The features provided by externals in Subversion are not currently available in Mercurial; NestedRepositories is a proposal to integrate this extension into Mercurial: http://www.selenic.com/mercurial/wiki/index.cgi/NestedRepositories
And here is a translation table to git if you later decide (like us:) that git is better for you: http://wiki.sympy.org/wiki/Git_hg_rosetta_stone Also git has the excellent "git svn", so you can use git with pypy right now and commit using svn (git svn dcommit). Ondrej
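For readers who have not used "git svn", the usual cycle is clone, rebase, dcommit. The Python sketch below only assembles and prints those command lines without running anything; `git_svn_workflow` is a hypothetical helper, and the repository URL and directory name are placeholders, not PyPy's actual svn location.

```python
def git_svn_workflow(svn_url, target_dir):
    """Return the git-svn command lines for a clone / update / commit-back cycle."""
    return [
        # Import the svn history into a new local git repository.
        ["git", "svn", "clone", svn_url, target_dir],
        # Fetch new svn revisions and replay local commits on top of them.
        ["git", "svn", "rebase"],
        # Send each local git commit back to the svn repository.
        ["git", "svn", "dcommit"],
    ]


# Placeholder URL and directory name, for illustration only.
commands = git_svn_workflow("https://svn.example.org/project/trunk", "project")
for cmd in commands:
    print(" ".join(cmd))
```

The last two commands are run inside the cloned directory. Cloning a large, old svn repository this way can take a long time, since revisions are fetched one by one.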
participants (10)
- Armin Rigo
- Dirkjan Ochtman
- Jacob Hallén
- Jean-Paul Calderone
- Leonardo Santagada
- Michael Hudson
- Neal Becker
- Ondrej Certik
- Sebastien Douche
- Seo Sanghyeon