[pypy-dev] Switching to a distributed version control system

Michael Hudson micahel at gmail.com
Fri Sep 12 00:55:36 CEST 2008

2008/9/12 Jacob Hallén <jacob at openend.se>:
> I think that it would be a suitable point in time to switch to a new version
> control system right after the 1.1 release.
> The first question to ask is of course why we should switch at all.
> While the distributed version control systems allow a workflow where people
> maintain their own repositories and there is a designated role of integrator,
> I don't think we need such a workflow at this point in time. It may very well
> be the model to use in the future, when we have a production usable system,
> but right now this feature has no direct appeal.

I would imagine there would still be a 'trunk' branch somewhere, from
which releases get made and so on.

> The compelling reason to switch is in my opinion the superior support for
> branches that the DVCS's provide. Creating a branch is a very cheap operation
> and merging it to the trunk or whatever branch is far superior to what SVN
> provides. I think this feature would change the way we are working and
> improve our productivity by a significant factor.
> There are a few other arguments in favour of a switch. People working through
> GPRS and off-line would have an easier time handling branches and updates. It
> would be possible to do sprints without a working internet access.
> There are, in my opinion, 3 viable choices of DVCS for PyPy:
> - git
> - hg (mercurial)
> - bzr

I'd agree with this list; afaik, there's nothing else out there to
take seriously at present.  Darcs with its cherry picking and lack of
enforcing a linear history would be interesting, but I'd be a bit wary
of using it seriously, tbh.

> I think they would all be an improvement over SVN and they all have their
> strengths and weaknesses. In favour of bzr and hg is the fact that they are
> written in Python, with core parts in C. Git is all C. Git currently requires
> a cygwin environment to run on Windows, hg and bzr appear to have native
> Windows versions. Git is the fastest of the lot with hg in second place. Bzr
> is still a fair bit slower, though this is being worked on.

I can say here that I work all day every day with bzr on a codebase
(Launchpad) that's actually a pretty similar size to pypy -- 5000
files in 300 directories, a few tens of thousands of revisions -- and
performance is mostly fine.

> Hg is really good
> at keeping the repositories small, with git in second place.

It's worth mentioning how very bad svn is here: svn keeps a separate
copy of every file in the working tree!  I think that a copy of the
tree containing the entire history with any of bzr, hg or git will
only be a little larger than a svn checkout.

> Speaking for bzr
> is the fact that we have Michael Hudson in the PyPy community, and he seems
> to be a guru on bzr by now.

Right, I'd be more than happy to "consult" on a bzr migration.

Another thing for bzr is that people can put branches up on Launchpad,
even if the main branches lived on codespeak and required ssh access
to commit to.

I don't want to pimp bzr too hard, as I actually have very little
experience with the other systems.

> Hg seems to be a little more tedious in its
> command set than the other two. Git used to be rather obscure, but is these
> days very straightforward to use. Git and bzr have very good visualization
> tools for showing the splitting and merging of branches. Git seems to be best
> at showing exactly what changed between 2 versions of the code (even 2
> versions that are not on the same subtree).

I'm not sure what you mean by this last point?

> The strongest argument in favour of git seems to me to be the rebase feature,
> which allows one to make a branch for a new feature, work on the branch and
> then update the base of the branch to branch off at a later point in time. I
> haven't identified this feature in hg and bzr, but then I haven't read all
> the documentation in detail.

This point is somewhat controversial in DVCS circles, and is viewed as
a non-feature in the Bazaar camp at least.  The issue is that if you
make a branch, then publish it, then rebase it onto a later version of
trunk, anyone who has based work on the earlier work is left high and
dry.
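
To make the problem concrete, here is a toy sketch in Python. It is
not how git or bzr actually store anything; it only assumes that a
revision's identity depends on its parent as well as on its change,
and every name in it (commit, build, the change strings) is made up
for illustration.

```python
import hashlib


def commit(parent, change):
    """Toy revision id: a hash of the parent id plus the change text."""
    data = (parent or "") + "\n" + change
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


def build(base, changes):
    """Stack the changes on top of base and return the new revision ids."""
    ids = []
    tip = base
    for change in changes:
        tip = commit(tip, change)
        ids.append(tip)
    return ids


# Trunk moves on after the feature branch was started.
trunk_v1 = commit(None, "initial import")
trunk_v2 = commit(trunk_v1, "trunk: unrelated fix")

# The feature branch is published while based on the old trunk.
feature_changes = ["feature: step 1", "feature: step 2"]
published = build(trunk_v1, feature_changes)

# Somebody else bases their own work on the published tip.
downstream_parent = published[-1]
downstream = commit(downstream_parent, "downstream: builds on feature")

# Rebasing replays the same changes on the newer trunk, giving new ids.
rebased = build(trunk_v2, feature_changes)

# The downstream work's parent is not part of the rebased branch.
stranded = downstream_parent not in rebased
print("downstream parent missing from rebased branch:", stranded)
```

The changes are the same in both branches, but since each id depends
on its parent, the rebased revisions are new revisions, and the
downstream work still points at one that the rebased branch no longer
contains.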

> The one feature of svn that we would miss is the inclusion of foreign version
> controlled trees, like we do with the pylib tree. We would have to do this in
> a different way than before, since none of the systems have this feature. I'm
> not sure it makes sense to have the close svn coupling between the projects
> any more, in any case.

Right.  bzr has some experimental support for 'nested trees', which
would replace this functionality, but realistically it's not going to
be ready for prime time any time soon.

> The effort of learning any of the systems seems to be quite insignificant.
> Getting up to the level of svn is a matter of 15 minutes and learning the
> whole range of commands in a tool is not a big effort.

And I would say that bzr (and I suspect this applies to the others) is
actually a lot nicer to use than svn.  It seems to have a much more
'humane' interface, compared to svn's, which always makes me think of
barbed wire and pointy metal things for some reason.

> There are of course the hooks that send mail and blurb in IRC, but all 3
> systems seem to have at least as powerful hooks as svn.


There is also the issue of how to do a conversion.  codespeak's svn
repository is large and old and kind of scary in some quarters, and
the variety of ways that merges have been done in pypy's past means
that an import that preserves the merge information seems pretty
unlikely for any tool.  I'll try to get a bzr-svn import of pypy going
again, though.

