[pypy-dev] Switching to a distributed version control system

Jacob Hallén jacob at openend.se
Fri Sep 12 00:13:04 CEST 2008


I think that it would be a suitable point in time to switch to a new version 
control system right after the 1.1 release.

The first question to ask is of course why we should switch at all.

While the distributed version control systems allow a workflow where people 
maintain their own repositories and there is a designated role of integrator, 
I don't think we need such a workflow at this point in time. It may very well 
be the model to use in the future, when we have a production usable system, 
but right now this feature has no direct appeal.

The compelling reason to switch is, in my opinion, the superior support for 
branches that the DVCSs provide. Creating a branch is a very cheap operation, 
and merging it back into the trunk, or into any other branch, works far better 
than what SVN provides. I think this feature would change the way we are 
working and improve our productivity by a significant factor.

There are a few other arguments in favour of a switch. People working through 
GPRS and off-line would have an easier time handling branches and updates. It 
would be possible to do sprints without working internet access.

There are, in my opinion, 3 viable choices of DVCS for PyPy:

- git

- hg (mercurial)

- bzr

I think they would all be an improvement over SVN, and each has its own 
strengths and weaknesses.

In favour of bzr and hg is the fact that they are written in Python, with core 
parts in C. Git is all C. Git currently requires a cygwin environment to run 
on Windows, while hg and bzr appear to have native Windows versions.

Git is the fastest of the lot, with hg in second place. Bzr is still a fair 
bit slower, though this is being worked on. Hg is really good at keeping the 
repositories small, with git in second place.

Speaking for bzr is the fact that we have Michael Hudson in the PyPy 
community, and he seems to be a guru on bzr by now. Hg seems to be a little 
more tedious in its command set than the other two. Git used to be rather 
obscure, but is these days very straightforward to use.

Git and bzr have very good visualization tools for showing the splitting and 
merging of branches. Git seems to be best at showing exactly what changed 
between 2 versions of the code (even 2 versions that are not on the same 
subtree).

The strongest argument in favour of git seems to me to be the rebase feature, 
which allows one to make a branch for a new feature, work on the branch and 
then update the base of the branch to branch off at a later point in time. I 
haven't identified this feature in hg and bzr, but then I haven't read all 
the documentation in detail.
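
To make the idea of rebasing concrete, here is a toy sketch in Python. It is 
not how git stores or rewrites history; the commit ids, the dictionary layout 
and the function names are all made up for illustration. It only shows the 
principle: the commits that exist solely on the feature branch are replayed, 
in order, on top of a later point of the trunk.

```python
# Toy model: each commit id maps to (parent_id, description).
# Hypothetical data layout, chosen only to illustrate the idea of a rebase.
commits = {
    "A": (None, "initial import"),
    "B": ("A", "trunk work"),
    "C": ("B", "more trunk work"),
    "F1": ("A", "feature, step 1"),   # feature branch forked from A
    "F2": ("F1", "feature, step 2"),
}


def ancestors(commits, tip):
    """Return the commit ids from the root up to and including `tip`."""
    chain = []
    while tip is not None:
        chain.append(tip)
        tip = commits[tip][0]
    return chain[::-1]


def rebase(commits, branch_tip, new_base):
    """Replay the commits unique to `branch_tip` on top of `new_base`.

    The original commits are left untouched; copies with a trailing
    apostrophe in their id are created instead. Returns the new tip.
    """
    on_base = set(ancestors(commits, new_base))
    to_replay = [c for c in ancestors(commits, branch_tip) if c not in on_base]
    parent = new_base
    for old_id in to_replay:
        new_id = old_id + "'"
        commits[new_id] = (parent, commits[old_id][1])
        parent = new_id
    return parent


new_tip = rebase(commits, "F2", "C")
print(ancestors(commits, new_tip))  # prints ['A', 'B', 'C', "F1'", "F2'"]
```

After the call, the feature branch looks as if it had been started from C 
rather than from A. A real tool also has to reapply each change to the file 
contents and deal with conflicts, which this sketch leaves out entirely.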

The one feature of svn that we would miss is the inclusion of foreign version 
controlled trees, like we do with the pylib tree. We would have to do this in 
a different way than before, since none of the systems have this feature. I'm 
not sure it makes sense to have the close svn coupling between the projects 
any more, in any case.

The effort of learning any of the systems seems to be quite insignificant. 
Getting up to the level of svn is a matter of 15 minutes and learning the 
whole range of commands in a tool is not a big effort.

There are of course the hooks that send mail and post to IRC, but all 3 
systems seem to have hooks at least as powerful as svn's.

Jacob


