Pardon me for coming a little late to the SCM discussion, but I thought I would throw a few comments in.
A little background: I've used Perforce, CVS, Subversion and BitKeeper for a number of years. Currently, I hack on Mercurial (http://www.selenic.com/mercurial).
However, I'm not here specifically to push Mercurial, but rather to bring up a few points that I haven't seen made in the earlier discussions.
The biggest distinguishing factor between centralised and decentralised SCMs is the kinds of interaction they permit between the core developer community and outsiders.
The centralised SCM tools all create a wall between core developers (i.e. people with commit access to the central repository) and people who are on the fringes. Outsiders may be able to get anonymous read-only access, but they are left to their own devices if they want to make changes that they would like to contribute back to the project.
With centralised tools, any revision control that outsiders do must be ad-hoc in nature, and they cannot share their changes in a natural way (i.e. preserving revision history) with anyone else.
I do not follow Python development closely, so I have no idea how open Python typically is to contributions from people outside the core CVS committers.
However, it's worth pointing out that with a distributed SCM - it doesn't really matter which one you use - it is simple to put together a workflow that operates in the same way as a centralised SCM. You lose nothing in the translation. What you gain is several-fold:
* Outsiders get to work according to the same terms, and with the same tools, as core developers.
* Everyone can perform whatever work they want (branch, commit, diff, undo, etc.) without being connected to the main repository in any way.
* Peer-level sharing of changes, for testing or evaluation, is easy and doesn't clutter up the central server with short-lived branches.
* Speculative branching: it is cheap to create a local private branch that contains some half-baked changes. If they work out, fold them back and commit them to the main repository. If not, blow the branch away and forget about it.
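To make the "local work" and "speculative branching" points concrete, here is a toy Python sketch. It is not how Mercurial, BitKeeper or any other real tool stores data; the `Repo` class and its `commit`, `clone` and `pull` methods are hypothetical, and history is assumed to be strictly linear. The point it illustrates is that a clone carries the whole history, so committing, folding changes back and discarding a dead end all happen locally.

```python
class Repo:
    """Hypothetical toy repository: a linear history of commit messages."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def commit(self, message):
        # Committing only touches this local copy; no server is contacted.
        self.history.append(message)

    def clone(self):
        # A clone carries the complete history, so it is usable offline.
        return Repo(self.history)

    def pull(self, other):
        """Fast-forward to other's history if ours is a prefix of it."""
        n = len(self.history)
        if other.history[:n] != self.history:
            # Diverged histories would need a merge, which this toy omits.
            raise ValueError("histories diverged; a real tool would merge")
        self.history = list(other.history)


main = Repo(["initial import"])

# A speculative branch that works out: fold it back into main.
spec = main.clone()
spec.commit("experimental change")
main.pull(spec)

# A speculative branch that does not: blow it away and forget about it.
dead_end = main.clone()
dead_end.commit("half-baked idea")
del dead_end

print(main.history)  # → ['initial import', 'experimental change']
```

Nothing about the abandoned branch ever reaches `main`, which is the sense in which short-lived branches don't clutter up the shared repository.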
Regardless of what you may think of the Linux development model, it is telling that there have been about 80 people able to commit changes to Python since 1990 (I just checked the cvsroot tarball), whereas my estimate is that about ten times as many used BitKeeper to contribute changes to the Linux kernel just since the 2.5 tree began in 2002. (The total number of users who contributed changes was about 1600, 1300 of whom used BK, while the remainder emailed plain old patches that someone applied.)
It is, of course, not possible for me to tell which CVS commits were really patches that originated with someone else, but my intent is to show how the choice of tools affects the ability of people to contribute in "natural" ways. How much of the difference in numbers is due to the respective popularity or accessibility of the projects is anyone's guess.
With any luck, there's some food for thought above.
Regards,
<b
On Aug 15, 2005, at 5:04 PM, Bryan O'Sullivan wrote:
The centralised SCM tools all create a wall between core developers (i.e. people with commit access to the central repository) and people who are on the fringes. Outsiders may be able to get anonymous read-only access, but they are left to their own devices if they want to make changes that they would like to contribute back to the project.
But, if python is using svn, outside developers can seamlessly use svk (http://svk.elixus.org/) to do their own branches if they wish, no? Sure, that is "their own devices", but it seems a fairly workable solution to me as the two are so closely related.
Now, I've never tried this, so I'm just judging from the "marketing material" on the svk website.
James
[Bryan O'Sullivan]
The centralised SCM tools all create a wall between core developers (i.e. people with commit access to the central repository) and people who are on the fringes. Outsiders may be able to get anonymous read-only access, but they are left to their own devices if they want to make changes that they would like to contribute back to the project.
[James Y Knight]
But, if python is using svn, outside developers can seamlessly use svk (http://svk.elixus.org/) to do their own branches if they wish, no? Sure, that is "their own devices", but it seems a fairly workable solution to me as the two are so closely related.
+1 This seems to be the most flexible and sensible idea so far. The svn system has had many accolades; Martin knows how to convert it; and it presents only a small learning curve to cvs users. Optionally adding svk to the mix allows us to get the benefits of a distributed system without any additional migration or support issues. Very nice.
Raymond
Bryan O'Sullivan wrote:
However, it's worth pointing out that with a distributed SCM - it doesn't really matter which one you use - it is simple to put together a workflow that operates in the same way as a centralised SCM. You lose nothing in the translation. What you gain is several-fold:
That may be off-topic for python-dev, but can you please explain how this works?
* Outsiders get to work according to the same terms, and with the same tools, as core developers.
I'm using git on the kernel level. In what way am I at the same level as the core developers? They can write to the kernel.org repository, I cannot. They use commit, I send diffs.
* Everyone can perform whatever work they want (branch, commit, diff, undo, etc) without being connected to the main repository in any way.
So what? If I want to branch, I create a new sandbox. I have to do that anyway, since independent projects should not influence each other. I can also easily diff, whether I have write access or not (in svn, even more easily than in CVS). There is no easy way to undo parts of the changes, that's true.
* Peer-level sharing of changes, for testing or evaluation, is easy and doesn't clutter up the central server with short-lived branches.
So how does that work? If I commit the changes to my local version of the repository, how do they get peer-level-shared? I turn off my machine when I leave the house, and I don't have a permanent IP, anyway, to host a web server or some such.
* Speculative branching: it is cheap to create a local private branch that contains some half-baked changes. If they work out, fold them back and commit them to the main repository. If not, blow the branch away and forget about it.
I do that with separate sandboxes right now.
cp -a py2.5 py-64bit
gives me a new sandbox, in which I can do my speculative project.
Regardless of what you may think of the Linux development model, it is telling that there have been about 80 people able to commit changes to Python since 1990 (I just checked the cvsroot tarball), whereas my estimate is that about ten times as many used BitKeeper to contribute changes to the Linux kernel just since the 2.5 tree began in 2002. (The total number of users who contributed changes was about 1600, 1300 of whom used BK, while the remainder emailed plain old patches that someone applied.)
Hmm. The changes of these 800 people had to be approved by some core developers, or perhaps even all approved by Linus Torvalds, right? This is really the same for Python: A partial list of contributors is in Misc/ACKS (663 lines at the moment), and this doesn't list all the people who contributed trivial changes. So I guess Python has the same number of contributors per line as the Linux kernel.
It is, of course, not possible for me to tell which CVS commits were really patches that originated with someone else, but my intent is to show how the choice of tools affects the ability of people to contribute in "natural" ways.
I hear that, but I have a hard time believing it. People find the "cvs diff -u, send diff file for discussion to patches tracker" cycle quite natural.
Regards, Martin
On Mon, 2005-08-15 at 23:29 +0200, "Martin v. Löwis" wrote:
That may be off-topic for python-dev, but can you please explain how this works?
It's simple enough. In place of a central server that hosts a set of repositories and a number of branches, and to which only a few people have access, you use a central server that hosts a number of repositories, and you get the idea.
But the difference lies in the way you use it. In the centralised model, there's only one server, and only one repository, anywhere. In the distributed model, each developer has one or more repositories that they keep in sync with the central ones they are interested in, pulling and pushing changes as necessary. On top of that, they get to share changes horizontally if they wish, without going through the central server.
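A toy Python sketch of that pull/push model follows, under loudly stated assumptions: it is not Mercurial's, git's or svk's actual storage or protocol, access control is not modelled, and `Repo`, `changeset_id` and the `alice`/`bob` repositories are hypothetical names. Each changeset is keyed by a hash of its parent and message, so the same history has the same ids everywhere. That is what lets two peers exchange changes directly and later push them to the central repository without duplication or loss of parent links.

```python
import hashlib


def changeset_id(parent, message):
    # Hypothetical id scheme: hash the parent id and the message, so that
    # identical history yields identical ids in every repository.
    return hashlib.sha1(f"{parent}|{message}".encode()).hexdigest()[:12]


class Repo:
    """Hypothetical toy repository: changeset id -> (parent id, message)."""

    def __init__(self, name):
        self.name = name
        self.changesets = {}

    def commit(self, parent, message):
        cid = changeset_id(parent, message)
        self.changesets[cid] = (parent, message)
        return cid

    def pull(self, source):
        """Copy every changeset we lack from source; return how many."""
        missing = source.changesets.keys() - self.changesets.keys()
        for cid in missing:
            self.changesets[cid] = source.changesets[cid]
        return len(missing)

    def push(self, target):
        # In this toy, a push is simply a pull performed by the target.
        return target.pull(self)


central = Repo("central")
root = central.commit(None, "initial import")

alice, bob = Repo("alice"), Repo("bob")
alice.pull(central)
bob.pull(central)

# Bob commits locally; Alice pulls straight from him, never touching central.
fix = bob.commit(root, "fix refcount leak")
print(alice.pull(bob))            # → 1
print(fix in central.changesets)  # → False

# Later the change reaches central with its id and parent link intact.
print(alice.push(central))        # → 1
print(bob.push(central))          # → 0 (central already has it)
```

In a real deployment only people with write access could push to the central repository; the sketch only shows why horizontally shared changes merge back cleanly instead of arriving as a bare diff.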
I'm using git on the kernel level. In what way am I at the same level as the core developers?
You can use the same tools to do the same things they can. You can communicate with them in terms of commits. You may each have access to different sets of servers from which other people can pull changes, but if they want to take changes from you, you have the option of giving them complete history of all the edits and merges you've done, with no information loss.
So how does that work? If I commit the changes to my local version of the repository, how do they get peer-level-shared?
You have to do something to share them, but it's a lot simpler than sending diffs to a mailing list, or attaching them to a bug tracking system note.
Hmm. The changes of these 800 people had to be approved by some core developers, or perhaps even all approved by Linus Torvalds, right?
True.
I hear that, but I have a hard time believing it. People find the "cvs diff -u, send diff file for discussion to patches tracker" cycle quite natural.
People will find doing the same of anything, over and over for fifteen years, quite natural :-)
<b