skip@pobox.com writes:
With all these distributed revision control systems now available (bzr, hg, darcs, svk, many more), I find I need an introduction to the concepts and advantages of repository distribution.
Can someone point me to some useful content (web pages or books) which will help me wrap my brain around the ideas? Maybe a compare/contrast of the major players?
Others have mentioned a bunch of resources. They're all OK, and should reassure you that dVCS is not all that different from what you're used to. Here I'll post some comments that as far as I know are not in any of the existing resources.

One caveat that you should be very careful about while reading them: any command that involves receiving changes from a remote repo and updating your workspace. Terms like "get", "fetch", "pull", "update", "clone", and even "checkout" are commonly used to describe these operations, but the actual semantics of the commands named by those terms differ wildly. For example, in git, "fetch" receives the metadata and new content, while "pull" does a fetch, then attempts to update the workspace and performs 3-way merges. In Mercurial, "pull" and "fetch" have the opposite semantics.

Also, note that while CVS and Subversion automatically do a merge when updating, the dVCSes all make this distinction between fetching new content and merging it into your workspace. As already described, some of them normally combine the operations, others tend to ask the user to do them "by hand", but all provide both ways to do it. (That should send a shiver down your Pythonic spine!)

Another one is that in git and Mercurial, "clone" replicates a repo, while "checkout" switches between branches in a single workspace. In bzr, "clone" is an alias for "checkout", and normally creates a new workspace.
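To make the fetch-versus-merge distinction concrete, here is a toy model in Python. It is only a sketch: the `Repo` class and its method names are invented for illustration and follow git's naming (in Mercurial the names "pull" and "fetch" would be swapped), and "merge" is reduced to moving the workspace to a revision that is already present locally, with no 3-way merging at all.

```python
class Repo:
    """Toy repository: a set of revision ids plus one checked-out revision."""

    def __init__(self, revisions=(), workspace=None):
        self.revisions = set(revisions)   # history known to this repo
        self.workspace = workspace        # revision the working files reflect

    def fetch(self, remote):
        """Receive new history from remote; the workspace is left untouched."""
        new = remote.revisions - self.revisions
        self.revisions |= new
        return new

    def merge(self, revision):
        """Bring the workspace up to a revision already in local history."""
        if revision not in self.revisions:
            raise ValueError("fetch the revision before merging it")
        self.workspace = revision

    def pull(self, remote):
        """The combined operation: fetch, then update the workspace."""
        self.fetch(remote)
        self.merge(remote.workspace)


upstream = Repo({"r1", "r2", "r3"}, workspace="r3")
mine = Repo({"r1"}, workspace="r1")

mine.fetch(upstream)
print(mine.workspace)   # still r1: history arrived, working files did not change
mine.merge("r3")
print(mine.workspace)   # now r3
```

Calling `mine.pull(upstream)` instead does both steps in one go, which is roughly what CVS and Subversion always do on update.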
It's not obvious how I push changes back upstream.
From a participating developer's point of view, Eric Raymond's in-progress survey captures this aspect pretty well as a distinction between "update before record" and "merge after record" (my terms, not his). The point is that in CVS or Subversion, you *cannot* commit to mainline if you haven't merged in all concurrent changes to mainline. In a distributed VCS, on the other hand, you record your changes locally at will, then merge revisions at your convenience, finally pushing them upstream. It sounds like a small difference, but it's actually amazingly liberating. Otherwise, it doesn't need to be any different from your current workflow, and I doubt that it will be until the most active Python developers start to feel a need for it.

It is very important to remember that in centralized systems like CVS or Subversion, committing *is* pushing, while in a dVCS these operations are separate. In many projects the workflow is that you *record* your changes in a local repo, then you *push* them to a shared repo. AFAIK all of the contenders do call the record operation "commit", and the publish operation "push".[1] (Other workflows you will hear about are based on mailing patches -- e.g., Linux -- while yet others are based on gatekeepers pulling from your repo -- Arch developers tend to favor this. Don't worry about them; these are not going to be relevant to Python for a while.)
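The "merge after record" workflow can be sketched as another toy model. Everything here (`Mainline`, `LocalRepo`, `Stale`) is a hypothetical name made up for this example, and history is just a list of strings; the only point is that `commit` never talks to the shared repo, while `push` is refused until concurrent changes have been merged. In a centralized system, commit would call `accept` directly and so could fail at record time.

```python
class Stale(Exception):
    """Raised when a change is based on an out-of-date mainline."""


class Mainline:
    """Toy shared branch: an ordered list of change descriptions."""

    def __init__(self):
        self.history = []

    def accept(self, base, changes):
        # Refuse anything not built on the current tip of mainline.
        if base != len(self.history):
            raise Stale("merge concurrent changes first")
        self.history.extend(changes)


class LocalRepo:
    """Toy dVCS clone: records commits locally, publishes them later."""

    def __init__(self, mainline):
        self.mainline = mainline
        self.base = len(mainline.history)  # mainline tip we last merged
        self.pending = []                  # recorded but unpublished commits

    def commit(self, change):
        # Recording never consults mainline, so it cannot be refused.
        self.pending.append(change)

    def merge(self):
        # Catch up with mainline; recorded commits are kept on top.
        self.base = len(self.mainline.history)

    def push(self):
        self.mainline.accept(self.base, self.pending)
        self.base = len(self.mainline.history)
        self.pending = []


trunk = Mainline()
alice, bob = LocalRepo(trunk), LocalRepo(trunk)

alice.commit("fix typo")
alice.push()                 # trunk moves ahead of bob's base

bob.commit("add feature")    # still succeeds: recording is purely local
bob.commit("add tests")
try:
    bob.push()               # refused, bob has not merged alice's change
except Stale:
    bob.merge()              # merge at bob's convenience...
    bob.push()               # ...then publish both commits at once

print(trunk.history)         # ['fix typo', 'add feature', 'add tests']
```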
It seems to me that it has the potential for leading to anarchy,
git is *designed* around Linus's ability to manage chaos, but because of Linus, there is no anarchy. The same will be true for Python (although the personalities of the leading developers are different, so the details of why there is no anarchy will differ). A more optimistic way to put your point is that dVCSes have a great potential for encouraging experimentation.

In general, you'll always have a pretty good idea where you want to pull from, and the gatekeepers will tell you whether and where you may push.

Footnotes:
[1] Darcs uses "record" for the record operation, but Darcs is highly unlikely to become the Python dVCS of choice.