[Python-Dev] Looking for VCS usage scenarios

Thu Nov 6 02:36:28 CET 2008

In what follows, caveat IANB (I am not Brett, and neither is
Cosmin<wink>), but I do have some experience with these systems, and
my recommendations are based on that.

Cosmin Stejerean writes:
 > On Nov 5, 2008, at 12:16 PM, skip at pobox.com wrote:

 > > What DVCS fits my poor brain best?  I feel I'm like a dinosaur
 > > not being able to figure out how I'm supposed to contribute
 > > changes to the system.

You need not feel that way.  It's not you---the flexibility of dVCS
means that until the Powers That Be promulgate a Workflow, this will
be ambiguous.

This is part of the purpose of the PEP.  We[1] will be presenting the
5-finger exercises required to accomplish typical (and perhaps some
not-so-typical) tasks, as well as benchmarks for the various systems.

 > > Do I:
 > >
 > >    * commit my changes to some central branch?

Call this the "record && commit to authoritative" workflow.

 > Not exactly. If you had commit access to the central repository you  
 > could commit then push, which would be the DVCS equivalent of  
 > committing to a central branch.

The workflow where general contributors commit directly to the trunk
surely won't be used in Python, because of the instability it would
cause.  It would be possible to have a staging branch for this
purpose, but IMO that's not a very effective use of a dVCS.[2]

It is useful to avoid the term "commit" here because its semantics
vary across systems.  As Cosmin points out, what "vc commit"
accomplishes in CVS takes "vc commit; vc push" in a dVCS.  I use the
term "record" for the action of adding a workspace-based patch or
snapshot to a repository; "push" and "pull" move content between
repositories.  Unfortunately "commit" is the name of the record
command in most dVCSes, so this terminology probably won't catch on.
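
To make the record/push distinction concrete, here is a toy model in
Python.  It is only a sketch: the Repo class and its record() and
push() methods are names I made up for illustration, not the API of
git, hg, bzr, or any other real VCS.

```python
# Toy model of "record" vs. "push"; NOT the API of any real VCS.
# Repo, record, and push are hypothetical names chosen for illustration.

class Repo:
    def __init__(self, name):
        self.name = name
        self.revisions = []  # ordered history of recorded changes

    def record(self, change):
        """Add a workspace change to *this* repository only."""
        self.revisions.append(change)

    def push(self, other):
        """Copy every revision that `other` lacks into `other`."""
        for rev in self.revisions:
            if rev not in other.revisions:
                other.revisions.append(rev)


central = Repo("central")
local = Repo("local")

local.record("fix typo in docs")
# After "record" alone, the change exists only in the local repo.
assert central.revisions == []

local.push(central)
# "record" followed by "push" has the effect of a CVS-style commit.
assert central.revisions == ["fix typo in docs"]
```

The point is only that "record" touches one repository, while "push"
is what makes the change visible anywhere else.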

Also, when talking about "where to commit" in terms of communication
among developers, you should probably refer to storage locations as
"repositories".  "Branch" is another term that has varying semantics
in different VCSes.  In some systems (git) it is reasonable to think
of repositories containing more than one branch, and branches as
existing in more than one repository (but this isn't quite robust in
git because branch names are just names, not first-class objects).  In
others (Darcs is the extreme) repository == branch == workspace.

(I'm trying to get permission to publish a 3rd party's draft document
that goes into these issues in detail; here I just want to raise
awareness that the intuitions that go with CVS/Subversion usage of
various terms are *not* always going to carry over to dVCSes.)

 > >    * commit my changes locally then create diffs I then submit to the
 > >      tracker?

"Record && patch" workflow.

 > Possible.

But again not very effective.  Under a dVCS I believe these patches
will languish in the tracker as they do today, unless tools are
written to automatically pull them into a repo somewhere.

 > >    * commit locally then push them somewhere?

"Record && push to candidate" workflow.

If we go with Bazaar, this is very likely to occur, especially if
Canonical's Launchpad is the host.  This is what the Linux kernel does
on git.kernel.org as well, if I understand their workflow correctly,
and what github helps to support.  I imagine Mercurial has an
equivalent but I'm not familiar with it.

 > >    * commit locally then ask someone to pull?

"Record && request pull" workflow.

 > Often preferred way to submit patches, as you can continue to maintain  
 > the patch locally against newer versions of trunk so that the patch is  
 > not obsolete by the time people finally get around to it.

I disagree.  This doesn't scale to Python size.  For distributed VC to
work, somebody has to maintain a repo 24x7.  Python has to do this for
the trunk; the additional burden for contributed patches is not great.
There is no real advantage to having contributors do so, too.[3]
Integrators and interested third parties must also keep track of
contributors' repo URLs.  (Cf. Skip's question about discovering repos.)
Not happy stuff.

The "record && push" workflow scales much better for numbers of
contributors, as each contributor needs only to maintain one "push"
URL, and integrators only one "pull" base URL.
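
As a back-of-the-envelope sketch of that scaling argument, the
following Python counts the repo URLs each party has to know about.
The function name is hypothetical, and the "one public repo per
contributor" simplification is my assumption for illustration only.

```python
# Rough count of repo URLs each party must know about.
# Assumes (for illustration only) one public repo per contributor.

def urls_to_track(num_contributors, workflow):
    """Return (urls_per_contributor, urls_per_integrator)."""
    if workflow == "request pull":
        # Every contributor publishes a repo; integrators must learn them all.
        return (1, num_contributors)
    if workflow == "push to candidate":
        # Everyone pushes under one shared base URL.
        return (1, 1)
    raise ValueError("unknown workflow: %r" % (workflow,))


for n in (10, 100, 1000):
    print(n, urls_to_track(n, "request pull"),
          urls_to_track(n, "push to candidate"))
```

The integrator's side grows with the number of contributors in the
first workflow and stays constant in the second.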

 > >    * Not commit anything anywhere but just submit patches to the  
 > > tracker?

"Patch from workspace" workflow.

 > Likely possible, but it's good to have the patch committed locally so  
 > you can modify it or continue to build upon it until it gets accepted.

The same considerations as "record && patch" also apply here.

 > > In addition:
 > >
 > >    * Will there be a central repository?
 >
 > Generally there should be a central authoritative repository where  
 > people can turn to for the official version.

I.e., "yes".  There's no point in a PEP unless there's going to be a
central repo and a defined workflow for getting contributions into it.

Note that you can always maintain your own local repo with dVCS.

 > >    * How will I know which of possibly many repos is "authoritative"?
 > 
 > The authoritative repo should generally be linked to from the website  
 > so that people can easily find it.

That depends.  The notion of "authoritative" gets weakened in a
distributed system, and probably more important is "which repo will be
used to make the next official release".

However, although I can't say what the mechanism will be, rest assured
you will not have a problem learning which repo is authoritative for
the trunk or where to find RCs and releases.  (If you do, it's a doc
problem and it will be fixed quickly.)

You may have more trouble with third-party patches gotten from
third-party repos.  GNU Arch has a system for handling this (patch
names contain the originating repo).  That was one of the first things
the Bazaar people discarded from Arch, though.  Darcs has something
similar, but again Darcs is not a candidate here.  I think for such
"maverick" contributions there will never really be a substitute for
watching the ML and tracker like a hawk.

 > >    * How will I discover other repos?  For example, if the
 > >      safethread stuff is sitting somewhere in a repository can I
 > >      find it on my own somehow?
 > 
 > I'm not aware of any decentralized system for discovering  
 > repositories. Something like github or bitbucket could be used which  
 > help you discover repositories, but a wiki page with a list of  
 > alternative repositories and their purpose should suffice.

Most likely the repos you care about will be hosted in a central
location, and will be browsable from a single base URL.  See
http://git.kernel.org/ for the git version of a browser.  Mercurial
and Bazaar support similar facilities either in the VCS itself or in
an easily available add-on.  Fancier support is available via systems
like github and Launchpad.

If you care about minor, locally hosted patches, you'll have to
follow the tracker and mailing list closely and grep out the URLs to
them.  However, systems like Launchpad and github make it feasible to
have branches for a single patch.

GNU Arch also had some systems for third-party repo discovery and
maintenance of a database of them, but they ended up creating a
"Supermirror" which gave a similar workflow to Launchpad/github et
al.  So I think you can probably discount the existence or development
of such a discovery system.

 > >    * Will a DVCS allow simpler operation as if we are still using a
 > >      centralized system like CVS or Subversion?

Yes and no.  Nothing prevents a formal workflow like the one used with
CVS/Subversion.  However, splitting "commit" into
"record && push to authoritative" leaves room for annoying glitches
until you get used to it, and even then it's easy to forget to push,
or to forget that you've committed not-for-pushing stuff, etc.  In
practice it is probably simpler to use a dVCS-specialized workflow
like "record && push to candidate".
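
The "forgot to push" glitch amounts to recorded revisions that exist
locally but not in the repo you meant to publish to.  A minimal Python
sketch of that check follows; the function name and the revision
labels are hypothetical, and real systems compare histories by
revision ID rather than by plain strings.

```python
# Sketch: which locally recorded revisions have not reached the remote?
# unpushed() and the revision labels below are made up for illustration.

def unpushed(local_revisions, remote_revisions):
    """Return local revisions missing from the remote, in local order."""
    remote = set(remote_revisions)
    return [rev for rev in local_revisions if rev not in remote]


local_history = ["r1", "r2", "r3-not-for-pushing"]
central_history = ["r1"]

pending = unpushed(local_history, central_history)
print(pending)
```

Both halves of the glitch show up in the result: r2 is the forgotten
push, and r3 is the not-for-pushing work that would go along with it.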

Footnotes: 
[1]  I have time constraints which may not be acceptable to Brett, but
he's offered to delegate the presentation of git to me, and those of
hg and bzr to others.

[2]  This is how XEmacs and Scheme48 do it.

[3]  The insurance value is small.  Integrators and other interested
parties will surely be keeping local copies of almost all branches; if
the central repo were to be wiped out, most branches would be
recoverable from those widely distributed copies.