[Python-Dev] Mercurial migration: progress report (PEP 385)

Fri Jul 3 17:17:02 CEST 2009

Dirkjan Ochtman schrieb:

> - Fifth, here's a list of things, off the top of my head, that still need doing:
> 
> * Get agreement on branch strategy and branch processing (list of
> branches + proposed handling at
> http://hg.python.org/pymigr/file/tip/all-branches.txt) <--- PLEASE
> REVIEW

Do you have a key to the second column in that file? E.g. the difference
between "strip" and "discard" is not clear to me. "strip partial"?

Why are there branch names starting with "../"?

[PEP 385]

> ==============================================================
> 
> Motivation
> 
> After having decided to switch to the Mercurial DVCS, the actual
> migration still has to be performed. In the case of an important piece
> of infrastructure like the version control system for a large,
> distributed project like Python, this is a significant effort. This
> PEP is an attempt to describe the steps that must be taken for further
> discussion. It's somewhat similar to PEP 347 [1], which discussed the
> migration to SVN.
> 
> To make the most of hg, I (Dirkjan) would like to make a high-fidelity
> conversion, such that (a) as much of the svn metadata as possible is
> retained, and (b) all metadata is converted to formats that are common
> in Mercurial. This way, tools written for Mercurial can be optimally
> used. In order to do this, I want to use the hgsubversion [2] software
> to do an initial conversion. This hg extension is focused on providing
> high-quality conversion from Subversion to Mercurial for use in
> two-way correspondence, meaning it doesn't throw away as much
> available metadata as other solutions.
> 
> Such a conversion also seems like a good time to reconsider the
> contents of the repository and determine if some things are still
> valuable. In this spirit, the following sections also propose
> discarding some of the older metadata.
> Timeline
> 
> TBD; needs fully working hgsubversion and consensus on this document.
> Transition plan
> Branch strategy
> 
> Mercurial has two basic ways of using branches: cloned branches, where
> each branch is kept in a separate repository, and named branches,
> where each revision keeps metadata to note on which branch it belongs.
> The former makes it easier to distinguish branches, at the expense of
> requiring more disk space on the client. The latter makes it a little
> easier to switch between branches, but often has somewhat unintuitive
> results for people (though this has been getting better in recent
> versions of Mercurial).
> 
> I'm still a bit on the fence about whether Python should adopt cloned
> branches and named branches. Since it usually makes more sense to tag
> releases on the maintenance branch, for example, mainline history
> would not contain release tags if we used cloned branches. Also,
> Mercurial 1.2 and 1.3 have the necessary tools to make named branches
> less painful (because they can be properly closed and closed heads are
> no longer considered in relevant cases).
> 
> A disadvantage might be that the used clones will be a good bit larger
> (since they essentially contain all other branches as well). This can
> me mitigated by keeping non-release (feature) branches in separate
> clones. Also note that it's still possible to clone a single named
> branch from a combined clone, by specifying the branch as in hg clone
> http://hg.python.org/main/#2.6-maint. Keeping the py3k history in a
> separate clone problably also makes sense.

* Does it work with "hg pull" etc. too, afterwards?
* Does it support more than one branch?

> XXX To do: size comparison for selected separation scenarios.
> Converting branches
> 
> There are quite a lot of branches in SVN's branches directory. I
> propose to clean this up a bit, by employing the following the
> strategy:
> 
>     * Keep all release (maintenance) branches
>     * Discard branches that haven't been touched in 18 months, unless
> somone indicates there's still interest in such a branch
>     * Keep branches that have been touched in the last 18 months,
> unless someone indicates the branch can be deprecated

I would just kill all feature branches unless someone indicates it is
still used.  There are very few active feature branches.

(I guess in the case a branch gets killed erroneously it could still be
re-created after the conversion?)

> Converting tags
> 
> The SVN tags directory contains a lot of old stuff. Some of these are
> not, in fact, full tags, but contain only a smaller subset of the
> repository. I think we should keep all release tags, and consider
> other tags for inclusion based on requests from the developer
> community. I'd like to consider unifying the release tag naming scheme
> to make some things more consistent, if people feel that won't create
> too many problems. For example, Mercurial itself just uses '1.2.1' as
> a tag, where CPython would currently use r121.

+1 for readable tag names.
+1 for throwing out old questionable tag names.

> Generating .hgignore
> 
> The .hgignore file can be used in Mercurial repositories to help
> ignore files that are not eligible for version control. It does this
> by employing several possible forms of pattern matching. The current
> Python repository already includes a rudimentary .hgignore file to
> help with using the hg mirrors.
> 
> It might be useful to have the .hgignore be generated automatically
> from svn:ignore properties. This would make sure all historic
> revisions also have useful ignore information (though one could argue
> ignoring isn't really relevant to just checking out an old revision).

I guess that's not necessary.  People can just add stuff to .hgignore
when they see something that should be there.

> hg-ssh
> 
> Developers should access the repositories through ssh, similar to the
> current setup. Public keys can be used to grant people access to a
> shared hg@ account. A hgwebdir instance should also be set up for easy
> browsing and read-only access. If we're using ssh, developers should
> trivially be able to start new clones (for longer-term features that
> profit from a separate branch).

+1.

> Hooks
> 
> A number of hooks is currently in use. The hg equivalents for these
> should be developed and deployed. The following hooks are being used:
> 
>     * check whitespace: a hook to reject commits in case the
> whitespace doesn't match the rules for the Python codebase. Should be
> straightforward to re-implement from the current version. We can also
> offer a whitespace hook for use with client-side repositories that
> people can use; it could either warn about whitespace issues and/or
> truncate trailing whitespace from changed lines. Open issue: do we
> check only the tip after each push, or do we check every commit in a
> changegroup?

Only checking the tip would make it possible for people to revert their
whitespace commits, but then -- if they have the local hook -- they
shouldn't do that anyway.

>     * commit mails: we can leverage the notify extension for this

As long as it can send diffs...

>     * check contributors: in the current setup, all changesets bear
> the username of committers, who must have signed the contributor
> agreement. In a DVCS, the committers are not necessarily the same
> people who push, and so we can't check if the committer is a
> contributor. We could use a hook to check if the committer is a
> contributor if we keep a list of registered contributors.

That gets very ugly as soon as you start pulling from repos that just
fix a small typo or so.

> code.python.org was also proposed as the hostname. Personally, I think
> that using the VCS name in the hostname is good because it prevents
> confusion: it should be clear that you can't use svn or bzr for
> hg.python.org.

Yes, and it mirrors svn.python.org.

> Mercurial itself has a stable branch which is a ''strict'' subset of
> the unstable branch. This means that generally all fixes for the
> stable branch get committed against the tip of the stable branch, then
> they get merged into the unstable branch (which already contains the
> parent of the new cset). This provides a largely frictionless
> environment for moving changes from stable to unstable branches.
> Mistakes, where a change that should go on stable goes on unstable
> first, do happen, but they're usually easy to fix. That can be done by
> copying the change over to the stable branch, then trivial-merging
> with unstable -- meaning the merge in fact ignores the parent from the
> stable branch).
> 
> This strategy means a little more work for regular committers, because
> they have to think about whether their change should go on stable or
> unstable; they may even have to ask someone else (the RM) before
> committing. But it also relieves a dedicated group of committers of
> regular backporting duty, in addition to making it easier to work with
> the tool.

Strong +1 for that.

> I propose that the revision identifier will be the short version of
> hg's revision hash, for example 'dd3ebf81af43', augmented with '+'
> (instead of 'M') if the working directory from which it was built was
> modified. This mirrors the output of the hg id command, which is
> intended for this kind of usage.
> 
> For the tag/branch identifier, I propose that hg will check for tags
> on the currently checked out revision, use the tag if there is one
> ('tip' doesn't count), and uses the branch name otherwise.
> sys.subversion becomes
> 
>     * ('CPython', '2.6.2', 'dd3ebf81af43')
>     * ('CPython', 'default', 'af694c6a888c+')
> 
> and the build info string becomes
> 
>     * '2.6.2:dd3ebf81af43, Jun 2 2009, 09:58:33'
>     * 'default:af694c6a888c+, Jun 2 2009, 01:24:14'
>
> This reflects that the default branch in hg is called 'default'
> instead of Subversion's 'trunk', and reflects the proposed new tag
> format.

Looks good to me.

cheers,
Georg