[Python-Dev] Mercurial migration: progress report (PEP 385)

Brett Cannon brett at python.org
Fri Jul 3 20:04:16 CEST 2009


On Thu, Jul 2, 2009 at 13:42, Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:

> In response to some rumblings on python-committers and just to request
> more feedback, a progress report. I know it's long; I've tried to
> keep it concise and chunked, though.
>
> - First of all, I've got the basic conversion down, I've done it a few
> times now, with progressively better results. You can view some
> results at http://hg.python.org/, which has a preliminary cpython
> repository. *** The changeset hashes for that repo will change, so you
> won't be able to commit or pull from it in the future.***
>
> - Second of all, some planning. I've thought about it a bit, and I
> think we should aim for going live with hg on August 1. Given that I'm
> on vacation from 8-18 July (and I'm not sure whether I'll be able to
> actually work on it during that time, though I imagine I'll be able to
> spend some time on it at least), that's quite ambitious, so I'm going
> to say it's okay if it slips by a few days. Putting a deadline out
> there is a good thing, anyway.
>

Fine by me as long as people realize that if anything is questionable then
the switch will not happen. Getting this right takes precedence over any
deadline. And obviously we will need to do at least one live conversion on
python.org hardware to make sure everything will work smoothly.


>
> - Third of all, to make this possible, it would be helpful if I got
> more feedback on the PEP. Last time I raised it, there was virtually
> nothing. This time, I'll include it inline so there's hopefully less
> of a barrier to reviewing it.
>
> - Fourth, Mercurial 1.3 was just released! Bet you didn't see that
> coming. It's looking like a pretty good release, with an experimental
> version of the much-coveted subrepository support (like
> svn:externals). This also means that the latest version of
> hgsubversion, the tool I used for the conversion, will be more
> accessible for converting other projects. You know you want to!
>

And will make the idea of splitting out the standard library and tests a
reasonable thing to do.


>
> - Fifth, here's a list of things, off the top of my head, that still need
> doing:
>
> * Get agreement on branch strategy and branch processing (list of
> branches + proposed handling at
> http://hg.python.org/pymigr/file/tip/all-branches.txt) <--- PLEASE
> REVIEW
> * Get agreement on tag processing (first come up with a list)
> * Set up hg-ssh infra (should be easy)
> * Set up hooks (should be mostly straightforward)
> * Set up roundup integration (should be made easier by quick revision
> map hgweb extension)
> * Write docs
>
> - Sixth (this is the good part), less obvious things that have been
> done or don't need doing:
>
> * .hgignore generation (I've been convinced it's too hard, the current
> version will do)


Yeah, we can do this manually.


>
> * revlog reordering (it's painless and a big win)
>
> I'll get through all of these myself, but obviously any help would be
> welcome. For any hg users, writing docs should be an easy start. For
> others, please review the PEP (below), the branch map in
> http://hg.python.org/pymigr/file/tip/all-branches.txt and the author
> map at http://hg.python.org/pymigr/file/tip/author-map (not that much
> has changed since the start, so if you've looked at it already, feel
> free to skip this part). Right now I'm a little stuck on branch
> processing, because it's a long running script that needs a bunch of
> debugging, but I'll get going on that again.
>
> I think that's all I can think of for now, I'll update the PEP with
> new bits soon. Here it is, ready for your review:
>
> ==============================================================
>
> Motivation
>
> After having decided to switch to the Mercurial DVCS, the actual
> migration still has to be performed. In the case of an important piece
> of infrastructure like the version control system for a large,
> distributed project like Python, this is a significant effort. This
> PEP is an attempt to describe the steps that must be taken for further
> discussion. It's somewhat similar to PEP 347 [1], which discussed the
> migration to SVN.
>
> To make the most of hg, I (Dirkjan) would like to make a high-fidelity
> conversion, such that (a) as much of the svn metadata as possible is
> retained, and (b) all metadata is converted to formats that are common
> in Mercurial. This way, tools written for Mercurial can be optimally
> used. In order to do this, I want to use the hgsubversion [2] software
> to do an initial conversion. This hg extension is focused on providing
> high-quality conversion from Subversion to Mercurial for use in
> two-way correspondence, meaning it doesn't throw away as much
> available metadata as other solutions.
>
> Such a conversion also seems like a good time to reconsider the
> contents of the repository and determine if some things are still
> valuable. In this spirit, the following sections also propose
> discarding some of the older metadata.
>
> Timeline
>
> TBD; needs fully working hgsubversion and consensus on this document.
>
> Transition plan
>
> Branch strategy
>
> Mercurial has two basic ways of using branches: cloned branches, where
> each branch is kept in a separate repository, and named branches,
> where each revision keeps metadata to note on which branch it belongs.
> The former makes it easier to distinguish branches, at the expense of
> requiring more disk space on the client. The latter makes it a little
> easier to switch between branches, but often has somewhat unintuitive
> results for people (though this has been getting better in recent
> versions of Mercurial).
>
> I'm still a bit on the fence about whether Python should adopt cloned
> branches or named branches. Since it usually makes more sense to tag
> releases on the maintenance branch, for example, mainline history
> would not contain release tags if we used cloned branches. Also,
> Mercurial 1.2 and 1.3 have the necessary tools to make named branches
> less painful (because they can be properly closed and closed heads are
> no longer considered in relevant cases).
>
> A disadvantage might be that the used clones will be a good bit larger
> (since they essentially contain all other branches as well). This can
> be mitigated by keeping non-release (feature) branches in separate
> clones. Also note that it's still possible to clone a single named
> branch from a combined clone, by specifying the branch as in hg clone
> http://hg.python.org/main/#2.6-maint. Keeping the py3k history in a
> separate clone probably also makes sense.
>

While I really like the idea of using named branches for each release so
that there is a single py3k branch that contains all relevant history for
every release, I think we should start simple and go with cloned branches.
That way the workflow does not radically shift from what we do now for svn
to start. Once the conversion is done and people are comfortable with hg we
can then discuss moving towards a named branch approach.


>
> XXX To do: size comparison for selected separation scenarios.
>
> Converting branches
>
> There are quite a lot of branches in SVN's branches directory. I
> propose to clean this up a bit, by employing the following
> strategy:
>
>    * Keep all release (maintenance) branches
>    * Discard branches that haven't been touched in 18 months, unless
> someone indicates there's still interest in such a branch
>    * Keep branches that have been touched in the last 18 months,
> unless someone indicates the branch can be deprecated
>

Sounds reasonable to me. We can just make a list and send it to
python-committers to make final decisions of what should stick around.
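
To make that list easier to produce, the three rules above could be sketched
roughly as follows. This is only an illustration: the function name, the
arguments and the "18 months is about 540 days" approximation are assumptions,
not anything from the migration tools.

```python
from datetime import date, timedelta

# Assumption: "18 months" is approximated as 18 * 30 days.
STALE_AFTER = timedelta(days=18 * 30)


def branch_action(name, last_touched, today, release_branches,
                  keep_requests=(), drop_requests=()):
    """Return 'keep' or 'discard' for one branch.

    release_branches: names of release (maintenance) branches, always kept.
    keep_requests:    stale branches someone still wants to keep.
    drop_requests:    recent branches someone says can be deprecated.
    """
    if name in release_branches:
        return "keep"
    stale = (today - last_touched) > STALE_AFTER
    if stale:
        return "keep" if name in keep_requests else "discard"
    return "discard" if name in drop_requests else "keep"
```

The output of something like this would still go to python-committers for the
final call; the function only applies the default.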


>
> Converting tags
>
> The SVN tags directory contains a lot of old stuff. Some of these are
> not, in fact, full tags, but contain only a smaller subset of the
> repository. I think we should keep all release tags, and consider
> other tags for inclusion based on requests from the developer
> community. I'd like to consider unifying the release tag naming scheme
> to make some things more consistent, if people feel that won't create
> too many problems. For example, Mercurial itself just uses '1.2.1' as
> a tag, where CPython would currently use r121.


I don't use tags so I don't really care, but in the name of easy transition
I say we don't change the naming scheme (although I have no issue dropping
obscure tags).


>
> Author map
>
> In order to provide user names the way they are common in hg (in the
> 'First Last <user at example.org>' format), we need an author map to map
> cvs and svn user names to real names and their email addresses. I have
> a complete version of such a map in my migration tools repository [3].
> The email addresses in it might be out of date; that's bound to
> happen, although it would be nice to try and have as many people as
> possible review it for addresses that are out of date. The current
> version also still seems to contain some encoding problems.


Something else that can go out to python-committers before the switch.
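
For anyone reviewing the map, here is a minimal sketch of how such a file
might be read and applied. It assumes a simple "username = First Last <email>"
line format with "#" comments, which may not match the real file exactly; the
function names and the sample entry are made up for illustration.

```python
def parse_author_map(text):
    """Parse 'svnuser = First Last <email>' lines into a dict.

    Assumed format only; blank lines, '#' comments and lines
    without '=' are skipped.
    """
    authors = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        user, sep, full_name = line.partition("=")
        if not sep:
            continue  # malformed line, ignore it
        authors[user.strip()] = full_name.strip()
    return authors


def map_author(authors, svn_user):
    """Return the hg-style author, falling back to the raw svn user name."""
    return authors.get(svn_user, svn_user)
```

Falling back to the raw user name makes unmapped committers easy to spot in
the converted history.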


>
> Generating .hgignore
>
> The .hgignore file can be used in Mercurial repositories to help
> ignore files that are not eligible for version control. It does this
> by employing several possible forms of pattern matching. The current
> Python repository already includes a rudimentary .hgignore file to
> help with using the hg mirrors.
>
> It might be useful to have the .hgignore be generated automatically
> from svn:ignore properties. This would make sure all historic
> revisions also have useful ignore information (though one could argue
> ignoring isn't really relevant to just checking out an old revision).


Don't bother with anything automatic. We can change the .hgignore file by
hand. We all know glob and regex syntax. =)


>
> Revlog reordering
>
> As an optional optimization technique, we should consider trying a
> reordering pass on the revlogs (internal Mercurial files) resulting
> from the conversion. In some cases this results in dramatic decreases
> in on-disk repository size.


Fine by me.


>
> Other repositories
>
> Richard Tew has indicated that he'd like the Stackless repository to
> also be converted. What other projects in the svn.python.org
> repository should be converted? Do we want to convert the peps
> repository? distutils? others?


I don't think there is a single project we host -- all two of them -- that
has not said they want to convert. So I say convert everything and let's
turn off the svn server by the end of the year.


>
> Infrastructure
>
> hg-ssh
>
> Developers should access the repositories through ssh, similar to the
> current setup. Public keys can be used to grant people access to a
> shared hg@ account. A hgwebdir instance should also be set up for easy
> browsing and read-only access. If we're using ssh, developers should
> trivially be able to start new clones (for longer-term features that
> profit from a separate branch).
>
> Hooks
>
> A number of hooks are currently in use. The hg equivalents for these
> should be developed and deployed. The following hooks are being used:
>
>    * check whitespace: a hook to reject commits in case the
> whitespace doesn't match the rules for the Python codebase. Should be
> straightforward to re-implement from the current version. We can also
> offer a whitespace hook for client-side repositories; it could warn
> about whitespace issues, strip trailing whitespace from changed lines,
> or both. Open issue: do we
> check only the tip after each push, or do we check every commit in a
> changegroup?
>    * commit mails: we can leverage the notify extension for this
>    * buildbots: both the regular and the community build masters must
> be notified. Fortunately buildbot includes support for hg. I've also
> implemented this for Mercurial itself, so I don't expect problems
> here.
>    * check contributors: in the current setup, all changesets bear
> the username of committers, who must have signed the contributor
> agreement. In a DVCS, the committers are not necessarily the same
> people who push, and so we can't check if the committer is a
> contributor. We could use a hook to check if the committer is a
> contributor if we keep a list of registered contributors.


Can we check these scripts into the repository itself? That way there is a
chance of reuse as hg commands, e.g. ``hg pydev-ci`` as a replacement for
``make patchcheck``.
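
If the checks live in the repository as plain functions, a server-side hook and
a client-side command could share them. A rough sketch of the whitespace check,
deliberately not tied to any Mercurial API: the rules shown (no trailing
whitespace anywhere, no tabs in the indentation of .py files) and the function
names are assumptions, not the actual rules enforced today.

```python
def whitespace_problems(filename, text):
    """Return a list of (line_number, message) pairs for one file.

    Assumed rules: no trailing whitespace in any file, and no tab
    characters in the indentation of .py files.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        indent = line[:len(line) - len(line.lstrip())]
        if filename.endswith(".py") and "\t" in indent:
            problems.append((number, "tab in indentation"))
    return problems


def strip_trailing_whitespace(text):
    """Return text with trailing whitespace removed from every line.

    The result always ends each line, including the last, with '\n'.
    """
    return "".join(line.rstrip() + "\n" for line in text.splitlines())
```

A rejecting hook would fail when the list is non-empty; a client-side helper
could print the list or apply strip_trailing_whitespace() instead.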


>
>
> hgwebdir
>
> A more or less stock hgwebdir installation should be set up. We might
> want to come up with a style to match the Python website. It may also
> be useful to build a quick extension to augment the URL rev parser so
> that it can also take r[0-9]+ args and come up with the matching hg
> revision.
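
The revision lookup itself could be as small as the sketch below. It assumes
the conversion leaves behind a mapping from svn revision numbers to hg
changeset hashes; the function name is hypothetical and the sample pairing in
the usage note is illustrative only. Wiring this into hgweb is left out on
purpose.

```python
import re


def resolve_rev(spec, svn_to_hg):
    """Translate an 'r12345'-style revision into an hg changeset id.

    svn_to_hg is assumed to map svn revision numbers (int) to hg
    hashes (str). Anything that is not 'r' followed by digits is
    returned unchanged so normal hg lookups still apply.
    """
    match = re.fullmatch(r"r([0-9]+)", spec)
    if match is None:
        return spec
    svn_rev = int(match.group(1))
    if svn_rev not in svn_to_hg:
        raise LookupError("no hg changeset for svn revision %d" % svn_rev)
    return svn_to_hg[svn_rev]
```

With a mapping such as {71600: 'dd3ebf81af43'}, resolve_rev('r71600', mapping)
gives 'dd3ebf81af43', while resolve_rev('tip', mapping) passes 'tip' through.
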
>
> After migration
>
> Where to get code
>
> It needs to be decided where the hg repositories will live. I'd like
> to propose to keep the hgwebdir instance at hg.python.org. This is an
> accepted standard for many organizations, and an easy parallel to
> svn.python.org. The 2.7 (trunk) repo might live at
> http://hg.python.org/main/, for example, with py3k at
> http://hg.python.org/py3k/. For write access, developers will have to
> use ssh, which could be ssh://hg@hg.python.org/main/. A demo
> installation will be set up with a preliminary conversion so people
> can experiment and review; it can live at
> http://hg.python.org/example/.
>
> code.python.org was also proposed as the hostname. Personally, I think
> that using the VCS name in the hostname is good because it prevents
> confusion: it should be clear that you can't use svn or bzr for
> hg.python.org.
>

How about hg.python.org for the official branches and we keep
code.python.org for personal branches of the developers like we have done
with the bzr experiments?


>
> hgwebdir can already provide tarballs for every changeset. I think
> this obviates the need for daily snapshots; we can just point users to
> tip.tar.gz instead, meaning they will get the latest. If desired, we
> could even use buildbot results to point to the last good changeset.


I like the stable buildbot tarball idea.


>
> Python-specific documentation
>
> hg comes with good built-in documentation (available through hg help)
> and a wiki [4] that's full of useful information and recipes. In
> addition to that, the parts of the developer FAQ [5] concerning
> version control will gain a section on using hg for Python
> development. Some of the text will be dependent on the outcome of
> debate about this PEP (for example, the branching strategy).
>
> Think first, commit later?
>
> In recent history, old versions of Python have been maintained by a
> select group of people backporting patches from trunk to release
> branches. While this may not scale so well as the development pace
> grows, it also runs into some problems with the current crop of
> distributed versioning tools. These tools (I believe similar problems
> would exist for either git, bzr, or hg, though some may cope better
> than others) are based on the idea of a Directed Acyclic Graph (or
> DAG), meaning they keep track of relations of changesets.
>
> Mercurial itself has a stable branch which is a *strict* subset of
> the unstable branch. This means that generally all fixes for the
> stable branch get committed against the tip of the stable branch, then
> they get merged into the unstable branch (which already contains the
> parent of the new cset). This provides a largely frictionless
> environment for moving changes from stable to unstable branches.
> Mistakes, where a change that should go on stable goes on unstable
> first, do happen, but they're usually easy to fix. That can be done by
> copying the change over to the stable branch, then trivial-merging
> with unstable (meaning the merge in fact ignores the parent from the
> stable branch).
>
> This strategy means a little more work for regular committers, because
> they have to think about whether their change should go on stable or
> unstable; they may even have to ask someone else (the RM) before
> committing. But it also relieves a dedicated group of committers of
> regular backporting duty, in addition to making it easier to work with
> the tool.
>
> Now would be a good time to consider changing strategies in this
> regard, although it would be relatively easy to switch to such a model
> later on.


As I have said, we should change our workflow habits after the switch, once
people are comfortable with hg.


>
> The future of Subversion
>
> What happens to the Subversion repositories after the migration? Since
> the svn server contains a bunch of repositories, not just the CPython
> one, it will probably live on for a bit as not every project may want
> to migrate or it takes longer for other projects to migrate. To
> prevent people from staying behind, we may want to remove migrated
> projects from the repository.
>
> Build identification
>
> Python currently provides the sys.subversion tuple to allow Python
> code to find out exactly what version of Python it's running against.
> The current version looks something like this:
>
>    * ('CPython', 'tags/r262', '71600')
>    * ('CPython', 'trunk', '73128M')
>
> Another value is returned from Py_GetBuildInfo() in the C API, and
> available to Python code as part of sys.version:
>
>    * 'r262:71600, Jun 2 2009, 09:58:33'
>    * 'trunk:73128M, Jun 2 2009, 01:24:14'
>
> I propose that the revision identifier will be the short version of
> hg's revision hash, for example 'dd3ebf81af43', augmented with '+'
> (instead of 'M') if the working directory from which it was built was
> modified. This mirrors the output of the hg id command, which is
> intended for this kind of usage.
>
> For the tag/branch identifier, I propose that hg will check for tags
> on the currently checked out revision, use the tag if there is one
> ('tip' doesn't count), and use the branch name otherwise.
> sys.subversion becomes
>
>    * ('CPython', '2.6.2', 'dd3ebf81af43')
>    * ('CPython', 'default', 'af694c6a888c+')
>
> and the build info string becomes
>
>    * '2.6.2:dd3ebf81af43, Jun 2 2009, 09:58:33'
>    * 'default:af694c6a888c+, Jun 2 2009, 01:24:14'
>
> This reflects that the default branch in hg is called 'default'
> instead of Subversion's 'trunk', and reflects the proposed new tag
> format.


Should we consider adding a sys.revision attribute and begin the deprecation
of sys.subversion?
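
Whatever the attribute ends up being called, the proposed rules are simple
enough to sketch. The function below is hypothetical and takes the hash, branch
and tags as plain arguments rather than asking hg for them; it only produces
the part of the build info string that precedes the build date.

```python
def build_identifiers(node, branch, tags=(), modified=False):
    """Return (sys.subversion-style tuple, build info prefix).

    node:     hg changeset hash (full or already shortened)
    branch:   named branch of the checked-out revision
    tags:     tags on that revision; 'tip' is ignored
    modified: True if the working directory had local changes
    """
    # Short hash, with '+' instead of svn's 'M' for a modified checkout.
    revision = node[:12] + ("+" if modified else "")
    real_tags = [tag for tag in tags if tag != "tip"]
    # Prefer a real tag; fall back to the branch name.
    name = real_tags[0] if real_tags else branch
    return ("CPython", name, revision), "%s:%s" % (name, revision)
```

Called with the PEP's own examples, build_identifiers('dd3ebf81af43',
'default', tags=['2.6.2']) yields ('CPython', '2.6.2', 'dd3ebf81af43') and
'2.6.2:dd3ebf81af43'.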


>
> References
> [1]     http://www.python.org/dev/peps/pep-0347/
> [2]     http://bitbucket.org/durin42/hgsubversion/
> [3]     http://hg.xavamedia.nl/cpython/pymigr/
> [4]     http://www.selenic.com/mercurial/wiki/
> [5]     http://www.python.org/dev/faq/#version-control
>
> =====================================================
>
> Cheers,
>
> Dirkjan