[Python-checkins] r88590 - peps/trunk/pep-0385.txt

georg.brandl python-checkins at python.org
Fri Feb 25 19:30:02 CET 2011


Author: georg.brandl
Date: Fri Feb 25 19:30:02 2011
New Revision: 88590

Log:
Overhaul PEP 385 with newest conversion strategy.

Modified:
   peps/trunk/pep-0385.txt

Modified: peps/trunk/pep-0385.txt
==============================================================================
--- peps/trunk/pep-0385.txt	(original)
+++ peps/trunk/pep-0385.txt	Fri Feb 25 19:30:02 2011
@@ -1,8 +1,10 @@
 PEP: 385
-Title: Migrating from svn to Mercurial
+Title: Migrating from Subversion to Mercurial
 Version: $Revision$
 Last-Modified: $Date$
-Author: Dirkjan Ochtman <dirkjan at ochtman.nl>
+Author: Dirkjan Ochtman <dirkjan at ochtman.nl>,
+        Antoine Pitrou <solipsis at pitrou.net>,
+        Georg Brandl <georg at python.org>
 Status: Active
 Type: Process
 Content-Type: text/x-rst
@@ -20,12 +22,12 @@
 discussion.  It's somewhat similar to `PEP 347`_, which discussed the
 migration to SVN.
 
-To make the most of hg, I (Dirkjan) would like to make a high-fidelity
+To make the most of hg, we would like to make a high-fidelity
 conversion, such that (a) as much of the svn metadata as possible is
 retained, and (b) all metadata is converted to formats that are common
 in Mercurial.  This way, tools written for Mercurial can be optimally
-used.  In order to do this, I want to use the `hgsubversion`_ software
-to do an initial conversion.  This hg extension is focused on
+used.  In order to do this, we want to use the `hgsubversion`_
+software to do an initial conversion.  This hg extension is focused on
 providing high-quality conversion from Subversion to Mercurial for use
 in two-way correspondence, meaning it doesn't throw away as much
 available metadata as other solutions.
@@ -44,7 +46,7 @@
 
 The current schedule for conversion milestones:
 
-- 2010-11-20: availability of a test repo at hg.python.org
+- 2011-02-24: availability of a test repo at hg.python.org
 
   Test commits will be allowed (and encouraged) from all committers to
   the Subversion repository.  The test repository and all test commits
@@ -52,7 +54,7 @@
   hooks will be installed for the test repository, in order to test
   buildbot, diff-email and whitespace checking integration.
 
-- 2010-12-12: final conversion (tentative)
+- 2011-03-09: final conversion (tentative)
 
   Commits to the Subversion branches now maintained in Mercurial will
   be blocked.  Developers should refrain from pushing to the Mercurial
@@ -80,15 +82,8 @@
 where each revision keeps metadata to note on which branch it belongs.
 The former makes it easier to distinguish branches, at the expense of
 requiring more disk space on the client.  The latter makes it a little
-easier to switch between branches, but often has somewhat unintuitive
-results for people (though this has been getting better in recent
-versions of Mercurial).
-
-The current proposal is to use named branches for release branches and
-adopt cloned branches for feature branches, with one exception to this
-rule: the 3.x branches will be kept in separate clones from the 2.x
-branches.  I think this provides an optimal hybrid approach for
-Python's uses of branching.
+easier to switch between branches, but all branch names are a
+persistent part of history. [1]_
 
 Differences between named branches and cloned branches:
 
@@ -97,39 +92,70 @@
 * Clones with named branches will be larger, since they contain more
   data
 
-(The Mercurial book discourages the use of named branches, but it is,
-in this respect, somewhat outdated.  Named branches have gotten much
-easier to use since that comment was written, due to improvements in
-hg.)
-
-Converting branches
--------------------
-
-There are quite a lot of branches in SVN's branches directory.  I
-propose to clean this up a bit, by following this basic strategy:
-
-* Keep all release (maintenance) branches
-* Discard branches that haven't been touched in 18 months, unless
-  someone indicates there's still interest in such a branch
-* Keep branches that have been touched in the last 18 months, unless
-  someone indicates the branch can be deprecated
-
-There's a `branch map`_ available that shows info about each branch:
-
-* keep-clone means we'll keep that branch in a separate clone
-* keep-named means we'll keep that branch as a named branch in one of
-  the clones
-* strip means we won't keep that branch
-* streamed-merge means that it got merged by committing several new
-  revisions to the other branch
-* merged-r* means the branch got merged in the named revision
-* merges? means I haven't checked/found out yet whether that branch
-  was ever merged
-* ? means that your input would be even more helpful than for the
-  other items
-* some items have no action yet, feel free to treat that as just '?'
+We propose to use named branches for release branches and adopt cloned
+branches for feature branches.
+
+.. with one exception to this rule: the 3.x branches will be kept in
+.. separate clones from the 2.x branches.  I think this provides an
+.. optimal hybrid approach for Python's uses of branching.
+
+
+History management
+------------------
+
+In order to minimize the loss of information due to the conversion, we
+propose to provide several repositories as a conversion result:
+
+* A repository with the full, unedited conversion of the Subversion
+  repository (actually, its /python subdirectory) -- this is called
+  the "historic" or "archive" repo and will be offered as a read-only
+  resource. [2]_
+
+* A repository trimmed to the mainline trunk (and py3k), as well as
+  past and present maintenance branches -- this is called the
+  "working" repo and is where development continues.
+
+  The ``default`` branch in that repo is what is known as ``py3k`` in
+  Subversion, while the Subversion trunk lives on with the branch name
+  ``trunk``; however, in Mercurial this branch will be closed.  Release
+  branches are named after their major.minor version, e.g. ``3.2``.
+
+* One more repository per active feature branch; "active" means that
+  at least one core developer asks for the branch to be provided.
+
+  All other branches are still present in the historic repo, and can
+  be extracted as separate repositories at any time should it prove to
+  be necessary.
+
+.. Converting branches
+.. -------------------
+
+.. There are quite a lot of branches in SVN's branches directory.  We
+.. propose to clean this up a bit, by following this basic strategy:
+
+.. * Keep all release (maintenance) branches
+.. * Discard branches that haven't been touched in 18 months, unless
+..   someone indicates there's still interest in such a branch
+.. * Keep branches that have been touched in the last 18 months, unless
+..   someone indicates the branch can be deprecated
+
+.. There's a `branch map`_ available that shows info about each branch:
+
+.. * keep-clone means we'll keep that branch in a separate clone
+.. * keep-named means we'll keep that branch as a named branch in one of
+..   the clones
+.. * strip means we won't keep that branch
+.. * streamed-merge means that it got merged by committing several new
+..   revisions to the other branch
+.. * merged-r* means the branch got merged in the named revision
+.. * merges? means we haven't checked/found out yet whether that branch
+..   was ever merged
+.. * ? means that your input would be even more helpful than for the
+..   other items
+.. * some items have no action yet, feel free to treat that as just '?'
+
+.. .. _branch map: http://hg.python.org/pymigr/file/tip/all-branches.txt
 
-.. _branch map: http://hg.python.org/pymigr/file/tip/all-branches.txt
 
 Converting tags
 ---------------
@@ -137,18 +163,16 @@
 The SVN tags directory contains a lot of old stuff.  Some of these are
 not, in fact, full tags, but contain only a smaller subset of the
 repository.  All release tags will be kept; other tags will be
-included based on requests from the developer community.  I'd like to
-consider unifying the release tag naming scheme to make some things
-more consistent, if people feel that won't create too many problems.
-The current proposal is to bring old release tags in line with the
-current practice of release tag naming.
+included based on requests from the developer community.  We propose
+to make the tag naming scheme consistent, in this style: ``v3.2.1a2``.
+
 
 Author map
 ----------
 
 In order to provide user names the way they are common in hg (in the
 'First Last <user at example.org>' format), we need an author map to map
-cvs and svn user names to real names and their email addresses.  I
+cvs and svn user names to real names and their email addresses.  We
 have a complete version of such a map in our `migration tools
 repository`_.  The email addresses in it might be out of date; that's
 bound to happen, although it would be nice to try and have as many
@@ -157,6 +181,7 @@
 
 .. _migration tools repository: http://hg.python.org/pymigr/
 
+
 Generating .hgignore
 --------------------
 
@@ -172,23 +197,31 @@
 relatively hard with fairly little gain, since ignoring is less
 important for older revisions).
 
-Revlog reordering
------------------
 
-As an optional optimization technique, I have performed a reordering
-pass on the revlogs (internal Mercurial files) resulting from the
-conversion.  In some cases this results in dramatic decreases in
-on-disk repository size.  This especially makes sense for the manifest
-(where it really helps out quite a lot) and oft-edited files like
-Misc/NEWS (with an admittedly smaller effect).
+Repository size
+---------------
+
+A bare conversion result of the current Python repository weighs 1.9
+GB; although this is smaller than the Subversion repository (2.7 GB),
+it is still too large to be feasible as a working repository.
+
+The size becomes more manageable by the trimming applied to the
+working repository, and by a process called "revlog reordering" that
+optimizes the layout of internal Mercurial storage very efficiently.
+
+After all optimizations are done, the size of the working repository is
+around 180 MB on disk.  The amount of data transferred over the
+network when cloning is estimated to be around 80 MB.
+
 
 Other repositories
 ------------------
 
-Richard Tew has indicated that he'd like the Stackless repository to
-also be converted.  What other projects in the svn.python.org
-repository should be converted?  Do we want to convert the peps
-repository? distutils? others?
+There are a number of other projects hosted in svn.python.org's
+"projects" repository.  The "peps" directory will be converted along
+with the main Python one.  Richard Tew has indicated that he'd like the
+Stackless repository to also be converted.  What other projects in the
+svn.python.org repository should be converted?
 
 There's now an initial stab at converting the Jython repository.  The
 current tip of hgsubversion unfortunately fails at some point.
@@ -207,10 +240,11 @@
 
 Developers should access the repositories through ssh, similar to the
 current setup.  Public keys can be used to grant people access to a
-shared hg@ account.  A hgwebdir instance will also be set up for easy
-browsing and read-only access.  If we're using ssh, developers should
-trivially be able to start new clones (for longer-term features that
-profit from development in a separate repository).
+shared hg@ account.  A hgwebdir instance has also been set up at
+``hg.python.org`` for easy browsing and read-only access.  It is
+configured so that developers can trivially start new clones (for
+longer-term features that profit from development in a separate
+repository).
 
 Hooks
 -----
@@ -244,6 +278,7 @@
 
 .. _hooks repository: http://hg.python.org/hooks/
 
+
 End-of-line conversions
 -----------------------
 
@@ -259,6 +294,7 @@
 introducing inconsistent newline data can still be implemented, if
 deemed necessary.
 
+
 hgwebdir
 --------
 
@@ -274,6 +310,7 @@
 
 .. _small WSGI application: http://hg.python.org/pymigr/file/tip/hglookup.py
 
+
 roundup
 -------
 
@@ -291,47 +328,43 @@
 
 After migration, the hgwebdir will live at hg.python.org.  This is an
 accepted standard for many organizations, and an easy parallel to
-svn.python.org.  The 3.x repo might live at
-http://hg.python.org/main/, for example, with the 2.x repo at
-http://hg.python.org/2.x/.  For write access, developers will have to
-use ssh, which could be ssh://hg@hg.python.org/main/.  A demo
-installation will be set up with a preliminary conversion so people
-can experiment and review; it can live at
-http://hg.python.org/example/.
-
-code.python.org was also proposed as the hostname.  Personally, I
-think that using the VCS name in the hostname is good because it
-prevents confusion: it should be clear that you can't use svn or bzr
-for hg.python.org.
+svn.python.org.  The working repo might live at
+http://hg.python.org/cpython/, for example, with the archive repo at
+http://hg.python.org/cpython-archive/.  For write access, developers
+will have to use ssh, which could be ssh://hg@hg.python.org/cpython/.
+
+code.python.org was also proposed as the hostname.  We think that
+using the VCS name in the hostname is good because it prevents
+confusion: it should be clear that you can't use svn or bzr for
+hg.python.org.
 
-hgwebdir can already provide tarballs for every changeset.  I think
-this obviates the need for daily snapshots; we can just point users to
+hgwebdir can already provide tarballs for every changeset.  This
+obviates the need for daily snapshots; we can just point users to
 tip.tar.gz instead, meaning they will get the latest.  If desired, we
 could even use buildbot results to point to the last good changeset.
 
+
 Python-specific documentation
 -----------------------------
 
 hg comes with good built-in documentation (available through hg help)
-and a `wiki`_ that's full of useful information and recipes.  In
-addition to that, the `parts of the developer FAQ`_ concerning version
-control will gain a section on using hg for Python development.  Some
-of the text will be dependent on the outcome of debate about this PEP
-(for example, the branching strategy).
+and a `wiki`_ that's full of useful information and recipes.
+
+In addition to that, the recently overhauled `Python Developer's
+Guide`_ already has a branch with instructions for Mercurial instead
+of Subversion; an online `build of this branch`_ is also available.
 
-.. _wiki: http://www.selenic.com/mercurial/wiki/
-.. _parts of the developer FAQ: http://www.python.org/dev/faq/#version-control
+.. _Python Developer's Guide: http://docs.python.org/devguide/
+.. _build of this branch: http://potrou.net/hgdevguide/
 
-The developer FAQ will be overhauled by Brett Cannon, which will
-include any updates needed with respect to Mercurial.
 
 Proposed workflow
 -----------------
 
-I propose two workflows for the migration of patches between several
+We propose two workflows for the migration of patches between several
 branches.
 
-For migration within 2.x or 3.x branches, I propose a patch always
+For migration within 2.x or 3.x branches, we propose a patch always
 gets committed to the oldest branch where it applies first.  Then, the
 resulting changeset can be merged using hg merge to all newer branches
 within that series (2.x or 3.x).  If it does not apply as-is to the
@@ -358,6 +391,7 @@
 history-since-it-was-branched, meaning the clone is not as big and the
 merges not as complicated.
 
+
 The future of Subversion
 ------------------------
 
@@ -366,8 +400,9 @@
 CPython one, it will probably live on for a bit as not every project
 may want to migrate or it takes longer for other projects to migrate.
 To prevent people from staying behind, we may want to move migrated
-projects from the repository to a new, read-only repository with a
-new name.
+projects from the repository to a new, read-only repository with a new
+name.
+
 
 Build identification
 --------------------
@@ -410,6 +445,19 @@
 format.
 
 
+Footnotes
+=========
+
+.. [1] The Mercurial book discourages the use of named branches, but
+   it is, in this respect, somewhat outdated.  Named branches have
+   gotten much easier to use since that comment was written, due to
+   improvements in hg.
+
+.. [2] Since the initial working repo is a subset of the archive repo,
+   it would also be feasible to pull changes from the working repo
+   into the archive repo periodically.
+
+
 Copyright
 =========
 

