[Python-Dev] SVN <-> HG workflow to split Python Library by Module
anatoly techtonik
techtonik at gmail.com
Fri Jul 2 21:25:08 CEST 2010
I planned to publish this proposal when it was finally ready and tested,
assuming that the Subversion repository would stay online and up to
date after the Mercurial migration. But recent threads showed that
there is currently no tested mechanism to sync the Subversion
repository back from Mercurial, so it will probably become outdated
quickly, and the proposal won't have a chance to be evaluated. So now
is better than never.
So, this is a way to split modules from the monolithic Subversion
repository into several Mercurial mirrors: one mirror for each module
(or whatever directory structure you like). This lets you concentrate
on only one module at a time ("distutils", "CGIHTTPServer", etc.)
without caring much about anything else. It is exceptionally useful
for occasional external "contributors" like me, and for folks on
Windows who don't have Visual Studio to compile Python and are forced
to use whatever version they have installed to create and test patches.
Here is a picture, if you feel bored:
https://docs.google.com/drawings/edit?id=1c9FDQ27BnaIew_1T7Tr-rFg1OCdPVS9w3TQREOkzyjk&hl=en
An example of the split distutils module:
http://bitbucket.org/techtonik/distutils
The split is not perfect, but the process can be polished; this is the
first version, which I only got working this morning. More important
is that the HG repository is synchronized incrementally. In particular,
the split is imperfect because the documentation directory is not
pulled in. But it is a working proof of concept that you can test
yourself using the code from:
http://bitbucket.org/techtonik/python-split
You will also need a patched version of `hgsvn` from
http://bitbucket.org/techtonik/hgsvn
How it works
-------------------------
The module is described as a series of paths inside a typical
Subversion checkout. On the first run, the `refresh.py` script from
`python-split` creates a shallow SVN checkout containing only the
required files, using the `distutils.module.def` module definition.
The second run of `refresh.py` imports the shallow checkout into
Mercurial. The third run imports the rest of the history, pulling only
the changesets relevant to the given paths.
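As a rough illustration of the path filtering in the third step, here is a minimal Python sketch. It assumes a hypothetical module definition format (one path per line, relative to the checkout root, `#` starts a comment) and a log already reduced to (revision, changed paths) pairs; the real `distutils.module.def` format, and how `refresh.py` and the patched `hgsvn` read the SVN log, may well differ. The function names are made up for this example.

```python
"""Sketch: select SVN revisions that touch a split module's paths.

Assumptions (hypothetical, not python-split's actual format):
- a module definition lists one path per line, relative to the
  checkout root; blank lines and '#' comments are ignored;
- each log entry is (revision, [changed paths relative to the root]).
"""
from typing import Iterable, List, Sequence, Tuple

LogEntry = Tuple[int, Sequence[str]]


def parse_module_def(text: str) -> List[str]:
    """Return normalized module paths from a definition file's text."""
    paths = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            paths.append(line.strip("/"))  # normalize leading/trailing '/'
    return paths


def is_under(path: str, prefix: str) -> bool:
    """True if path equals prefix or lies inside directory prefix."""
    path = path.strip("/")
    return path == prefix or path.startswith(prefix + "/")


def relevant_revisions(log: Iterable[LogEntry],
                       module_paths: Sequence[str]) -> List[int]:
    """Keep, in log order, only revisions changing at least one module path."""
    return [
        rev
        for rev, changed in log
        if any(is_under(p, m) for p in changed for m in module_paths)
    ]


if __name__ == "__main__":
    definition = """
    # hypothetical distutils.module.def
    Lib/distutils
    Doc/distutils
    """
    module = parse_module_def(definition)
    # Made-up log entries, for illustration only.
    log = [
        (80001, ["Lib/distutils/core.py"]),
        (80002, ["Lib/json/decoder.py"]),
        (80003, ["Doc/distutils/setupscript.rst", "Misc/NEWS"]),
        (80004, ["Lib/distutils_extra.py"]),  # shares a prefix, not inside
    ]
    print(relevant_revisions(log, module))  # [80001, 80003]
```

A path matches only if it is the module directory itself or lies inside it, so `Lib/distutils_extra.py` is not picked up by `Lib/distutils`. A revision that also touches unrelated files (like `Misc/NEWS` above) is still kept.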
Workflow
-------------
The diagram shows patches being pulled from local clones of the split
repositories into the master Mercurial mirror, but it won't work this
way, because revision hashes in the direct mirror won't match the
hashes in the split repositories. That's why some hash lookup/sync
procedure is needed to process incoming patches correctly. This
workflow works with hash sync only when changes are pushed back from
local clones to the central Subversion repository (possibly through
another intermediate normalizing repository). Changes pushed this way
are streamlined and can be pulled into the stable branch of the other
mirrors as a single line of development. I borrowed the streamlining
concept from the Go contribution guide, as it really helps when
reviewing chaotic Mercurial commits:
http://golang.org/doc/contribute.html#Code_review
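The hash lookup mentioned above could work roughly like the following minimal Python sketch. It assumes that for every imported changeset we know the SVN revision it was converted from (for example, recorded by the importer as metadata); `HashMap` and its methods are hypothetical names, not part of `hgsvn`. Because every mirror is built from the same SVN history, the SVN revision number serves as the common key even though the Mercurial hashes differ.

```python
"""Sketch: match changesets across Mercurial mirrors by SVN revision.

Assumption (hypothetical): every imported changeset is known together with
the SVN revision it was converted from. Hashes differ between the master
mirror and a split mirror, but the SVN revision number is shared.
"""
from typing import Dict, Iterable, Optional, Tuple

# (Mercurial changeset hash, SVN revision it was imported from)
Imported = Tuple[str, int]


class HashMap:
    """Translate changeset hashes from a source repo to a target repo."""

    def __init__(self, source: Iterable[Imported],
                 target: Iterable[Imported]) -> None:
        self._rev_of: Dict[str, int] = {node: rev for node, rev in source}
        self._node_at: Dict[int, str] = {rev: node for node, rev in target}

    def translate(self, node: str) -> Optional[str]:
        """Return the target hash for `node`, or None if there is no match."""
        rev = self._rev_of.get(node)
        if rev is None:
            # Not imported from SVN, e.g. a local commit not yet pushed back.
            return None
        # None if the target never imported this revision (e.g. a split
        # mirror skipping revisions that do not touch its module).
        return self._node_at.get(rev)


if __name__ == "__main__":
    # Made-up hashes and revisions, for illustration only.
    split = [("a1b2c3", 80001), ("d4e5f6", 80003)]
    master = [("111aaa", 80001), ("222bbb", 80002), ("333ccc", 80003)]
    to_master = HashMap(split, master)
    print(to_master.translate("d4e5f6"))  # 333ccc
    print(to_master.translate("ffffff"))  # None
```

A local commit that has not yet gone through Subversion has no SVN revision, so it cannot be matched. This is why the workflow relies on pushing changes back to Subversion before they appear in the other mirrors.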
Maintaining the central Subversion repository will require setting
additional properties, but this is doable. I don't know how to do a
module split with Mercurial alone.
http://mercurial.selenic.com/wiki/ShallowClone is still a draft (and a
complicated one), and Mercurial 1.6, released today, doesn't contain
anything revolutionary enough to offer an alternative.
I am exhausted.
--
anatoly t.