[core-workflow] Help needed: best way to convert hg repos to git?

Brett Cannon brett at python.org
Fri Feb 12 20:15:55 EST 2016


I don't remember the story behind cpython-fullhistory, but it's obviously
incomplete since it just stopped post-conversion. You will need to find
someone who knows (I'd ask on python-dev).

Also realize that this will be our fourth VCS (cvs, svn, hg, now git).
This is not going to be a perfect history of the semantic actions of
commits from the beginning of time just due to the fact that these VCS
tools all use different concepts.

On Thu, Feb 11, 2016, 22:04 Martin Panter <vadmium+py at gmail.com> wrote:

> On 12 February 2016 at 03:07, Brett Cannon <brett at python.org> wrote:
> > On Thu, Feb 11, 2016, 16:43 Nicolás Alvarez <nicolas.alvarez at gmail.com>
> > wrote:
> >> I tried fast-export, and I don't really see anything wrong with the
> >> repository. The size is 221MB.
>
> One thing I’m slightly curious about is how much the result differs
> from <https://github.com/python/cpython> or other results, and if so,
> what the differences are. The differences could be serious (mangled
> history), or they could be trivial things like stripping trailing
> newlines from commit messages, or skipping commits that don’t change
> any files.
>
> >> It depends on how crazy you want to go. For example, SVN-era merges
> >> don't appear as merges, but it looks like some SVN-era branches don't
> >> exist in Hg to begin with (Would I need to get cpython-fullhistory?
> >> Cloning it gives me a 400 Bad Request). Do we care about that?
> >
> > Good question. If you are not even able to clone it then that shows how
> > much people care. Honestly I wouldn't worry since we have the history in
> > the hg repo (converting from svn was necessary to have it available
> > without the server).
>
> I care a bit. If I get the time, I would like to figure out a robust
> way to convert the Subversion history to Git so that the svnmerge
> information is included as proper merges.
>
> Another concern for me is that some of the useful history is not even
> in Mercurial. For example <https://hg.python.org/lookup/r70152> is an
> svnmerge from ^/python/branches/io-c into ^/python/branches/py3k, but
> the Mercurial repository doesn’t have the branch history, so all the
> merged-in Subversion revisions such as r68683 are missing.
>
> Some other highlights on my quest to investigate the holy Subversion
> repository (I can post my full notes somewhere if people are
> interested):
>
> * It is nice to have a local mirror of the Subversion repository so
> that experimenting with different options and programs isn’t horribly
> slow. But I don’t want to mirror everything or overload the server
> because there are other projects stored in the repository that seem to
> take up a lot of space (and download time).
>
> * What is the story with the cpython-fullhistory Mercurial repository?
> On the surface it almost looks like an out-of-date copy of the main
> repository, but I notice some subtle differences, e.g. revision ids
> for early tags are different, v1.0.0 tag is added.
>
> * Some Subversion revisions actually merge stuff from outside the
> Python tree (e.g. <https://hg.python.org/lookup/r88662> from
> ^/sandbox/trunk/2to3/lib2to3 into
> ^/branches/release27-maint/Lib/lib2to3). Not sure if it is worth trying
> to salvage these merges; I never noticed them when working on Python.
>
> >> Or, changes that come from non-committers could have their Author
> >> field modified, maybe based on the ACKS file modification. It's
> >> feasible but will take time and manual work. Do we care about that?
> >
> > That would be great but too much effort.
>
> I think it would not be worth it, and could even be detrimental. You
> would be trying to guess based on incomplete and unreliable
> information. Maybe one person wrote a test, another wrote the
> implementation, and a third wrote the documentation, but it was all
> committed at once. Maybe the author was already in ACKS and the
> committer did not mention who the author was in the message. I think
> it is safer to not pretend the author field is always accurate.
>