[core-workflow] Time to decide how to convert hg repos to git

Martin Panter vadmium+py at gmail.com
Sun May 8 20:45:43 EDT 2016


On 8 May 2016 at 22:38, Senthil Kumaran <senthil at uthcode.com> wrote:
> Hi Martin, Brett:
>
> On Sun, May 8, 2016 at 10:38 AM, Brett Cannon <brett at python.org> wrote:
>>>
>>> $ git rev-list --count master
>>> 489
>>>
>>> I don’t know what the equivalent command in Mercurial is. Perhaps you
>>> could clone the relevant branch to a fresh repository and check the
>>> numerical revision number.
>>
>>
>> SO to the rescue (and Martin is right about how to figure it out):
>> http://stackoverflow.com/questions/16672788/total-count-of-change-sets-for-mercurial-and-git
>>
>> Senthil has also suggested verifying the hashes of all the files in a
>> repository that are not in .hg or .git directories.
>
>
> Are these validations enough for our purposes?
>
> Two files in different version-control systems can have the same SHA and
> the same count of commits, but the changesets/diffs associated with those
> commits could still differ. I was wondering how we should handle this
> when evaluating the existing git repo.

In my experience, mainly with converting Subversion → Git, there are
sometimes subtle variations that mean two different tools end up with
slightly different repositories (different commit hashes). Some of
these we might like to watch out for; others, maybe we don’t care.
Brainstorm off the top of my head:

* Trivial commits that don’t touch any files may or may not be removed
from history
* Messages with non-ASCII bytes (UTF-8 or non-UTF-8)
* User names. Git requires separate, non-empty name and email fields
(xxx <yyy>) but I don’t think Mercurial is so strict.
* Trailing newlines and whitespace in commit messages. Different
utilities have different rules about how they strip trailing newlines,
e.g. they may leave exactly one, none, or the original.
* Sub-second timestamps and time zones?
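
To make a few of those items concrete, here is a rough Python sketch of
the kind of checks I have in mind for user names, message encoding, and
trailing whitespace. The function names (split_identity, is_utf8,
message_difference) are made up for illustration and are not part of any
conversion tool. It assumes you have already extracted the user strings
and raw commit messages (as bytes) from each repository by some other
means.

```python
import re

# Git-style identity: a non-empty name followed by <email>.
_IDENT = re.compile(r"^(?P<name>[^<>]+?)\s*<(?P<email>[^<>]+)>$")


def split_identity(user):
    """Return (name, email) if `user` looks like 'Name <email>', else None."""
    match = _IDENT.match(user.strip())
    if match is None:
        return None
    return match.group("name"), match.group("email")


def is_utf8(message):
    """Report whether a raw commit message (bytes) decodes as UTF-8."""
    try:
        message.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def message_difference(a, b):
    """Classify how two raw commit messages (bytes) differ."""
    if a == b:
        return "identical"
    if a.rstrip(b"\n") == b.rstrip(b"\n"):
        return "trailing newlines only"
    if a.rstrip() == b.rstrip():
        return "trailing whitespace only"
    return "content differs"
```

Running split_identity over every Mercurial user string would flag the
ones a converter has to invent a name or email for, and
message_difference on matching commits would show whether two converted
repositories differ only in how they strip the end of the message.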

