Time to decide how to convert hg repos to git
Now that I'm comfortable declaring the code for the CLA bot finished (https://github.com/python/the-knights-who-say-ni), the next step is to finalize the command(s) we are going to use to convert hg repositories to git for migration to GitHub. Senthil, are you ready to make a final decision?
Once I have the conversion command(s) documented in PEP 512 and I have created the "Python core" team on GitHub for all of the current Python core developers, I will migrate https://hg.python.org/devinabox to make sure that everything works. After that has been verified as working I will then look at migrating https://hg.python.org/devguide/ and https://hg.python.org/peps/ (will take a little bit more effort for both to get the web builds updated, and peps requires getting the PEP editors on-board). The benchmarks repo might actually not get migrated as there is talk of starting that repo from scratch.
It's looking like we will stay on track to get at least one repository migrated by PyCon US!
I had initially suggested hg-git, but after my (mostly negative) experience in trying to convert the CPython repo, my vote goes towards fast-export. Partly because it's the only one *specifically* designed for exporting, not trying to be a bridge like hg-git and the other tool.
-- Ryan
_______________________________________________ core-workflow mailing list core-workflow@python.org https://mail.python.org/mailman/listinfo/core-workflow This list is governed by the PSF Code of Conduct: https://www.python.org/psf/codeofconduct
Hi Brett, On Fri, Apr 22, 2016 at 4:27 PM, Brett Cannon <brett@python.org> wrote:
Now that I'm comfortable declaring the code for the CLA bot finished ( https://github.com/python/the-knights-who-say-ni), the next step is to finalize the command(s) we are going to use to convert hg repositories to git for migration to GitHub. Senthil, are you ready to make a final decision?
Yes. I ran experiments with multiple tools while collating points from others, and I found that hg-git could be suitable for our export. It won't be a single command due to the size of our repo; it will be a multi-step process.
1. Initialize a local bare git repo.
2. Use hg-git to migrate from the local hg repo to the local git repo. (Ensure the migration was complete, with branches as desired.)
3. Finally, push the local git repo to the remote repo, 1000s of commits at a time.
I tried it myself and was successful with this approach. https://github.com/orsenthil/cpython-hg-to-git
The other tools and approaches I tried failed for me.
Here's my plan and a to-do:
1. Even though it is a one-time operation, I plan to convert the above steps into a trivial tool that we can use and verify independently.
2. Once we are satisfied with our local trials, you could use this tool once to convert the hg repo and push to the canonical git repo.
To do:
* I haven't talked to the maintainer of the https://github.com/python/cpython repo yet. I should do that and see what the differences will be between the semi-official mirror and the repo created by the importing tool. It will help us make a more informed decision.
I will follow up with an update on this and the tool.
Thanks, Senthil
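Step 3 above (pushing in batches) could be sketched roughly as follows. This is only an illustration, not the code in the cpython-hg-to-git repo: the helper names are hypothetical, and it assumes that walking the first-parent history of a branch and pushing every N-th commit is an acceptable way to split the push.

```python
import subprocess


def first_parent_history(repo, branch):
    """Return commit hashes on `branch`, oldest first, following first parents only."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--reverse", "--first-parent", branch],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def checkpoints(commits, batch_size):
    """Pick every `batch_size`-th commit, always ending with the newest one."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    picks = list(commits[batch_size - 1::batch_size])
    if commits and (not picks or picks[-1] != commits[-1]):
        picks.append(commits[-1])
    return picks


def push_commands(picks, remote, branch):
    """Build one `git push` per checkpoint; each push advances the remote branch."""
    return [["git", "push", remote, f"{sha}:refs/heads/{branch}"] for sha in picks]
```

Each returned command would then be run in order from inside the local repo (for example with `subprocess.run(cmd, cwd=repo, check=True)`). Because only the first-parent chain is sampled, a single push can carry more than `batch_size` commits when a merge brings in a side branch, so the batch size is an approximate upper bound at best.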
On Fri, Apr 22, 2016, 18:45 Senthil Kumaran <senthil@uthcode.com> wrote:
I will follow up with an update on this and the tool.
Great! Obviously the tool is more important as that is a blocker for devinabox while talking to Eli Bendersky about the mirror isn't. -Brett
2016-04-22 22:45 GMT-03:00 Senthil Kumaran <senthil@uthcode.com>:
1. Initialize local a bare git repo 2. Use hg-git to migrate from local hg to local git repo. (Ensure the migration was complete. With branches as desired). 3. And finally, push the local git repo, 1000s of commits at a time to remote repo.
Note that git-fast-import produces extremely suboptimal git packs. You should run "git gc --aggressive" after converting. In KDE I have seen post-svn2git repos shrink by two or three orders of magnitude by repacking them with git-gc. Maybe if you do that, there is no need to push in batches of 1000 commits. -- Nicolás
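To see whether repacking makes a difference for a given conversion, the size of the repository directory can be measured before and after `git gc --aggressive`. The following is a minimal sketch with hypothetical helper names; it only builds the command and sums file sizes, and it makes no claim about how much a particular repo will shrink.

```python
import os


def dir_size(path):
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def gc_command(repo):
    """Command line for the repack suggested above, run against `repo`."""
    return ["git", "-C", repo, "gc", "--aggressive"]
```

One possible use: call `dir_size(repo)` first, run `gc_command(repo)` with `subprocess.run(..., check=True)`, then call `dir_size(repo)` again and compare the two numbers.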
Hi,
Are you aware of the work ESR did for liberating source code repositories and moving them between version control systems? It also supports editing version-control repository history.
http://www.catb.org/esr/reposurgeon/
He used it to port the Emacs repository to git, not an easy feat, and blogged about it: http://esr.ibiblio.org/?p=5634 and http://esr.ibiblio.org/?p=5211
Greetings,
On Sat, Apr 23, 2016 at 8:40 AM, Michiel Overtoom <motoom@xs4all.nl> wrote:
Hi,
Are you aware of the work ESR did for liberating source code repositories and moving them between version control systems? It also supports editing version-control repository history.
http://www.catb.org/esr/reposurgeon/
He used it to port the Emacs repository to git, not an easy feat, and blogged about it: http://esr.ibiblio.org/?p=5634 and http://esr.ibiblio.org/?p=5211
There was talk about this on python-committers (and on this list, I think) but there seem to be more than enough other tools that can do the job and aren't associated with ESR.
Hello Core-Workflow Group, On Fri, Apr 22, 2016 at 6:45 PM, Senthil Kumaran wrote:
Here's my plan and a to do:
1. Even though it is a one-time operation, I plan to convert above steps into a trivial tool that we can use and verify independently. 2. Once we are satisfied with our local trials, you could use this tool once to convert the hg repo and push to canonical git repo.
This was the tool I mentioned in the above point. https://github.com/orsenthil/cpython-hg-to-git
I used this to test migrating small hg repos to GitHub repos, and the operations were successful. As a test, I could migrate the CPython repo, but it took multiple hours on a fast machine. I assume it is due to Python and subprocess overhead. The faster way is to just use the hg-git extension directly, following the steps documented in the orsenthil/cpython-hg-to-git repo for the sake of consistency.
Brett and I discussed that we might need a way to verify whether the two repos, hg and git, are the same (that is, have the same graph) as we undertake this process. I don't know any comparison commands offhand, but I assume it should be possible. I plan to add that to the tool.
Please share your comments.
Thank you, Senthil
On 8 May 2016 at 08:08, Senthil Kumaran <senthil@uthcode.com> wrote:
Brett and I discussed that we might need a way to verify whether the two repos, hg and git, are the same (that is, have the same graph) as we undertake this process. I don't know any comparison commands offhand, but I assume it should be possible. I plan to add that to the tool.
One starting point that comes to mind is to compare the number of revisions (including all merges and merged revisions) for each branch, tag, etc. With Git you can do it like:
$ git rev-list --count master
489
I don’t know what the equivalent command in Mercurial is. Perhaps you could clone the relevant branch to a fresh repository and check the numerical revision number.
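A count comparison along these lines could be scripted as below. This is a hedged sketch: the git side uses the command quoted above, while the Mercurial side assumes that counting the changesets selected by the `ancestors(...)` revset is a fair equivalent. The function names are hypothetical, and the branch names to compare (for example `master` against `default`) are up to the caller.

```python
import subprocess


def git_count_cmd(repo, ref):
    """Count commits reachable from `ref`, merges included."""
    return ["git", "-C", repo, "rev-list", "--count", ref]


def hg_count_cmd(repo, rev):
    """Print one line per changeset that is an ancestor of `rev` (assumed equivalent)."""
    return ["hg", "-R", repo, "log", "-r", f"ancestors({rev})", "--template", "{node}\n"]


def parse_git_count(output):
    """`git rev-list --count` prints a single integer."""
    return int(output.strip())


def parse_hg_count(output):
    """Count the non-empty lines printed by the hg command above."""
    return sum(1 for line in output.splitlines() if line.strip())


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def counts_match(git_repo, git_ref, hg_repo, hg_rev):
    """Return (equal?, git count, hg count) for one branch pair."""
    git_n = parse_git_count(run(git_count_cmd(git_repo, git_ref)))
    hg_n = parse_hg_count(run(hg_count_cmd(hg_repo, hg_rev)))
    return git_n == hg_n, git_n, hg_n
```

Matching counts only show that no revisions were dropped or added on that branch; they say nothing about whether the contents of each revision are the same.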
On Sun, 8 May 2016 at 01:48 Martin Panter <vadmium+py@gmail.com> wrote:
I don’t know what the equivalent command in Mercurial is. Perhaps you could clone the relevant branch to a fresh repository and check the numerical revision number.
SO to the rescue (and Martin is right about how to figure it out): http://stackoverflow.com/questions/16672788/total-count-of-change-sets-for-m...
Senthil has also suggested verifying the hashes of all the files in a repository that are not in .hg or .git directories.
The reason this is important for us to all figure out is that if the current unofficial mirrors for peps and cpython pass verification then we can skip the conversion steps and simply make the unofficial mirrors the official repositories (on top of making sure any conversion succeeds).
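The file-hash check could look something like the sketch below, assuming both repositories are checked out side by side at the revisions being compared. The helper names and the choice of SHA-256 are illustrative assumptions, not something decided in this thread.

```python
import hashlib
import os

# Version-control metadata directories that must not take part in the comparison.
SKIP_DIRS = {".hg", ".git"}


def tree_hashes(root):
    """Map each file path (relative to `root`) to the SHA-256 of its contents."""
    hashes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into .hg or .git.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            digest = hashlib.sha256()
            # Symlinks are not treated specially; open() follows them.
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            hashes[rel] = digest.hexdigest()
    return hashes


def tree_diff(left, right):
    """Return the sorted relative paths that are missing or differ between two trees."""
    a, b = tree_hashes(left), tree_hashes(right)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list from `tree_diff(hg_checkout, git_checkout)` would mean the two working trees match for that revision. Tool-specific files at the top level (for instance ignore files) may differ legitimately and would show up in the result, so the output needs a human look rather than a strict pass/fail.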
Hi Martin, Brett: On Sun, May 8, 2016 at 10:38 AM, Brett Cannon <brett@python.org> wrote:
Senthil has also suggested verifying the hashes of all the files in a repository that are not in .hg or .git directories.
Are these validations enough for our purposes?
Files in two different version-control systems can have the same SHA, and the repositories the same number of commits, yet the changesets/diffs associated with those commits may still differ. I was wondering how we should handle this when evaluating the existing git repo.
When we do a fresh migration using a tool like hg-git, we assume that this verification step is asserted as part of the tool's unit tests.
-- Senthil
On Sun, 8 May 2016 at 16:33 Senthil Kumaran <senthil@uthcode.com> wrote:
Are these validations enough for our purposes?
Don't know, but it's at least a start.
Files in two different version-control systems can have the same SHA, and the repositories the same number of commits, yet the changesets/diffs associated with those commits may still differ. I was wondering how we should handle this when evaluating the existing git repo.
When we do migration afresh using a tool like hg-git, we assume that this verification step is asserted as part of unit tests of the tool.
That would be my hope. It obviously doesn't hurt to check, though, if it isn't too difficult. -Brett
On 8 May 2016 at 22:38, Senthil Kumaran <senthil@uthcode.com> wrote:
Files in two different version-control systems can have the same SHA, and the repositories the same number of commits, yet the changesets/diffs associated with those commits may still differ. I was wondering how we should handle this when evaluating the existing git repo.
In my experience, mainly with converting Subversion → Git, there are sometimes subtle variations that mean two different tools end up with slightly different repositories (different commit hashes). Some of these we might like to watch out for; others, maybe we don’t care. Brainstorm off the top of my head:
* Trivial commits that don’t touch any files may or may not be removed from history
* Messages with non-ASCII bytes (UTF-8 or non-UTF-8)
* User names. Git requires separate, non-empty name and email fields (xxx <yyy>) but I don’t think Mercurial is so strict.
* Trailing newlines and whitespace in commit messages. Different utilities have different rules about how they strip trailing newlines, e.g. they may leave exactly one, none, or the original.
* Sub-second timestamps and time zones?
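If commit metadata from the two repositories is compared directly, some of the variations listed above (user-name format, trailing whitespace) could be normalized first so that only meaningful differences are reported. The sketch below uses hypothetical helper names and is not taken from any of the tools discussed in this thread; which differences to ignore remains a policy decision.

```python
import re

# "Name <email>" with either part optional; unparseable input falls back to the raw text.
_AUTHOR_RE = re.compile(r"^\s*(?P<name>[^<]*?)\s*(?:<(?P<email>[^>]*)>)?\s*$")


def split_author(raw):
    """Split an author string into (name, email); missing parts become ''."""
    match = _AUTHOR_RE.match(raw)
    if not match:
        return raw.strip(), ""
    return match.group("name"), (match.group("email") or "").strip()


def normalize_message(message):
    """Drop trailing whitespace on each line and trailing newlines at the end."""
    lines = [line.rstrip() for line in message.replace("\r\n", "\n").split("\n")]
    return "\n".join(lines).rstrip("\n")


def commit_key(author, message):
    """Comparison key that ignores the whitespace and formatting noise above."""
    return split_author(author) + (normalize_message(message),)
```

Two commits whose `commit_key` values match differ at most in the formatting details being ignored here. Encoding issues, empty commits, and timestamps would need their own checks.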
participants (7)
- Brett Cannon
- Brian Curtin
- Martin Panter
- Michiel Overtoom
- Nicolás Alvarez
- Ryan Gonzalez
- Senthil Kumaran