Help needed: best way to convert hg repos to git?
https://www.python.org/dev/peps/pep-0512/#define-commands-to-move-a-mercuria... There appear to be multiple ways to convert hg repos to git, but no clear winner. It would be great if some one/people took on the task of evaluating the tools available out there by converting the cpython repo and seeing which one has the best results.
2016-02-05 22:57 GMT-03:00 Brett Cannon <brett@python.org>:
https://www.python.org/dev/peps/pep-0512/#define-commands-to-move-a-mercuria...
There appear to be multiple ways to convert hg repos to git, but no clear winner. It would be great if some one/people took on the task of evaluating the tools available out there by converting the cpython repo and seeing which one has the best results.
I said I'd look into this. I didn't. Shame on me. Trying fast-export now :) -- Nicolás
2016-02-05 23:39 GMT-03:00 Nicolás Alvarez <nicolas.alvarez@gmail.com>:
2016-02-05 22:57 GMT-03:00 Brett Cannon <brett@python.org>:
https://www.python.org/dev/peps/pep-0512/#define-commands-to-move-a-mercuria...
There appear to be multiple ways to convert hg repos to git, but no clear winner. It would be great if some one/people took on the task of evaluating the tools available out there by converting the cpython repo and seeing which one has the best results.
I said I'd look into this. I didn't. Shame on me.
Trying fast-export now :)
Update: The fast-export tool started at about 500 revs/sec but progressively slowed down. Now it's 90% done after churning for two hours, and each merge commit (of which there are many!) takes an entire second by itself. I don't feel like staying awake to see it finish. -- Nicolás
2016-02-06 3:03 GMT-03:00 Nicolás Alvarez <nicolas.alvarez@gmail.com>:
2016-02-05 23:39 GMT-03:00 Nicolás Alvarez <nicolas.alvarez@gmail.com>:
2016-02-05 22:57 GMT-03:00 Brett Cannon <brett@python.org>:
https://www.python.org/dev/peps/pep-0512/#define-commands-to-move-a-mercuria...
There appear to be multiple ways to convert hg repos to git, but no clear winner. It would be great if some one/people took on the task of evaluating the tools available out there by converting the cpython repo and seeing which one has the best results.
I said I'd look into this. I didn't. Shame on me.
Trying fast-export now :)
Update: The fast-export tool started at about 500 revs/sec but progressively slowed down. Now it's 90% done after churning for two hours, and each merge commit (of which there are many!) takes an entire second by itself. I don't feel like staying awake to see it finish.
I tried fast-export, and I don't really see anything wrong with the repository. The size is 221MB. It depends on how crazy you want to go. For example, SVN-era merges don't appear as merges, but looks like some SVN-era branches don't exist in Hg to begin with (Would I need to get cpython-fullhistory? Cloning it gives me a 400 Bad Request). Do we care about that? Or, changes that come from non-committers could have their Author field modified, maybe based on the ACKS file modification. It's feasible but will take time and manual work. Do we care about that? -- Nicolás
On Thu, Feb 11, 2016, 16:43 Nicolás Alvarez <nicolas.alvarez@gmail.com> wrote:
2016-02-06 3:03 GMT-03:00 Nicolás Alvarez <nicolas.alvarez@gmail.com>:
2016-02-05 23:39 GMT-03:00 Nicolás Alvarez <nicolas.alvarez@gmail.com>:
2016-02-05 22:57 GMT-03:00 Brett Cannon <brett@python.org>:
https://www.python.org/dev/peps/pep-0512/#define-commands-to-move-a-mercuria...
There appear to be multiple ways to convert hg repos to git, but no clear winner. It would be great if some one/people took on the task of evaluating the tools available out there by converting the cpython repo and seeing which one has the best results.
I said I'd look into this. I didn't. Shame on me.
Trying fast-export now :)
Update: The fast-export tool started at about 500 revs/sec but progressively slowed down. Now it's 90% done after churning for two hours, and each merge commit (of which there are many!) takes an entire second by itself. I don't feel like staying awake to see it finish.
I tried fast-export, and I don't really see anything wrong with the repository. The size is 221MB.
It depends on how crazy you want to go. For example, SVN-era merges don't appear as merges, but looks like some SVN-era branches don't exist in Hg to begin with (Would I need to get cpython-fullhistory? Cloning it gives me a 400 Bad Request). Do we care about that?
Good question. If you are not an even clone it then that shows how much people who are. Honestly I wouldn't worry since we have the history in the hg repo (converting from svn was necessary to have it available without the server).
Or, changes that come from non-committers could have their Author field modified, maybe based on the ACKS file modification. It's feasible but will take time and manual work. Do we care about that?
That would be great but too much effort. Brett
-- Nicolás
On 02/11/2016 07:07 PM, Brett Cannon wrote:
On Thu, Feb 11, 2016, 16:43 Nicolás Alvarez wrote:
It depends on how crazy you want to go. For example, SVN-era merges don't appear as merges, but looks like some SVN-era branches don't exist in Hg to begin with (Would I need to get cpython-fullhistory? Cloning it gives me a 400 Bad Request). Do we care about that?
If you are not an even clone it then that shows how much people who are.
Um, could you repeat that? In English? :) -- ~Ethan~
On Thu, Feb 11, 2016, 19:27 Ethan Furman <ethan@stoneleaf.us> wrote:
On 02/11/2016 07:07 PM, Brett Cannon wrote:
On Thu, Feb 11, 2016, 16:43 Nicolás Alvarez wrote:
It depends on how crazy you want to go. For example, SVN-era merges don't appear as merges, but looks like some SVN-era branches don't exist in Hg to begin with (Would I need to get cpython-fullhistory? Cloning it gives me a 400 Bad Request). Do we care about that?
If you are not an even clone it then that shows how much people who are.
Um, could you repeat that? In English? :)
If you aren't able to even clone it then that shows how much people care. Brett
-- ~Ethan~ _______________________________________________ core-workflow mailing list core-workflow@python.org https://mail.python.org/mailman/listinfo/core-workflow This list is governed by the PSF Code of Conduct: https://www.python.org/psf/codeofconduct
On 12 February 2016 at 03:07, Brett Cannon <brett@python.org> wrote:
On Thu, Feb 11, 2016, 16:43 Nicolás Alvarez <nicolas.alvarez@gmail.com> wrote:
I tried fast-export, and I don't really see anything wrong with the repository. The size is 221MB.
One thing I’m slightly curious about is how much the result differs from <https://github.com/python/cpython> or other results, and if so, what the differences are. The differences could be serious (mangled history), or they could be trivial things like stripping trailing newlines from commit messages, or skipping commits that don’t change any files.
It depends on how crazy you want to go. For example, SVN-era merges don't appear as merges, but looks like some SVN-era branches don't exist in Hg to begin with (Would I need to get cpython-fullhistory? Cloning it gives me a 400 Bad Request). Do we care about that?
Good question. If you are not an even clone it then that shows how much people who are. Honestly I wouldn't worry since we have the history in the hg repo (converting from svn was necessary to have it available without the server).
I care a bit. If I get the time, I would like to figure out a robust way to convert the Subversion history to Git so that the svnmerge information is included as proper merges.
Another concern for me is that some of the useful history is not even in Mercurial. For example <https://hg.python.org/lookup/r70152> is an svnmerge from ^/python/branches/io-c into ^/python/branches/py3k, but the Mercurial repository doesn’t have the branch history, so all the merged-in Subversion revisions such as r68683 are missing.
Some other highlights on my quest to investigate the holy Subversion repository (I can post my full notes somewhere if ppl are interested):
* It is nice to have a local mirror of the Subversion repository so that experimenting with different options and programs isn’t horribly slow. But I don’t want to mirror everything or overload the server because there are other projects stored in the repository that seem to take up a lot of space (and download time).
* What is the story with the cpython-fullhistory Mercurial repository? On the surface it almost looks like an out-of-date copy of the main repository, but I notice some subtle differences, e.g. revision ids for early tags are different, v1.0.0 tag is added.
* Some Subversion revisions actually merge stuff from outside the Python tree (e.g. <https://hg.python.org/lookup/r88662> from ^/sandbox/trunk/2to3/lib2to3 into ^/branches/release27-maint/Lib/lib2to3). Not sure if it is worth trying to salvage these merges; I never noticed them when working on Python.
Or, changes that come from non-committers could have their Author field modified, maybe based on the ACKS file modification. It's feasible but will take time and manual work. Do we care about that?
That would be great but too much effort.
I think it would not be worth it, and could even be detrimental. You would be trying to guess based on incomplete and unreliable information. Maybe one person wrote a test, another wrote the implementation, and a third wrote the documentation, but it was all committed at once. Maybe the author was already in ACKS and the committer did not mention who the author was in the message. I think it is safer to not pretend the author field is always accurate.
2016-02-12 3:04 GMT-03:00 Martin Panter <vadmium+py@gmail.com>:
On 12 February 2016 at 03:07, Brett Cannon <brett@python.org> wrote:
On Thu, Feb 11, 2016, 16:43 Nicolás Alvarez <nicolas.alvarez@gmail.com> wrote:
It depends on how crazy you want to go. For example, SVN-era merges don't appear as merges, but looks like some SVN-era branches don't exist in Hg to begin with (Would I need to get cpython-fullhistory? Cloning it gives me a 400 Bad Request). Do we care about that?
Good question. If you are not an even clone it then that shows how much people who are. Honestly I wouldn't worry since we have the history in the hg repo (converting from svn was necessary to have it available without the server).
I care a bit. If I get the time, I would like to figure out a robust way to convert the Subversion history to Git so that the svnmerge information is included as proper merges.
I migrated most of KDE from SVN to Git, progressively converting a single million-revision repository into hundreds of per-app Git repositories. Is it still possible to access the SVN repository? :)
Some other highlights on my quest to investigate the holy Subversion repository (I can post my full notes somewhere if ppl are interested):
* It is nice to have a local mirror of the Subversion repository so that experimenting with different options and programs isn’t horribly slow. But I don’t want to mirror everything or overload the server because there are other projects stored in the repository that seem to take up a lot of space (and download time).
The svn2git tool we used in KDE *requires* the repository to be local, because the libsvn API it uses works with repositories, not server URLs. -- Nicolás
On 12 February 2016 at 06:27, Nicolás Alvarez <nicolas.alvarez@gmail.com> wrote:
2016-02-12 3:04 GMT-03:00 Martin Panter <vadmium+py@gmail.com>:
On 12 February 2016 at 03:07, Brett Cannon <brett@python.org> wrote:
On Thu, Feb 11, 2016, 16:43 Nicolás Alvarez <nicolas.alvarez@gmail.com> wrote:
It depends on how crazy you want to go. For example, SVN-era merges don't appear as merges, but looks like some SVN-era branches don't exist in Hg to begin with (Would I need to get cpython-fullhistory? Cloning it gives me a 400 Bad Request). Do we care about that?
Good question. If you are not an even clone it then that shows how much people who are. Honestly I wouldn't worry since we have the history in the hg repo (converting from svn was necessary to have it available without the server).
I care a bit. If I get the time, I would like to figure out a robust way to convert the Subversion history to Git so that the svnmerge information is included as proper merges.
I migrated most of KDE from SVN to Git, progressively converting a single million-revision repository into hundreds of per-app Git repositories.
Is it still possible to access the SVN repository? :)
svn info https://svn.python.org/projects/python
Some other highlights on my quest to investigate the holy Subversion repository (I can post my full notes somewhere if ppl are interested):
* It is nice to have a local mirror of the Subversion repository so that experimenting with different options and programs isn’t horribly slow. But I don’t want to mirror everything or overload the server because there are other projects stored in the repository that seem to take up a lot of space (and download time).
The svn2git tool we used in KDE *requires* the repository to be local, because the libsvn API it uses works with repositories, not server URLs.
Hi! On Fri, Feb 12, 2016 at 06:04:19AM +0000, Martin Panter <vadmium+py@gmail.com> wrote:
One thing I’m slightly curious about is how much the result differs from <https://github.com/python/cpython> or other results, and if so, what the differences are. The differences could be serious (mangled history), or they could be trivial things like stripping trailing newlines from commit messages, or skipping commits that don’t change any files.
git-remote-hg that I have been using can be configured to produce commits compatible with hg-git. If the repo above was created with hg-git I can rerun my conversion with hg-git compatibility turned on. Then it may be compared with the mentioned repository. Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
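One cheap first check when comparing two conversions is to compare tree hashes instead of commit hashes. A Git tree hash depends only on file content, so two conversions that differ in trivial ways (commit message whitespace, author formatting) still produce the same tree at a given tip, while their commit hashes differ. The sketch below is only an illustration: the helper names `tree_of`, `same_content` and `make_repo` are made up here, and the two throwaway repositories stand in for real conversion results.

```shell
#!/bin/sh
# Sketch: compare two Git conversions of the same history by tree hash.
# All helper names below are hypothetical, not part of any conversion tool.
set -eu

# Print the hash of the root tree of <rev> in <repo-dir>.
tree_of() {  # usage: tree_of <repo-dir> <rev>
    git -C "$1" rev-parse --verify --quiet "$2^{tree}"
}

# Succeed if both repositories hold identical file content at <rev>.
same_content() {  # usage: same_content <repo-a> <repo-b> <rev>
    [ "$(tree_of "$1" "$3")" = "$(tree_of "$2" "$3")" ]
}

# Demo: two throwaway repos with the same file but different commit messages,
# standing in for the output of two different conversion tools.
work=$(mktemp -d)
make_repo() {  # usage: make_repo <dir> <commit-message>
    git init -q "$1"
    printf 'print("hello")\n' > "$1/hello.py"
    git -C "$1" add hello.py
    git -C "$1" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "$2"
}
make_repo "$work/a" "Initial import"
make_repo "$work/b" "Initial import (message differs)"

if same_content "$work/a" "$work/b" HEAD; then
    echo "trees match; any difference is in commit metadata only"
else
    echo "file content differs"
fi
```

For real conversions the same comparison would be run per branch tip and per tag; only when the trees agree is it worth looking at message or author differences.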
I don't remember the story behind cpython-fullhistory, but it's obviously incomplete since it just stopped post conversion. You will need to find someone who knows (I'd ask on python-dev).
Also realize that this will be our fourth VCS (cvs, svn, hg, now git). This is not going to be a perfect history of the semantic actions of commits from the beginning of time just due to the fact that these VCS tools all use different concepts.
On Thu, Feb 11, 2016, 22:04 Martin Panter <vadmium+py@gmail.com> wrote:
On 12 February 2016 at 03:07, Brett Cannon <brett@python.org> wrote:
On Thu, Feb 11, 2016, 16:43 Nicolás Alvarez <nicolas.alvarez@gmail.com> wrote:
I tried fast-export, and I don't really see anything wrong with the repository. The size is 221MB.
One thing I’m slightly curious about is how much the result differs from <https://github.com/python/cpython> or other results, and if so, what the differences are. The differences could be serious (mangled history), or they could be trivial things like stripping trailing newlines from commit messages, or skipping commits that don’t change any files.
It depends on how crazy you want to go. For example, SVN-era merges don't appear as merges, but looks like some SVN-era branches don't exist in Hg to begin with (Would I need to get cpython-fullhistory? Cloning it gives me a 400 Bad Request). Do we care about that?
Good question. If you are not an even clone it then that shows how much people who are. Honestly I wouldn't worry since we have the history in the hg repo (converting from svn was necessary to have it available without the server).
I care a bit. If I get the time, I would like to figure out a robust way to convert the Subversion history to Git so that the svnmerge information is included as proper merges.
Another concern for me is that some of the useful history is not even in Mercurial. For example <https://hg.python.org/lookup/r70152> is an svnmerge from ^/python/branches/io-c into ^/python/branches/py3k, but the Mercurial repository doesn’t have the branch history, so all the merged-in Subversion revisions such as r68683 are missing.
Some other highlights on my quest to investigate the holy Subversion repository (I can post my full notes somewhere if ppl are interested):
* It is nice to have a local mirror of the Subversion repository so that experimenting with different options and programs isn’t horribly slow. But I don’t want to mirror everything or overload the server because there are other projects stored in the repository that seem to take up a lot of space (and download time).
* What is the story with the cpython-fullhistory Mercurial repository? On the surface it almost looks like an out-of-date copy of the main repository, but I notice some subtle differences, e.g. revision ids for early tags are different, v1.0.0 tag is added.
* Some Subversion revisions actually merge stuff from outside the Python tree (e.g. <https://hg.python.org/lookup/r88662> from ^/sandbox/trunk/2to3/lib2to3 into ^/branches/release27-maint/Lib/lib2to3). Not sure if it is worth trying to salvage these merges; I never noticed them when working on Python.
Or, changes that come from non-committers could have their Author field modified, maybe based on the ACKS file modification. It's feasible but will take time and manual work. Do we care about that?
That would be great but too much effort.
I think it would not be worth it, and could even be detrimental. You would be trying to guess based on incomplete and unreliable information. Maybe one person wrote a test, another wrote the implementation, and a third wrote the documentation, but it was all committed at once. Maybe the author was already in ACKS and the committer did not mention who the author was in the message. I think it is safer to not pretend the author field is always accurate.
On 13 February 2016 at 11:15, Brett Cannon <brett@python.org> wrote:
I don't remember the story behind cpython-fullhistory, but it's obviously incomplete since it just stopped post conversion. You will need to find someone who knows (I'd ask on python-dev).
Also realize that this will be our fourth VCS (cvs, svn, hg, now git). This is not going to be a perfect history of the semantic actions of commits from the beginning of time just due to the fact that these VCS tools all use different concepts.
There's also the fact that prior to the move to SourceForge in 2000, all changes had to be funneled through the half dozen or so people with write access to the CVS tree: https://docs.python.org/3/whatsnew/2.0.html#new-development-process I think it's definitely OK if future code archaeologists need to dig into the SVN repository to get a more complete view of CPython's history. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 02/13/2016 02:56 AM, Nick Coghlan wrote:
On 13 February 2016 at 11:15, Brett Cannon <brett@python.org> wrote:
I don't remember the story behind cpython-fullhistory, but it's obviously incomplete since it just stopped post conversion. You will need to find someone who knows (I'd ask on python-dev).
Also realize that this will be our fourth VCS (cvs, svn, hg, now git). This is not going to be a perfect history of the semantic actions of commits from the beginning of time just due to the fact that these VCS tools all use different concepts.
There's also the fact that prior to the move to SourceForge in 2000, all changes had to be funneled through the half dozen or so people with write access to the CVS tree: https://docs.python.org/3/whatsnew/2.0.html#new-development-process
I think it's definitely OK if future code archaeologists need to dig into the SVN repository to get a more complete view of CPython's history.
I've never met a project that did not regret such a decision at some point. Keeping older history is usually valuable. Mercurial has tools powerful enough to let you get all the history back together; I assume git probably has that power too.
This is your call, but I strongly recommend taking advantage of this migration to put everything back together.
-- Pierre-Yves David
On 02/14/2016 06:24 PM, Pierre-Yves David wrote:
On 02/13/2016 02:56 AM, Nick Coghlan wrote:
On 13 February 2016 at 11:15, Brett Cannon <brett@python.org> wrote:
I don't remember the story behind cpython-fullhistory, but it's obviously incomplete since it just stopped post conversion. You will need to find someone who knows (I'd ask on python-dev).
Also realize that this will be our fourth VCS (cvs, svn, hg, now git). This is not going to be a perfect history of the semantic actions of commits from the beginning of time just due to the fact that these VCS tools all use different concepts.
There's also the fact that prior to the move to SourceForge in 2000, all changes had to be funneled through the half dozen or so people with write access to the CVS tree: https://docs.python.org/3/whatsnew/2.0.html#new-development-process
I think it's definitely OK if future code archaeologists need to dig into the SVN repository to get a more complete view of CPython's history.
I've never met a project that did not regret such a decision at some point. Keeping older history is usually valuable. Mercurial has tools powerful enough to let you get all the history back together; I assume git probably has that power too.
This is your call, but I strongly recommend taking advantage of this migration to put everything back together.
While "putting everything back together" would be great, it doesn't *have* to block the migration. Git has a command called "git replace" that lets you do this later.
The Linux kernel (which switched to Git before Git migration tools existed) has a separate "early history" repo that you can "prepend" to the main one. Then, in your local copy, it looks like one unbroken history. Since Git commits are snapshots and not deltas, this works amazingly well -- it's just telling Git's object retrieval routine to retrieve <object X> instead of <object Y>. The disadvantage is that it has to be done in each clone individually -- no one can rewrite history for others.
Two commands every future historian would have to do:
    git fetch <url_for_old_history>
    git replace --graft <first_commit_of_new_history> <last_commit_of_old_history>
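To make the two commands above concrete, here is a small self-contained sketch that grafts one throwaway repository's history in front of another. Everything in it is an assumption for illustration: the directory names `old` and `new`, the `commit` helper, and the tiny two-commit histories stand in for the real "early history" and converted repositories.

```shell
#!/bin/sh
# Sketch: prepend an "old history" repo to a "new history" repo with
# git replace --graft. Repos and helper names are hypothetical.
set -eu
work=$(mktemp -d)

# Append a line to <file> in <repo> and commit it.
commit() {  # usage: commit <repo> <file> <message>
    printf '%s\n' "$3" >> "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "$3"
}

# The old history, kept in its own repository.
git init -q "$work/old"
commit "$work/old" log.txt "old 1"
commit "$work/old" log.txt "old 2"
last_old=$(git -C "$work/old" rev-parse HEAD)

# The new history, which starts from its own root commit.
git init -q "$work/new"
commit "$work/new" log.txt "new 1"
commit "$work/new" log.txt "new 2"
first_new=$(git -C "$work/new" rev-list --max-parents=0 HEAD)

before=$(git -C "$work/new" rev-list --count HEAD)

# The two commands from the message: fetch the old objects, then tell Git
# to treat the first new commit as if its parent were the last old commit.
git -C "$work/new" fetch -q "$work/old" HEAD
git -C "$work/new" replace --graft "$first_new" "$last_old"

after=$(git -C "$work/new" rev-list --count HEAD)
echo "commits reachable from HEAD before graft: $before, after: $after"
```

The graft lives in `refs/replace/` of the clone where it was created, which is why each clone has to repeat the step. Running Git with `--no-replace-objects` shows the original, ungrafted history again.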
Response from GitHub staff regarding using their Importer <https://import.github.com/> to import CPython:
Unfortunately, the repository is too large to migrate using the importer. I’d recommend converting it to git locally using something like hg-fast-export. Due to its size, you’ll need to push the local repo to GitHub in chunks.
So it seems like Importer is a no-go.
Nick
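The GitHub response does not say how to push "in chunks". One common reading is to push the branch a slice of history at a time, so that no single push has to transfer the whole repository. The sketch below assumes that interpretation; `push_in_chunks` is a made-up helper, and a local bare repository stands in for the GitHub remote.

```shell
#!/bin/sh
# Sketch: push a branch to a remote in chunks of N first-parent commits.
# push_in_chunks is a hypothetical helper, not a Git or GitHub command.
set -eu

push_in_chunks() {  # usage: push_in_chunks <remote> <branch> <chunk-size>
    remote=$1 branch=$2 step=$3
    # Walk the first-parent history oldest-first, keep every <step>-th commit,
    # and advance the remote branch to each of those commits in turn.
    git rev-list --reverse --first-parent "$branch" |
        awk -v n="$step" 'NR % n == 0' |
        while read -r sha; do
            # </dev/null keeps an ssh-based push from eating the loop's input.
            git push -q "$remote" "$sha:refs/heads/$branch" </dev/null
        done
    # A final push brings the remote branch up to the real tip.
    git push -q "$remote" "$branch:refs/heads/$branch"
}

# Demo against a local bare repository standing in for the real remote.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
i=1
while [ "$i" -le 7 ]; do
    echo "$i" > n.txt
    git add n.txt
    git -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "commit $i"
    i=$((i + 1))
done
branch=$(git symbolic-ref --short HEAD)
push_in_chunks "$work/remote.git" "$branch" 3
```

Each intermediate push is a fast-forward of the previous one, so an interrupted run can simply be restarted. Other branches and tags would need the same treatment or a final `git push --all` once the bulk of the objects is on the remote.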
On Mon, Feb 15, 2016 at 1:39 PM, Nicholas Chammas < nicholas.chammas@gmail.com> wrote:
So it seems like Importer is a no-go.
Yes, I got the same response from github. However, the tools like hg-git are able to accomplish the task and we will be able to carry forward with those. -- Senthil
On 02/15/2016 10:12 AM, Petr Viktorin wrote:
On 02/14/2016 06:24 PM, Pierre-Yves David wrote:
On 02/13/2016 02:56 AM, Nick Coghlan wrote:
On 13 February 2016 at 11:15, Brett Cannon <brett@python.org> wrote:
I don't remember the story behind cpython-fullhistory, but it's obviously incomplete since it just stopped post conversion. You will need to find someone who knows (I'd ask on python-dev).
Also realize that this will be our fourth VCS (cvs, svn, hg, now git). This is not going to be a perfect history of the semantic actions of commits from the beginning of time just due to the fact that these VCS tools all use different concepts.
There's also the fact that prior to the move to SourceForge in 2000, all changes had to be funneled through the half dozen or so people with write access to the CVS tree: https://docs.python.org/3/whatsnew/2.0.html#new-development-process
I think it's definitely OK if future code archaeologists need to dig into the SVN repository to get a more complete view of CPython's history.
I've never met a project that did not regret such a decision at some point. Keeping older history is usually valuable. Mercurial has tools powerful enough to let you get all the history back together; I assume git probably has that power too.
This is your call, but I strongly recommend taking advantage of this migration to put everything back together.
While "putting everything back together" would be great, it doesn't *have* to block the migration. Git has a command called "git replace" that lets you do this later.
The Linux kernel (which switched to Git before Git migration tools existed) has a separate "early history" repo that you can "prepend" to the main one. Then, in your local copy, it looks like one unbroken history. Since Git commits are snapshots and not deltas, this works amazingly well -- it's just telling Git's object retrieval routine to retrieve <object X> instead of <object Y>. The disadvantage is that it has to be done in each clone individually -- no one can rewrite history for others.
Two commands every future historian would have to do:
    git fetch <url_for_old_history>
    git replace --graft <first_commit_of_new_history> <last_commit_of_old_history>
While this exists, I believe it would be much more convenient to have the history right in the first place. But that's not my call.
-- Pierre-Yves David
Hello! On Sat, Feb 06, 2016 at 01:57:15AM +0000, Brett Cannon <brett@python.org> wrote:
https://www.python.org/dev/peps/pep-0512/#define-commands-to-move-a-mercuria...
There appear to be multiple ways to convert hg repos to git, but no clear winner. It would be great if some one/people took on the task of evaluating the tools available out there by converting the cpython repo and seeing which one has the best results.
Let me try git-remote-hg transport.
$ hg clone https://hg.python.org/cpython/ cpython-hg
$ time git clone hg::cpython-hg cpython-git
real 39m44.600s
user 45m54.192s
sys 1m4.184s
$ cd cpython-git/
* master 6bd585f merge from 3.5
  remotes/origin/HEAD -> origin/master
  remotes/origin/branches/2.7 9842886 Fix userinfo example presented in urllib2 howto.
  remotes/origin/branches/3.2 51e4a9f Issue #25940: On Windows, connecting to port 444 returns ETIMEDOUT
  remotes/origin/branches/3.3 a017765 Issue #25940: Merge ETIMEDOUT fix from 3.2 into 3.3
  remotes/origin/branches/3.4 1b9c53a reject negative data_size
  remotes/origin/branches/3.5 411a8a5 Fix userinfo example presented in urllib2 howto.
  remotes/origin/branches/default 6bd585f merge from 3.5
  remotes/origin/master 6bd585f merge from 3.5
$ git log --decorate --graph -5
* commit 6bd585f (HEAD, origin/master, origin/branches/default, origin/HEAD, refs/hg/origin/branches/default, refs/hg/origin/bookmarks/master, master)
|\  Merge: 10cbbbf 411a8a5
| | Author: Senthil Kumaran <senthil@uthcode.com>
| | Date: Fri Feb 5 19:37:47 2016 -0800
| |
| |     merge from 3.5
| |
| * commit 411a8a5 (origin/branches/3.5, refs/hg/origin/branches/3.5)
| | Author: Senthil Kumaran <senthil@uthcode.com>
| | Date: Fri Feb 5 19:37:23 2016 -0800
| |
| |     Fix userinfo example presented in urllib2 howto.
| |
* | commit 10cbbbf
| | Author: Yury Selivanov <yselivanov@sprymix.com>
| | Date: Fri Feb 5 19:40:01 2016 -0500
| |
| |     Issue #26288: Optimize PyLong_AsDouble.
| |
* | commit e5185e7
| | Author: Eric V. Smith <eric@trueblade.com>
| | Date: Fri Feb 5 18:26:20 2016 -0500
| |
| |     Switch to more idiomatic C code.
| |
* | commit b59cb8f
| | Author: Eric V. Smith <eric@trueblade.com>
| | Date: Fri Feb 5 18:23:08 2016 -0500
PS. git-remote-hg provides bidirectional transport. You can continue pulling from Mercurial repository(ies) and you can commit and push back to Mercurial repository(ies).
Oleg.
-- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
On Sat, Feb 06, 2016 at 04:50:51PM +0100, Oleg Broytman <phd@phdru.name> wrote:
Hello!
On Sat, Feb 06, 2016 at 01:57:15AM +0000, Brett Cannon <brett@python.org> wrote:
https://www.python.org/dev/peps/pep-0512/#define-commands-to-move-a-mercuria...
There appear to be multiple ways to convert hg repos to git, but no clear winner. It would be great if some one/people took on the task of evaluating the tools available out there by converting the cpython repo and seeing which one has the best results.
Let me try git-remote-hg transport.
$ hg clone https://hg.python.org/cpython/ cpython-hg
$ time git clone hg::cpython-hg cpython-git
real 39m44.600s
user 45m54.192s
sys 1m4.184s
$ cd cpython-git/
$ git remote add gh git@github.com:phdru/cpython.git
$ git push --all gh
See the result at https://github.com/phdru/cpython
PS. I am `phdru' at Github.
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
On 02/06/2016 05:48 PM, Oleg Broytman wrote:
On Sat, Feb 06, 2016 at 04:50:51PM +0100, Oleg Broytman <phd@phdru.name> wrote:
Hello!
On Sat, Feb 06, 2016 at 01:57:15AM +0000, Brett Cannon <brett@python.org> wrote:
https://www.python.org/dev/peps/pep-0512/#define-commands-to-move-a-mercuria...
There appear to be multiple ways to convert hg repos to git, but no clear winner. It would be great if some one/people took on the task of evaluating the tools available out there by converting the cpython repo and seeing which one has the best results.
Let me try git-remote-hg transport.
$ hg clone https://hg.python.org/cpython/ cpython-hg
$ time git clone hg::cpython-hg cpython-git
real 39m44.600s
user 45m54.192s
sys 1m4.184s
$ cd cpython-git/
$ git remote add gh git@github.com:phdru/cpython.git
$ git push --all gh
See the result at https://github.com/phdru/cpython
You might also want to try something like
$ git repack -a -d -f --depth=250 --window=250
after importing to decrease the size of the repository for everyone pulling from you.
(Reference: https://gcc.gnu.org/ml/gcc/2007-12/msg00165.html)
On Sat, Feb 06, 2016 at 05:57:47PM +0100, Petr Viktorin <encukou@gmail.com> wrote:
On 02/06/2016 05:48 PM, Oleg Broytman wrote:
$ git remote add gh git@github.com:phdru/cpython.git
$ git push --all gh
See the result at https://github.com/phdru/cpython
You might also want to try something like
$ git repack -a -d -f --depth=250 --window=250
after importing to decrease the size of the repository for everyone pulling from you.
(Reference: https://gcc.gnu.org/ml/gcc/2007-12/msg00165.html)
See the discussion of this old and outdated message at PEP 103: https://www.python.org/dev/peps/pep-0103/#database-maintenance
The recommended parameters for git repack are --depth=20 --window=250. See http://vcscompare.blogspot.ru/2008/06/git-repack-parameters.html
But yes, you're right, git gc/repack is very much recommended.
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
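As a rough illustration of the repack step, the sketch below builds a tiny throwaway repository, repacks it with the --depth=20 --window=250 values mentioned above, and uses `git count-objects -v` to show the effect. The repository and its five commits are made up for the demo; on a real conversion the same two commands would be run inside the converted repository, and the resulting size will depend on the history.

```shell
#!/bin/sh
# Sketch: repack a repository after import and inspect the result.
# The throwaway repo below is a stand-in for a freshly converted one.
set -eu
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Create a few revisions so there is something to pack.
i=1
while [ "$i" -le 5 ]; do
    echo "line $i" >> data.txt
    git add data.txt
    git -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "rev $i"
    i=$((i + 1))
done

# -a: pack everything reachable into a single pack
# -d: delete redundant packs and loose objects afterwards
# -f: recompute deltas from scratch instead of reusing existing ones
git repack -a -d -f -q --depth=20 --window=250

# Report loose-object count, pack count and pack size.
git count-objects -v
packs=$(git count-objects -v | awk '/^packs:/ {print $2}')
loose=$(git count-objects -v | awk '/^count:/ {print $2}')
echo "packs: $packs, loose objects: $loose"
```

A larger --window makes Git consider more candidate objects when looking for deltas, and --depth limits how long a delta chain may grow; comparing the `size-pack` line before and after is a simple way to judge whether a given setting helps.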
First off, thanks to everyone who has stepped out and started running various approaches to test them out! Any and all help is appreciated since there are a lot of parts to this transition and I definitely don't want to do it on my own (especially since Python 2.7 would have its last release by the time I would finish with all the work).
Second, does anyone -- or group of people -- want to own this and figure out what to try out, keep track of what has been tried, come up with some way to evaluate the results (both for accuracy in the conversion but also if there is some way to say one is better than another), and come back to the list with a solution? All I ask is you try to do it in the open (whether it's by a Google Doc that's open to the public for comment or a GitHub repo, I don't care) so people who want to help can? It seems there are people definitely willing to try out the tools and report back, but I'm looking for someone/people to organize the effort and come back to me with a thought-out solution so I don't have to. :)
On Sat, 6 Feb 2016 at 09:16 Oleg Broytman <phd@phdru.name> wrote:
On Sat, Feb 06, 2016 at 05:57:47PM +0100, Petr Viktorin <encukou@gmail.com> wrote:
On 02/06/2016 05:48 PM, Oleg Broytman wrote:
$ git remote add gh git@github.com:phdru/cpython.git
$ git push --all gh
See the result at https://github.com/phdru/cpython
You might also want to try something like
$ git repack -a -d -f --depth=250 --window=250
after importing to decrease the size of the repository for everyone pulling from you.
(Reference: https://gcc.gnu.org/ml/gcc/2007-12/msg00165.html)
See the discussion of this old and outdated message at PEP 103: https://www.python.org/dev/peps/pep-0103/#database-maintenance
The recommended parameters for git repack are --depth=20 --window=250 See http://vcscompare.blogspot.ru/2008/06/git-repack-parameters.html
But yes, you're right, git gc/repack is very much recommended.
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN. _______________________________________________ core-workflow mailing list core-workflow@python.org https://mail.python.org/mailman/listinfo/core-workflow This list is governed by the PSF Code of Conduct: https://www.python.org/psf/codeofconduct
On 7 February 2016 at 05:42, Brett Cannon <brett@python.org> wrote:
First off, thanks to everyone who has stepped out and started running various approaches to test them out! Any and all help is appreciated since there are a lot of parts to this transition and I definitely don't want to do it on my own (especially since Python 2.7 would have its last release by the time I would finish with all the work).
Second, does anyone -- or group of people -- want to own this and figure out what to try out, keep track of what has been tried, come up with some way to evaluate the results (both for accuracy in the conversion but also if there is some way to say one is better than another), and come back to the list with a solution? All I ask is you try to do it in the open (whether it's by a Google Doc that's open to the public for comment or a GitHub repo, I don't care) so people who want to help can? It seems there are people definitely willing to try out the tools and report back, but I'm looking for someone/people to organize the effort and come back to me with a thought-out solution so I don't have to. :)
For beaker-project.org, I found it really useful to have an "administrivia" repo in our Gerrit instance for all the random scripts that didn't have a proper home, but we also didn't want the sole copy living on someone's hard drive. Creating a similar repo (e.g. "core-workflow") under https://github.com/python would likely be useful here, since it would not only give people a place to collaborate on utility scripts, but also an ad hoc issue tracker for tasks that don't have a more appropriate home. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Hi, On 02/07/2016 08:29 AM, Nick Coghlan wrote:
Creating a similar repo (e.g. "core-workflow") under https://github.com/python would likely be useful here, since it would not only give people a place to collaborate on utility scripts, but also an ad hoc issue tracker for tasks that don't have a more appropriate home.
+1, plus IMHO it would help to document the workflow. Regards, francis
On Mon, 8 Feb 2016 at 09:34 francismb <francismb@email.de> wrote:
Hi,
On 02/07/2016 08:29 AM, Nick Coghlan wrote:
Creating a similar repo (e.g. "core-workflow") under https://github.com/python would likely be useful here, since it would not only give people a place to collaborate on utility scripts, but also an ad hoc issue tracker for tasks that don't have a more appropriate home.
+1, plus IMHO it would help to document the workflow.
The workflow will be documented in the devguide, so that's not a worry. As for collecting things in a repo like that, it's a possibility, but it does require me managing it since I will have to manually add people to have commit access as we don't even have the core devs on GitHub yet.
Hi Brett, On Sat, Feb 6, 2016 at 11:42 AM, Brett Cannon <brett@python.org> wrote:
It seems there are people definitely willing to try out the tools and report back, but I'm looking for someone/people to organize the effort and come back to me with a thought-out solution so I don't have to. :)
I wanted to get involved with this process. I am ready to help you here.
https://github.com/orsenthil/cpython-hg-to-git
Here I started documenting our discussion so far. I will continue to evaluate the tools (with other developers on this list) and we'll try to come up with a conclusive suggestion for our migration.
Thank you, Senthil
On Mon, 8 Feb 2016 at 09:54 Senthil Kumaran <senthil@uthcode.com> wrote:
Hi Brett,
On Sat, Feb 6, 2016 at 11:42 AM, Brett Cannon <brett@python.org> wrote:
It seems there are people definitely willing to try out the tools and report back, but I'm looking for someone/people to organize the effort and come back to me with a thought-out solution so I don't have to. :)
I wanted to get involved with this process. I am ready to help you here.
https://github.com/orsenthil/cpython-hg-to-git
Here I started documenting our discussion so far. I will continue to evaluate the tools (with other developers on this list) and we'll try to come up with a conclusive suggestion for our migration.
Thanks so much, Senthil! When I get a chance I will update the PEP to say this step of the migration is in process and that you are in charge of it.
-Brett
P.S.: For this whole migration I'm going to try and delegate where appropriate, so this won't be the last time I reach out for help. I anticipate helping with bugs.python.org is going to be the biggest stumbling block/need. And for those that wanted GitLab over GitHub, a vast majority of this migration is Git host-agnostic, and so this is necessary for after we leave GitHub (either because we're unhappy with GitHub or because it's simply time to move; nothing is forever).
Hey, I just saw that GitHub released a tool to import Mercurial repos: https://github.com/blog/2110-migrate-your-code-with-the-github-importer
Thought it might be useful for this thread.
I'm currently trying an import to see how it looks; it has been stuck at 0% for a few minutes now.
-- M
The result will be here if the import succeeds: https://github.com/Carreau/cpython-test
On Mon, Feb 8, 2016 at 10:23 AM, Brett Cannon <brett@python.org> wrote:
On Mon, 8 Feb 2016 at 09:54 Senthil Kumaran <senthil@uthcode.com> wrote:
Hi Brett,
On Sat, Feb 6, 2016 at 11:42 AM, Brett Cannon <brett@python.org> wrote:
It seems there are people definitely willing to try out the tools and report back, but I'm looking for someone/people to organize the effort and come back to me with a thought-out solution so I don't have to. :)
I wanted to get involved with this process. I am ready to help you here.
https://github.com/orsenthil/cpython-hg-to-git
Here I started documenting our discussion so far. I will continue to evaluate the tools (with other developers on this list) and we'll try to come up with a conclusive suggestion for our migration.
Thanks so much, Senthil! When I get a chance I will update the PEP to say this step of the migration is in process and that you are in charge of it.
-Brett
P.S.: For this whole migration I'm going to try and delegate where appropriate, so this won't be the last time I reach out for help. I anticipate helping with bugs.python.org is going to be the biggest stumbling block/need. And for those that wanted GitLab over GitHub, a vast majority of this migration is Git host-agnostic, and so this is necessary for after we leave GitHub (either because we're unhappy with GitHub or because it's simply time to move; nothing is forever).
On Thu, Feb 11, 2016 at 4:04 PM, Matthias Bussonnier < bussonniermatthias@gmail.com> wrote:
I just saw that GitHub released a tool to import Mercurial repos:
https://github.com/blog/2110-migrate-your-code-with-the-github-importer
Thought it might be useful for this thread.
I'm currently trying an import to see how it looks; it has been stuck at 0% for a few minutes now.
Yeah, I noticed it too, and it said that it will send an email once the import is done. Let's see if this feature provides all the things we are looking for.
This is an interesting one for our needs.
Thanks, Senthil
On Thu, 11 Feb 2016 at 17:07 Senthil Kumaran <senthil@uthcode.com> wrote:
On Thu, Feb 11, 2016 at 4:04 PM, Matthias Bussonnier < bussonniermatthias@gmail.com> wrote:
I just saw that GitHub released a tool to import Mercurial repos:
https://github.com/blog/2110-migrate-your-code-with-the-github-importer
Thought it might be useful for this thread.
I'm currently trying an import to see how it looks; it has been stuck at 0% for a few minutes now.
Yeah, I noticed it too, and it said that it will send an email once the import is done. Let's see if this feature provides all the things we are looking for.
This is an interesting one for our needs.
Two things.
One, I just updated the PEP, listing Senthil as in charge of evaluating the tools. It should go live in an hour or so (aside: I'm going to delete my GitHub repo for the PEP so I don't have to keep track of two copies; can't wait until we get this done enough to move the peps repo over).
Two, thanks to everyone who jumped on the GitHub blog post and posted here! I noticed it myself and was coming to comment when I saw that Matthias, Nick, Nicolás, and Senthil had beaten me to it. :) As Senthil said, it's very interesting to see this option open up to us, and we will need to see how they do. Hopefully it will work out (or at least we can help GitHub fix their tooling if they can't handle the cpython repo's deep history).
On Thu, Feb 11, 2016 at 4:04 PM, Matthias Bussonnier < bussonniermatthias@gmail.com> wrote:
I just saw that GitHub released a tool to import Mercurial repos:
https://github.com/blog/2110-migrate-your-code-with-the-github-importer
Turns out that our CPython repository is too big for this importer tool to handle. It fails for me consistently at 78%; I tried more than 10 times and never got past 78%. As Brett mentioned, I will report this to the github.com team, and let's continue our evaluation of other tools that can successfully complete the hg -> git translation of the repo. -- Senthil
I'll try hg-git! On February 5, 2016 7:57:15 PM CST, Brett Cannon <brett@python.org> wrote:
https://www.python.org/dev/peps/pep-0512/#define-commands-to-move-a-mercuria...
There appear to be multiple ways to convert hg repos to git, but no clear winner. It would be great if some one/people took on the task of evaluating the tools available out there by converting the cpython repo and seeing which one has the best results.
-- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity.
Well, hg-git might not work...
I cloned the CPython repo. I then ran `hg gexport`, which converts the repository to Git. It took three hours, but it worked! However, actually attempting to push the result to a Git repo failed miserably. After five hours, Mercurial ran out of memory in the "adding objects" stage. And I have 6 GB of RAM!
Right now, I'm trying to see if I can work around it. This may not work out, though.
On Sat, Feb 6, 2016 at 10:37 AM, Ryan Gonzalez <rymg19@gmail.com> wrote:
I'll try hg-git!
On February 5, 2016 7:57:15 PM CST, Brett Cannon <brett@python.org> wrote:
https://www.python.org/dev/peps/pep-0512/#define-commands-to-move-a-mercuria...
There appear to be multiple ways to convert hg repos to git, but no clear winner. It would be great if some one/people took on the task of evaluating the tools available out there by converting the cpython repo and seeing which one has the best results.
-- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity.
-- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/
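The hg-git export Ryan describes above can be sketched roughly as follows. This is an illustration under assumptions, not something verified against the cpython repository: it assumes hg-git is installed and can be enabled as the hggit extension, that the default branch should become Git's master, and that hg-git keeps its exported repository in .hg/git (its default location). The run and hggit_export helpers are made up for this example. With DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# hggit_export: hypothetical helper; run from the root of a Mercurial clone.
hggit_export() {
    # hg-git turns bookmarks into Git branches, so mark the branch to export.
    run hg --config extensions.hggit= bookmark -r default master &&
    # Convert all changesets into Git objects (the step reported to take hours).
    run hg --config extensions.hggit= gexport &&
    # List the branches that ended up in the exported repository.
    run git --git-dir=.hg/git for-each-ref refs/heads
}
```

Since the exported repository in .hg/git is an ordinary Git repository, it may be possible to push from it with plain `git push` rather than through Mercurial, which might sidestep the memory problem; that is untested here.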
Hi,
ESR wrote a repository migration tool which probably could be used for this, Reposurgeon:
http://www.catb.org/esr/reposurgeon/
He used it (among other things) to migrate the huge emacs repository to git: http://esr.ibiblio.org/?p=5634
PS. Reposurgeon is written in Python ;-)
Greetings,
On Mon, 8 Feb 2016 at 01:19 Michiel Overtoom <motoom@xs4all.nl> wrote:
Hi,
ESR wrote a repository migration tool which probably could be used for this, Reposurgeon:
http://www.catb.org/esr/reposurgeon/
He used it (among other things) to migrate the huge emacs repository to git: http://esr.ibiblio.org/?p=5634
PS. Reposurgeon is written in Python ;-)
If you (or anyone else) wants to try converting the cpython repo using the tool that would be appreciated!
On 02/06/2016 01:57 AM, Brett Cannon wrote:
https://www.python.org/dev/peps/pep-0512/#define-commands-to-move-a-mercuria...
There appear to be multiple ways to convert hg repos to git, but no clear winner. It would be great if some one/people took on the task of evaluating the tools available out there by converting the cpython repo and seeing which one has the best results.
We, at Facebook, have been happily using hg-git in production for bidirectional conversion of repositories of insane sizes. I would look in that direction for your conversion. -- Pierre-Yves David
Hi Pierre,
Do you have external documents that you could point us to so that it helps us with the research? Thanks for sharing this detail on hg<->git already working for a huge repo.
-- Senthil
On Feb 12 2016, at 6:28 am, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
On 02/06/2016 01:57 AM, Brett Cannon wrote:
https://www.python.org/dev/peps/pep-0512/#define-commands-to-move-a-mercurial-repository-to-git
There appear to be multiple ways to convert hg repos to git, but no clear winner. It would be great if some one/people took on the task of evaluating the tools available out there by converting the cpython repo and seeing which one has the best results.
We, at Facebook, have been happily using hg-git in production for bidirectional conversion of repositories of insane sizes.
I would look in that direction for your conversion.
-- Pierre-Yves David
On 02/12/2016 02:37 PM, Senthil Kumaran wrote:
Hi Pierre,
Do you have external documents that you could point us to so that it helps us with the research? Thanks for sharing this detail on hg<->git already working for a huge repo.
I would expect the hg-git documentation to be the place to start. -- Pierre-Yves David
How did you manage to push the repo somewhere? hg-git successfully exported all the changesets for me, but I couldn't push to a directory, and my computer ran out of memory twice during the adding objects stage when trying to push to GitHub. On February 12, 2016 8:41:08 AM CST, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
On 02/12/2016 02:37 PM, Senthil Kumaran wrote:
Hi Pierre,
Do you have external documents that you could point us to so that it helps us with the research? Thanks for sharing this detail on hg<->git already working for a huge repo.
I would expect the hg-git documentation to be the place to start.
-- Pierre-Yves David
-- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity.
On Fri, Feb 12, 2016 at 10:02:11AM -0600, Ryan Gonzalez <rymg19@gmail.com> wrote:
How did you manage to push the repo somewhere? hg-git successfully exported all the changesets for me, but I couldn't push to a directory, and my computer ran out of memory twice during the adding objects stage when trying to push to GitHub.
As for GitHub, push in smaller batches, something like
$ git log --oneline | wc -l
92979
$ git push github master~45000:refs/heads/master
$ git push github master:master
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
On Fri, Feb 12, 2016 at 06:13:01PM +0100, Oleg Broytman <phd@phdru.name> wrote:
On Fri, Feb 12, 2016 at 10:02:11AM -0600, Ryan Gonzalez <rymg19@gmail.com> wrote:
How did you manage to push the repo somewhere? hg-git successfully exported all the changesets for me, but I couldn't push to a directory, and my computer ran out of memory twice during the adding objects stage when trying to push to GitHub.
As for Github, push in smaller batches, something like
$ git log --oneline | wc -l
92979
$ git push github master~45000:refs/heads/master
$ git push github master:master
And after that
$ git push --all github
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
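The batching idea above generalizes to more than two pushes. Below is a minimal sketch that only prints the refspecs, oldest batch first; the function name batch_refspecs and the batch size are assumptions for illustration. One caveat: `master~N` follows first parents only, so for an automatic loop the safe commit count is the first-parent depth from `git rev-list --first-parent --count master`, which can be smaller than what `git log --oneline | wc -l` reports on a history with many merges.

```shell
#!/bin/sh
# batch_refspecs TOTAL BATCH [BRANCH]
# Hypothetical helper: print one refspec per push, walking from old history
# toward the tip. TOTAL is the first-parent commit count of BRANCH.
batch_refspecs() {
    total=$1
    batch=$2
    branch=${3:-master}
    # Start BATCH commits in from the root and move toward the tip.
    back=$(( total - batch ))
    while [ "$back" -gt 0 ]; do
        printf '%s~%d:refs/heads/%s\n' "$branch" "$back" "$branch"
        back=$(( back - batch ))
    done
    # The last push brings the remote branch up to the local tip.
    printf '%s:refs/heads/%s\n' "$branch" "$branch"
}
```

For example, `batch_refspecs 92979 45000` prints master~47979:refs/heads/master, master~2979:refs/heads/master and master:refs/heads/master; each line can be handed to `git push github <refspec>` in order, since every step is a fast-forward of the previous one.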
On Fri, Feb 12, 2016 at 6:20 AM, Pierre-Yves David < pierre-yves.david@ens-lyon.org> wrote:
We, at Facebook, have been happily using hg-git in production for bidirectional conversion of repositories of insane sizes.
I played with hg-git tool and was able to migrate the hg cpython default branch to git successfully. I have documented the steps here: https://github.com/orsenthil/cpython-hg-to-git/blob/master/README.md import.github.com is unable to handle the size of our repo. I have reported this to github.com and let's see if we get any support from them.
2016-02-14 1:27 GMT-03:00 Senthil Kumaran <senthil@uthcode.com>:
On Fri, Feb 12, 2016 at 6:20 AM, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
We, at Facebook, have been happily using hg-git in production for bidirectional conversion of repositories of insane sizes.
I played with hg-git tool and was able to migrate the hg cpython default branch to git successfully. I have documented the steps here: https://github.com/orsenthil/cpython-hg-to-git/blob/master/README.md
Did you repack the git repository after importing with hg-git and before pushing it? Every repository import tool I know of produces extremely suboptimal pack files. fast-export initially gave me a 3GB monstrosity before I ran "git gc --aggressive". -- Nicolás
participants (14)
- Brett Cannon
- Ethan Furman
- francismb
- Martin Panter
- Matthias Bussonnier
- Michiel Overtoom
- Nicholas Chammas
- Nick Coghlan
- Nicolás Alvarez
- Oleg Broytman
- Petr Viktorin
- Pierre-Yves David
- Ryan Gonzalez
- Senthil Kumaran