<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Times New Roman",serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=FR-CA link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D;mso-fareast-language:EN-US'>(I apologize for top-posting; I still haven’t figured out how to fix my email client.)<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D;mso-fareast-language:EN-US'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D;mso-fareast-language:EN-US'>There are nearly 94k commits in the git repo, and I expect the hg repo has the same number. It’s a tad more than 10,000.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D;mso-fareast-language:EN-US'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D;mso-fareast-language:EN-US'>I’ll definitely take a look at that tool; my main weakness is that I don’t know hg commands or similar, but comparing separate commits is most definitely better.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D;mso-fareast-language:EN-US'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D;mso-fareast-language:EN-US'>@Ethan: I meant that I would write all the output to a file for comparison, but apparently that’s not a very good idea, so I’ll drop it.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D;mso-fareast-language:EN-US'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D;mso-fareast-language:EN-US'>I’ll look at the tool and see what I can do. 
I’ll try to document my findings if I can’t come up with a good solution, and probably even if I do.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D;mso-fareast-language:EN-US'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D;mso-fareast-language:EN-US'>Cheers,<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D;mso-fareast-language:EN-US'>-Emanuel<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D;mso-fareast-language:EN-US'><o:p> </o:p></span></p><div style='border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt'><div><div style='border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm'><p class=MsoNormal><b><span lang=FR style='font-size:11.0pt;font-family:"Calibri",sans-serif'>From:</span></b><span lang=FR style='font-size:11.0pt;font-family:"Calibri",sans-serif'> Senthil Kumaran [mailto:senthil@uthcode.com] <br><b>Sent:</b> Sunday, May 08, 2016 8:43 PM<br><b>To:</b> Émanuel Barry<br><b>Cc:</b> core-workflow<br><b>Subject:</b> Re: [core-workflow] Some questions<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p> </o:p></p><div><div><p class=MsoNormal>Hi Émanuel,<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p><div><p class=MsoNormal>On Sun, May 8, 2016 at 4:40 PM, Émanuel Barry <<a href="mailto:vgr255@live.ca" target="_blank">vgr255@live.ca</a>> wrote:<o:p></o:p></p><blockquote style='border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm'><p class=MsoNormal><span lang=EN-CA style='font-size:11.0pt;color:#1F497D'>Take each X commit (say, every 100<sup>th </sup>or 1000<sup>th</sup> commit, or even every commit if we decide to be insane^Wprecise), store hashes of all files at that revision with possibly the file tree, in a .py file as a list or dict, or json or anything you prefer. 
Then I upload it for you to look at, and you can compare it with the Mercurial repo. Or we run the same script on the Mercurial repo and compare the resulting files.</span><o:p></o:p></p></blockquote><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>If we store anything externally, that could start limiting us.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>I looked at the problem from this angle: the final cpython git repo has ~10000 commits in the master branch. That's not a large number to deal with. The original hg repo should have exactly the same number of commits. We have to diff each pair of corresponding commits, including merge commits, and check that their contents are the same. If we encounter anything where the git repo differs in content or history from the hg repo, we alert and fail.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>Since this is a history-checking operation, we could validate the repos in O(minutes), or ~1 hour. 
This will give us confidence in the migration, and will help us evaluate multiple hg -> git repos that have been migrated at different points in time.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>This feature will go in this tool: <a href="https://github.com/orsenthil/cpython-hg-to-git">https://github.com/orsenthil/cpython-hg-to-git</a>, which we will use to migrate, sync, and validate hg->git repos.<o:p></o:p></p></div><div><p class=MsoNormal>If you're interested, you could research an efficient way to do the above operation and submit a pull request against that tool.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>HTH,<o:p></o:p></p></div><div><p class=MsoNormal>Senthil<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div></div><p class=MsoNormal style='margin-bottom:12.0pt'><o:p> </o:p></p></div></div></div></div></body></html>