[core-workflow] Some questions
Thomas Nyberg
tomnyberg at gmail.com
Sat May 21 18:30:31 EDT 2016
Hello,
I'm attaching a script that is an initial attempt at doing this for the
git side of things. Everything is done in bash at the moment. It makes
use of GNU parallel (the parallel package in both Ubuntu and Debian).
Everything else it uses should be a standard tool on Linux (except git).
Basically to run it do the following:
1) clone cpython.git into the current directory (i.e. try not to have
any "generated" files)
2) put scan.sh in the current directory and run it there
What it does is the following:
It checks out every 1000th commit (can be changed) going backwards on
the current branch, computes the md5sum of every file (except those
under .git), and puts the md5sums in a file in an outdir/ directory
that it creates. The files are named $num-$commit, where $num is the
number of commits _backwards_ from the current commit (which makes
sense if you think about iterating backwards from the current commit).
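Since the attachment was scrubbed by the archive, here is a rough
reconstruction of what the description above amounts to. This is a sketch,
not the original scan.sh: the scan function name and step handling are
mine, and a plain loop stands in for GNU parallel.

```shell
#!/usr/bin/env bash
# Sketch (reconstruction) of scan.sh: sample every STEP-th commit going
# backwards on the current branch and record the md5sum of every file
# (everything outside .git/) into outdir/$num-$commit.
scan() {
    local step=${1:-1000}   # sampling interval; 1000 as in the post
    local num=0
    local commit
    mkdir -p outdir
    # git rev-list emits commits newest-first, so $num counts backwards
    # from the current commit, matching the naming scheme described above.
    for commit in $(git rev-list HEAD); do
        if (( num % step == 0 )); then
            git checkout --quiet "$commit"
            # Skip .git/ and our own outdir/, which is untracked and
            # therefore survives each checkout.
            find . -path ./.git -prune -o -path ./outdir -prune \
                   -o -type f -print0 \
                | sort -z \
                | xargs -0 md5sum > "outdir/$num-$commit"
        fi
        num=$((num + 1))
    done
}
```

Run it from inside the clone, e.g. `scan 1000`; note it leaves the repo in
a detached-HEAD state at the oldest sampled commit.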
Running this on my laptop took ~11 minutes. I uploaded the output
directory here in case you don't feel like running it:
http://thomasnyberg.com/outdir.tar.bz2
(Ignore the frontpage of my "website". I'm obviously not all that
concerned by it...)
In any case, this might be helpful to others in addition to myself. I
figured it was best to email the list before continuing (maybe this isn't
really what's needed...). Possible things to add to this:
* doing something similar with comments
* doing the same thing on all branches
* maybe only compute the md5sum for changed files
* little thought has gone into efficiency...there may be obvious gains
hiding
Of course something similar would have to be run with the hg version and
then a comparison would need to be done.
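The comparison step could be as simple as diffing the sampled checksum
files pairwise. A hypothetical helper (the compare_outdirs name is mine;
it assumes both runs used the same sampling interval, and matches files by
their $num prefix since the git and hg commit hashes themselves differ):

```shell
# Hypothetical comparison step: given two output directories produced by
# the scan (one from the git repo, one from the hg conversion), report any
# sampled commit whose file checksums differ.
compare_outdirs() {
    local a=$1 b=$2 f num match status=0
    for f in "$a"/*; do
        # Extract the backwards-offset prefix from "$num-$commit".
        num=${f##*/}
        num=${num%%-*}
        # Find the counterpart with the same offset in the other directory.
        match=$(printf '%s\n' "$b/$num"-* | head -n1)
        if [ ! -e "$match" ] || ! diff -q "$f" "$match" >/dev/null; then
            echo "mismatch at commit offset $num"
            status=1
        fi
    done
    return $status
}
```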
Hopefully this is helpful...
Cheers,
Thomas
On 05/08/2016 07:28 PM, Senthil Kumaran wrote:
>
> On Sun, May 8, 2016 at 4:12 PM, Émanuel Barry <vgr255 at live.ca> wrote:
>
> I understand that there's
> already a semi-official mirror of the cpython repo on GitHub, and
> I've been
> wondering why it isn't enough for our needs.
>
>
> It is suitable for our needs. Our last discussion was about how we
> ascertain that the cpython git repo has the same history as the hg
> repo, so that after migrating we do not lose any information from the
> old system.
>
> This could be done using:
>
> * check the number of commits in both repos for each branch
> * check the hashes of the source files in the two repos
> * (and how do we go about validating the commit log graph too?)
>
> If you have any suggestions, since you are using the cpython git mirror,
> please feel free to share your thoughts.
>
> Welcome to the party!
>
> Thanks,
> Senthil
>
>
> _______________________________________________
> core-workflow mailing list
> core-workflow at python.org
> https://mail.python.org/mailman/listinfo/core-workflow
> This list is governed by the PSF Code of Conduct: https://www.python.org/psf/codeofconduct
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scan.sh
Type: application/x-shellscript
Size: 997 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/core-workflow/attachments/20160521/756bf8fd/attachment.bin>