[Catalog-sig] PyPI mirrors are all up to date

Tarek Ziadé tarek at ziade.org
Tue Apr 17 00:48:05 CEST 2012


On 4/17/12 12:19 AM, "Martin v. Löwis" wrote:
> On 17.04.2012 00:09, Tarek Ziadé wrote:
>> On 4/16/12 11:57 PM, "Martin v. Löwis" wrote:
>>>> Maybe a better checksum would be a global hash calculated differently ?
>>> Define a protocol, and I present you with an implementation that
>>> conforms to the protocol, and still has inconsistent data, and not
>>> in a malicious manner, but due to bugs/race conditions/unexpected
>>> events. It's pointless.
>> if you calculate a checksum over all mirrored files, you can guarantee
>> that the bits are the same on both sides, no?
> How exactly would you calculate that checksum?
By calculating a grand hash over the individual file hashes.
>   Would you really require
> concatenation of all files?
I did not say that. You are claiming it in a rhetorical question.

> That could take a few hours per change.
Why would it? You don't calculate the checksum of a file you already 
have twice.

And even if you did, calling md5 is very fast.

try it:

$ find mirror | xargs md5

This takes a few seconds at most on the whole mirror.

> It
> would also raise the question in what order the files ought to be
> concatenated.
Anything reproducible, e.g. a sorted list. In bash I *suspect* the 
calculation of the grand hash of the mirror is a one-liner that takes 
less than a minute.
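For the record, here is a minimal sketch of that one-liner, assuming GNU
coreutils (sha256sum, sort, xargs) and a mirror tree under ./mirror; the
directory layout and file names below are made up for illustration, not
the real mirror contents:

```shell
# Build a toy mirror tree (illustrative only).
mkdir -p mirror
printf 'a' > mirror/one.txt
printf 'b' > mirror/two.txt

# Hash every file, sort the (hash, path) lines for a reproducible
# order, then hash that list itself to get a single grand checksum.
grand_hash() {
    find mirror -type f -print0 | sort -z | xargs -0 sha256sum \
        | sha256sum | cut -d' ' -f1
}
grand_hash
```

Because the file list is sorted before the per-file hashes are
concatenated, two mirrors with identical bits produce the identical
grand hash, and any changed file changes it.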

I am going to stop here anyway, because I don't see the point of 
discussing implementation details at this stage: we were barely 
starting to talk about the idea of a checksum, and that seems to be 
going nowhere.

Cheers
Tarek





More information about the Catalog-SIG mailing list