[Catalog-sig] PyPI mirrors are all up to date

Tarek Ziadé tarek at ziade.org
Tue Apr 17 00:09:36 CEST 2012

On 4/16/12 11:57 PM, "Martin v. Löwis" wrote:
>> Maybe a better checksum would be a global hash calculated differently ?
> Define a protocol, and I present you with an implementation that
> conforms to the protocol, and still has inconsistent data, and not
> in a malicious manner, but due to bugs/race conditions/unexpected
> events. It's pointless.
If you calculate a checksum over all mirrored files, you can guarantee
that the bits are the same on both sides, no? Then you have the per-file
md5 checksum for the download, so the client can be sure it got an
uncorrupted file.
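
To make the idea concrete, here is a rough sketch of such a global
checksum. It is only an illustration, not an existing protocol: the
function names (file_md5, tree_checksum), the choice of SHA-256 for the
overall digest, and the "path md5" line format are all assumptions of
mine.

```python
import hashlib
import os


def file_md5(path, chunk_size=65536):
    """Return the hex md5 of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_checksum(root):
    """Hypothetical global checksum over every file under root.

    The relative paths and per-file md5 sums are sorted first, so the
    result does not depend on the order in which files are listed.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((rel, file_md5(full)))
    entries.sort()
    overall = hashlib.sha256()
    for rel, md5 in entries:
        overall.update(("%s %s\n" % (rel, md5)).encode("utf-8"))
    return overall.hexdigest()
```

If the master and a mirror compute the same tree_checksum() value, they
hold the same set of paths with the same contents at that moment. It says
nothing about a file that changes while the hash is being computed.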

> Ultimately, clients will need to verify the
> data that they receive (if they suspect issues), and fall back gracefully.
How can they know if version 1.3 of package foo never made it to the
mirror they use?

They can't. They have to trust the last-modified date and assume that
the mirror is fresh enough for foo 1.3 to be present on both the master
and the mirror.

>> I can definitely see a mirroring implementation where the
>> last-modified field is updated at the end while some packages are not
>> copied over at the end for whatever network issue.
> That mirroring implementation would violate the principle that
> last-modified should only be updated when the mirroring run was
> completed successfully.
Like a file whose metadata claims it is 512 kB long when it is really
only 2 kB, because something went wrong?

I think the idea of the checksum is to double-check that kind of claim.
But maybe that's overkill? Maybe the mirroring code should check, file by
file, that everything was copied correctly?
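
A file-by-file check could look roughly like the sketch below. It assumes
the mirroring code has a manifest mapping each relative path to the size
and md5 that the master advertises; that manifest format and the name
verify_mirror are hypothetical, not something the current mirroring
tools provide.

```python
import hashlib
import os


def verify_mirror(root, manifest):
    """Compare files under root with a manifest of expected values.

    manifest maps a relative path to (expected_size, expected_md5).
    Returns a list of (path, problem) tuples; an empty list means every
    listed file is present with the expected size and checksum.
    """
    problems = []
    for rel, (size, md5) in sorted(manifest.items()):
        path = os.path.join(root, *rel.split("/"))
        if not os.path.isfile(path):
            problems.append((rel, "missing"))
            continue
        actual_size = os.path.getsize(path)
        if actual_size != size:
            # Cheap check first: catches the "512 kB claimed, 2 kB on
            # disk" case without reading the file.
            problems.append(
                (rel, "size mismatch: expected %d, got %d" % (size, actual_size))
            )
            continue
        digest = hashlib.md5()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(65536), b""):
                digest.update(chunk)
        if digest.hexdigest() != md5:
            problems.append((rel, "md5 mismatch"))
    return problems
```

A mirroring run could then update last-modified only when verify_mirror()
returns an empty list, which would match the principle Martin describes
above.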


> Regards,
> Martin
