[Catalog-sig] PyPI mirrors are all up to date

Donald Stufft donald.stufft at gmail.com
Tue Apr 17 00:14:11 CEST 2012



On Monday, April 16, 2012 at 6:09 PM, Tarek Ziadé wrote:

> On 4/16/12 11:57 PM, "Martin v. Löwis" wrote:
> > > Maybe a better checksum would be a global hash calculated differently ?
> >  
> > Define a protocol, and I present you with an implementation that
> > conforms to the protocol, and still has inconsistent data, and not
> > in a malicious manner, but due to bugs/race conditions/unexpected
> > events. It's pointless.
> >  
>  
> if you calculate a checksum with all mirrored files - you can guarantee
> that the bits are the same
> on both sides, no? then you have the md5 checksum per file for the
> download so the client can
> be sure he got a non-corrupted file.
>  
> > Ultimately, clients will need to verify the
> > data that they receive (if they suspect issues), and fall back gracefully.
> >  
>  
> how can they know if version 1.3 of package foo never made it to the  
> mirror they use ?
>  
> They can't. They have to trust the last modified date and make the  
> assumption that the mirror
> is fresh enough, for foo 1.3 to be present in both the master and the  
> mirror.
>  
> >  
> > > I can definitely see a mirroring implementation where the
> > > last-modified field is updated at the end while some packages are not
> > > copied over at the end for whatever network issue.
> > >  
> >  
> > That mirroring implementation would violate the principle that
> > last-modified should only be updated when the mirroring run was
> > completed successfully.
> >  
>  
> like a file that claims in its metadata it's 512 kb long and only has 2  
> kb in reality because something went wrong ?
>  
> I think the idea of the checksum is to double-check that kind of claim.  
> But maybe that's overkill ?
> maybe the mirroring code should check file by file that everything was  
> copied correctly ?
>  
>  

FWIW, Crate verifies the md5 hash that the simple API gives for each package file. I think
not doing that should be considered wrong. It's considered important for clients to check
the checksum of the packages they download, yet for mirrors, which go on to redistribute
those files to clients, it somehow isn't important? That seems inconsistent.
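
A rough sketch of that kind of check, not Crate's actual code: it assumes each file link in the
simple index ends in a "#md5=<hex digest>" fragment, and the function names and the example URL
are made up for illustration. A mirror would run this on every file it copies and refuse to
publish anything that fails.

```python
import hashlib
from urllib.parse import urlparse


def expected_md5(link):
    """Return the hex digest from a '#md5=<hex>' URL fragment, or None if absent."""
    fragment = urlparse(link).fragment
    key, _, value = fragment.partition("=")
    if key == "md5" and value:
        return value.lower()
    return None


def verify_download(link, data):
    """Return True only if the downloaded bytes match the digest in the link.

    A link without an md5 fragment is treated as unverifiable and rejected;
    that policy is an assumption of this sketch.
    """
    expected = expected_md5(link)
    if expected is None:
        return False
    return hashlib.md5(data).hexdigest() == expected


if __name__ == "__main__":
    payload = b"example package contents"
    # Hypothetical link; the digest is computed here so the example is self-consistent.
    link = ("https://mirror.example.invalid/packages/foo-1.3.tar.gz#md5="
            + hashlib.md5(payload).hexdigest())
    print(verify_download(link, payload))         # intact copy
    print(verify_download(link, payload[:10]))    # truncated copy
```

This catches the "claims 512 kb but only has 2 kb" case for each file, but it says nothing about
whether a release is missing from the mirror altogether.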
>  
> Cheers
> Tarek
>  
> >  
> > Regards,
> > Martin
> >  
>  
>  

