[Catalog-sig] PyPI mirrors are all up to date

martin at v.loewis.de
Tue Apr 17 15:45:38 CEST 2012

> So you were updating a directory but serving another directory ?
> But then updating the right last-modified page people were seeing ?

It probably would have updated an unpublished last-modified as well.

For a similar issue, in appengine, I had duplicate File objects in the
database, and always served the one that GQL happened to return first.
In that case, both last-modified and the checksum might update correctly,
but still, the wrong file might get served.
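
As a rough sketch of that failure mode (plain Python, not the actual
appengine code; the names files, checksums, upload and serve are made up
for illustration, and the list only stands in for the datastore):

```python
import hashlib

files = []       # stand-in for the datastore; nothing prevents duplicates
checksums = {}   # published metadata: file name -> md5 hex digest

def upload(name, content):
    # Modelled bug: a new record is appended instead of replacing the old one.
    files.append((name, content))
    checksums[name] = hashlib.md5(content).hexdigest()

def serve(name):
    # Returns whichever matching record happens to come first,
    # much like a query without an explicit ordering.
    return next(c for n, c in files if n == name)

upload("pkg-1.0.tar.gz", b"first upload")
upload("pkg-1.0.tar.gz", b"corrected upload")

served = serve("pkg-1.0.tar.gz")
published = checksums["pkg-1.0.tar.gz"]
actual = hashlib.md5(served).hexdigest()
print(published == actual)  # False: the metadata is right, the served file is not
```

The published checksum is updated correctly on every upload, yet the
bytes that go out are still those of the first record.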

> I am not sure why we're having this discussion since it's  
> implementation details, but it's fun :)

I'm still trying to prove my claim that it's not feasible to increase
the trustworthiness of a mirror by computing some kind of checksum.
If the mirror has some systematic or random error, it may well be that
the checksum is as expected, yet the mirror is inconsistent.
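
A minimal sketch of how a self-check can pass while the mirror is wrong,
assuming a hypothetical mirror whose sync job writes to one tree while the
web server reads from another (synced_tree, served_tree, sync and
self_check are illustrative names, not any real mirror tool):

```python
import hashlib

def md5(data):
    return hashlib.md5(data).hexdigest()

# What the master currently holds.
master = {"pkg-1.0.tar.gz": b"new release"}

# Hypothetical mirror with two trees: the one the sync job updates,
# and the one the web server actually reads from.
synced_tree = {}
served_tree = {"pkg-1.0.tar.gz": b"stale release"}

def sync():
    # The sync job faithfully copies the master into its own tree.
    synced_tree.update(master)

def self_check():
    # The mirror verifies the tree it knows about against the master.
    return all(md5(synced_tree[name]) == md5(content)
               for name, content in master.items())

sync()
check_ok = self_check()
served_ok = served_tree["pkg-1.0.tar.gz"] == master["pkg-1.0.tar.gz"]
print(check_ok, served_ok)  # True False
```

The check reports success because it looks at the same wrong place the
sync job does; only fetching the file the way a client would reveals the
inconsistency.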

> If there's interest I can write a multiprocess-based script that  
> keeps a md5 database up-to-date

That's beside the point. The question is whether doing so would practically
help to improve the consistency, and I believe the answer is: no. It may help
to increase people's trust (which is a subjective matter), which may be
worthwhile in itself, but may also backfire if they download inconsistent files
despite the mirror giving "proof" that it is consistent.

