[Catalog-sig] distribute D.C. sprint tasks

zopyxfilter at googlemail.com
Mon Oct 13 14:10:24 CEST 2008

On 12.10.2008 18:18 Uhr, Martin v. Löwis wrote:
>>>> Our z3c.pypimirror already performs an incremental update based on
>>>> the information available from the index.html page of the simple
>>>> index and the available md5 hashes. Works like a charm...
>>> So how does it find out when a release gets made?
>> What do you mean by that?
> If you only look at
> http://pypi.python.org/simple/
> then you have no way of finding out what changed. So "the information
> available from the index.html page of the simple index" is not actually
> suitable for building incremental mirroring. What you describe is not
> possible.
> I just looked at the z3c.pypimirror source, and found that it isn't
> really incremental: Whenever it mirrors, it looks at *all* index.html
> pages, of each and every package (all 4900 of them, except when you
> restrict the mirror). It then only downloads any new files that may
> have been added/deleted, and it *is* incremental wrt. files. IIUC,
> it is *not* incremental wrt. the package index itself.
> Please correct me if I'm wrong (and please correct z3c.pypimirror
> if I'm not :-)

Good suggestion. I think we can take the changelog into account easily.
I'll have to check this with Daniel Kraft, the original author of the
package.
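To make the distinction above concrete, here is a rough sketch of what "incremental wrt. files" means: given one package's simple-index page, collect the links that carry an md5 hash and download only files that are new or whose hash differs from the local copy. This is a hypothetical illustration, not the actual z3c.pypimirror code; it assumes the hash is published as an `#md5=...` URL fragment on each file link, and the names `SimplePageLinks` and `files_to_fetch` are made up here. Note that it still needs the page of every package, which is exactly the non-incremental part Martin points out.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class SimplePageLinks(HTMLParser):
    """Collect {filename: md5} from <a href="...#md5=..."> links on a simple page."""

    def __init__(self):
        super().__init__()
        self.files = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url, fragment = urldefrag(href)
        if not fragment.startswith("md5="):
            return  # skip links without a hash, e.g. home page links
        filename = url.rsplit("/", 1)[-1]
        self.files[filename] = fragment[len("md5="):]


def files_to_fetch(page_html, local_md5s):
    """Return filenames that are missing locally or whose md5 differs.

    local_md5s is an assumed {filename: md5} map of what the mirror already has.
    Deleted files are not handled in this sketch.
    """
    parser = SimplePageLinks()
    parser.feed(page_html)
    return sorted(
        name for name, md5 in parser.files.items()
        if local_md5s.get(name) != md5
    )
```

For a page listing `foo-1.0.tar.gz` and `foo-1.1.tar.gz` where only 1.0 is already mirrored with a matching hash, only `foo-1.1.tar.gz` would be fetched.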

> Can you please set a specific User-Agent header, to find out what
> amount of traffic pypimirror produces? Currently, urllib accounts for
> 17% of the requests, excluding requests made through urllib by
> setuptools (which is a separate 18%). It's probably not all of them
> through pypimirror, but of the 64626 requests made through urllib
> yesterday, 41671 originated from zopyx.com.

Should not be a problem.
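A minimal sketch of how that could look with the standard library, assuming the mirror fetches pages through urllib (by default urllib identifies itself as "Python-urllib/X.Y", which is why the traffic shows up simply as urllib in the logs). The User-Agent string below is a hypothetical placeholder, not a value the package actually uses.

```python
import urllib.request

# Hypothetical identifier; the real string would be chosen by the maintainers.
USER_AGENT = "z3c.pypimirror-example/0.1"


def make_request(url, user_agent=USER_AGENT):
    """Build a request that identifies the mirror instead of Python-urllib."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=30):
    """Fetch url with the custom User-Agent and return the body as bytes."""
    with urllib.request.urlopen(make_request(url), timeout=timeout) as resp:
        return resp.read()
```

Routing every download through one helper like `fetch` keeps the header consistent, so the server-side statistics can attribute all mirror traffic correctly.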

> For real incremental mirroring, you should retrieve the changelog,
> and access only those package pages that have actually changed since
> the last time you ran the mirror (successfully).

See above.
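A rough sketch of the changelog-driven approach Martin describes, assuming PyPI's XML-RPC interface and its `changelog(since)` call, which returns (name, version, timestamp, action) entries. The endpoint URL and the `mirror_package` callback (which would refresh one package's simple page and its files) are assumptions for illustration; the timestamp is only advanced after every changed package has been mirrored, so a failed run is simply retried from the same point next time.

```python
import xmlrpc.client

# Assumed XML-RPC endpoint of the package index.
PYPI_XMLRPC_URL = "http://pypi.python.org/pypi"


def make_proxy(url=PYPI_XMLRPC_URL):
    """Create an XML-RPC proxy; no connection is made until a method is called."""
    return xmlrpc.client.ServerProxy(url)


def changed_packages(entries):
    """Names of packages touched by (name, version, timestamp, action) entries."""
    return {name for name, _version, _timestamp, _action in entries}


def latest_timestamp(entries, previous):
    """Newest timestamp seen, or the previous marker if nothing changed."""
    return max((ts for _name, _version, ts, _action in entries), default=previous)


def incremental_run(proxy, last_run, mirror_package):
    """Mirror only packages changed since last_run; return the new marker.

    mirror_package is a hypothetical callback that refreshes one package.
    If it raises, the marker is not advanced and the next run starts over
    from last_run. Re-mirroring an unchanged package is harmless.
    """
    entries = proxy.changelog(last_run)
    for name in sorted(changed_packages(entries)):
        mirror_package(name)
    return latest_timestamp(entries, last_run)
```

Compared with walking all ~4900 index pages, this touches only the packages that appear in the changelog since the last successful run.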

