
On 12.10.2008 at 18:18, Martin v. Löwis wrote:
Our z3c.pypimirror already performs an incremental update based on the information available from the index.html page of the simple index and the available md5 hashes. Works like a charm...
So how does it find out when a release gets made?
What do you mean by that?
If you only look at
http://pypi.python.org/simple/
then you have no way of finding out what changed. So "the information available from the index.html page of the simple index" is not actually suitable for building incremental mirroring. What you describe is not possible.
I just looked at the z3c.pypimirror source, and found that it isn't really incremental: whenever it mirrors, it looks at *all* index.html pages, of each and every package (all 4900 of them, except when you restrict the mirror). It then only downloads any new files that may have been added/deleted, and it *is* incremental wrt. files. IIUC, it is *not* incremental wrt. the package index itself.
Please correct me if I'm wrong (and please correct z3c.pypimirror if I'm not :-)
Good suggestion. I think we can take the changelog into account easily. I have to check this with Daniel Kraft, the original author of the package.
Can you please set a specific useragent header, to find out what amount of traffic pypimirror produces? Currently, urllib accounts for 17% of the requests, excluding requests made through urllib by setuptools (which is a separate 18%). It's probably not all of them through pypimirror, but of the 64626 requests made through urllib yesterday, 41671 originated from zopyx.com.
Should not be a problem.
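
For illustration, a minimal sketch of how a mirror client could identify itself. It uses Python 3's urllib.request rather than whatever z3c.pypimirror actually uses, and the user-agent string and the helper names build_request and fetch are hypothetical:

```python
import urllib.request

# Hypothetical identifier; the real string is up to the package maintainers.
MIRROR_USER_AGENT = "z3c.pypimirror/example"


def build_request(url, user_agent=MIRROR_USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download a page while identifying the client to the server."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()
```

With a distinct header like this, the server logs could separate mirror traffic from other urllib clients.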
For real incremental mirroring, you should retrieve the changelog, and access only those package pages that have actually changed since the last time you ran the mirror (successfully).
See above.
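
A rough sketch of the changelog-based approach, not the z3c.pypimirror implementation. It assumes the index exposes an XML-RPC changelog(since) call returning (name, version, timestamp, action) entries; the endpoint URL and all function names here are assumptions for illustration:

```python
import xmlrpc.client


def changed_packages(changelog_entries):
    """Return the sorted, de-duplicated package names from changelog entries.

    Each entry is assumed to look like (name, version, timestamp, action).
    """
    return sorted({entry[0] for entry in changelog_entries})


def fetch_changelog(since, index_url="https://pypi.org/pypi"):
    """Ask the index for all changes since a UTC timestamp (assumed API)."""
    client = xmlrpc.client.ServerProxy(index_url)
    return client.changelog(since)


def packages_to_refresh(last_successful_run, fetch=fetch_changelog):
    """List only the packages whose pages need to be looked at again."""
    return changed_packages(fetch(last_successful_run))
```

The mirror would then visit only the returned package pages, and store the new timestamp only after the run has completed successfully.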
Andreas