
Our z3c.pypimirror already performs an incremental update based on the information available from the index.html page of the simple index and the available md5 hashes. Works like a charm...
So how does it find out when a release gets made?
What do you mean by that?
If you only look at
http://pypi.python.org/simple/
then you have no way of finding out what changed. So "the information available from the index.html page of the simple index" is not actually suitable for building incremental mirroring. What you describe is not possible.
I just looked at the z3c.pypimirror source, and found that it isn't really incremental: Whenever it mirrors, it looks at *all* index.html pages, for each and every package (all 4900 of them, except when you restrict the mirror). It then only downloads any new files that may have been added, and removes those that were deleted, so it *is* incremental wrt. files. IIUC, it is *not* incremental wrt. the package index itself.
Please correct me if I'm wrong (and please correct z3c.pypimirror if I'm not :-)
Can you please set a specific User-Agent header, so that we can find out how much traffic pypimirror produces? Currently, urllib accounts for 17% of the requests, excluding requests made through urllib by setuptools (which is a separate 18%). Probably not all of them come from pypimirror, but of the 64626 requests made through urllib yesterday, 41671 originated from zopyx.com.
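A minimal sketch of setting such a header, written against Python 3's urllib.request rather than whatever pypimirror uses internally. The USER_AGENT string and the build_request/fetch helpers are hypothetical names chosen for illustration; the actual identifier is up to the mirror's authors.

```python
import urllib.request

# Hypothetical identifier; the real value would be chosen by the mirror's authors.
USER_AGENT = "z3c.pypimirror (hypothetical example string)"


def build_request(url, user_agent=USER_AGENT):
    """Create a request that announces the mirroring tool in its User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download a page, identifying the client through the custom header."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()
```

With a distinct header, the mirror's requests can be separated in the server logs from other urllib clients.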
For real incremental mirroring, you should retrieve the changelog, and access only those package pages that have actually changed since the last time you ran the mirror (successfully).
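A rough sketch of that approach, in Python 3. It assumes the changelog is retrieved through an XML-RPC method changelog(since) on the index server and that each row starts with the package name; the endpoint URL and all helper names here are illustrative assumptions, not part of z3c.pypimirror.

```python
import xmlrpc.client

# Assumed XML-RPC endpoint of the package index.
PYPI_XMLRPC_URL = "http://pypi.python.org/pypi"


def changed_packages(changelog_rows):
    """Return the sorted, unique package names found in the changelog rows.

    Each row is assumed to look like (name, version, timestamp, action).
    """
    return sorted({row[0] for row in changelog_rows})


def fetch_changelog(since, url=PYPI_XMLRPC_URL):
    """Ask the index for all changes since the given UTC timestamp (seconds)."""
    server = xmlrpc.client.ServerProxy(url)
    return server.changelog(since)


def packages_to_refresh(last_successful_run, fetch=fetch_changelog):
    """List only the packages whose pages need to be fetched again."""
    rows = fetch(int(last_successful_run))
    return changed_packages(rows)
```

The mirror would store the timestamp of its last successful run, call packages_to_refresh with it, and then visit only the index pages of the returned packages instead of all of them.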
Regards, Martin