[Catalog-sig] Replacement client for pep381client

Christian Theune ct at gocept.com
Thu Mar 21 00:59:21 CET 2013


Hi,

as you might be aware, I've done my share of bitching about my mirror 
(f.pypi.python.org) breaking.

Yesterday I picked pep381client apart and rebuilt it, mostly from the 
ground up.

You can find a working version here:
https://bitbucket.org/ctheune/bandersnatch

The focus has been on making it a lot more robust and a lot easier to 
repair a mirror when it's known to be broken. To achieve that I:

- refactored the code, trying to make it more intentional, less mechanical
- stopped parsing the simple pages' HTML and made more use of the XML-RPC API
- added Tarek's worker/queue approach for parallelizing it
- kept as little state as possible on the client
- switched from timestamps to serial counters for checking what and how 
much to update
- handled locking of concurrent runs more gracefully
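
The serial-counter and worker/queue points above could fit together 
roughly as in the following sketch. This is an illustration under 
stated assumptions, not bandersnatch's actual code: plan_sync, 
run_workers and sync_package are hypothetical names, and the client is 
assumed to be any object exposing PyPI's XML-RPC methods 
list_packages_with_serial() and changelog_since_serial(serial), for 
example an xmlrpc ServerProxy.

```python
import queue
import threading


def plan_sync(client, last_serial):
    """Return (package_names, target_serial) for the next mirror run.

    Hypothetical helper: `client` is assumed to expose the PyPI XML-RPC
    methods list_packages_with_serial() and changelog_since_serial().
    """
    if last_serial is None:
        # No local state yet: sync everything, remember the highest serial.
        packages = client.list_packages_with_serial()  # {name: serial}
        target = max(packages.values()) if packages else 0
        return set(packages), target
    # Incremental run. Each changelog entry is assumed to be a tuple of
    # (name, version, timestamp, action, serial).
    changes = client.changelog_since_serial(last_serial)
    names = {entry[0] for entry in changes}
    target = max([entry[4] for entry in changes] + [last_serial])
    return names, target


def run_workers(names, sync_package, workers=3):
    """Call sync_package(name) for each name from a small thread pool.

    Returns the sorted list of names whose sync raised an exception.
    """
    todo = queue.Queue()
    for name in sorted(names):
        todo.put(name)
    failed = []

    def worker():
        while True:
            try:
                name = todo.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            try:
                sync_package(name)
            except Exception:
                failed.append(name)  # list.append is atomic in CPython

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(failed)
```

In a design like this, the only state the client would need to keep is 
the last serial, and it would be advanced to target_serial only when 
run_workers reports no failures, so that a broken run is simply retried 
from the old serial.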

I think I have a good grasp of what's going on now, so I can keep 
maintaining this in the future.

I'm currently re-initializing my own mirror. The new client can 
basically be run in place: remove the existing state data and call my 
sync script (bsn-mirror) instead of pep381run with the same parameters.

Tomorrow I'll update the documentation, make it use a config file and 
put some lipstick on the main entry point. After that I should be ready 
for a release.

If you want to give it a try already, you just do this:

$ hg clone https://bitbucket.org/ctheune/bandersnatch
$ cd bandersnatch
$ virtualenv-2.7 .
$ bin/python bootstrap.py
$ bin/buildout
$ bin/bsn-mirror /my/mirror/path

Cheers,
Christian