[Catalog-sig] b.pypi up-to-date again
"Martin v. Löwis"
martin at v.loewis.de
Thu Jul 14 21:17:05 CEST 2011
I fixed the AppEngine PyPI mirror (b.pypi.python.org), and it's up-to-date again.
In case you are interested, here is the list of problems:
- integration of download stats broke on 2010-01-17.
The implementation creates a Download record every time somebody
downloads a file from the mirror, and integrates them once a day
into a daily report (deleting the Download objects, and creating
a Stats object). It combines all Download records except for the
ones of the current day, and deletes the ones it combines (in chunks
of 100, since integrating all at once would exceed the 30s limit).
Now, for Jan 17, integration completed (i.e. GAE said there weren't
any further Download objects left); however, the next day, another
Download object for the day showed up, breaking the integration
job. I fixed it by post-dating it to the next day that hadn't been
- mirroring broke when a file was deleted on PyPI after the mirror
learned of its existence, but before it got mirrored. Mirroring
works as follows:
a) the GAE app asks PyPI to upload something (downloading from the
master doesn't work because it exceeds the maximum response
size for the url fetcher).
b) PyPI asks GAE to create an upload URL, and asks what file
c) PyPI uploads the file to the blob store
d) GAE tells the GAE app that the upload is complete
In this scenario, step c) failed, because the file wasn't there
anymore, so PyPI just ignored the request. The GAE app's cron job
then restarted the upload, which would fail again, and it did so
every 5 minutes since March 2010.
I fixed it by having PyPI upload a null-byte file, along
with a POST flag indicating that the file is to be deleted.
- restarting the mirroring then failed since accessing the
changelog since March took about 20s, exceeding the 5s XML-RPC
limit of GAE. I fixed it by introducing another RPC,
It then took three days to catch up, but should now stay up-to-date.
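
For illustration, the chunked stats integration from the first item can be sketched roughly as below. This is not the mirror's actual code: it uses plain in-memory lists and a Counter in place of datastore entities, and the names integrate_chunk and integrate_all are made up for this sketch. Only the chunk size of 100 and the rule of skipping the current day come from the description above.

```python
from collections import Counter
from datetime import date

CHUNK_SIZE = 100  # records combined per pass, as described in the post


def integrate_chunk(downloads, stats, today, chunk_size=CHUNK_SIZE):
    """Fold up to chunk_size Download records from before `today` into stats.

    downloads: list of (day, filename) tuples standing in for Download records
    stats: Counter keyed by (day, filename), standing in for Stats objects
    Returns the number of records integrated (0 means nothing is left).
    """
    chunk = [d for d in downloads if d[0] < today][:chunk_size]
    for record in chunk:
        stats[record] += 1        # add to the daily report
        downloads.remove(record)  # delete the combined record
    return len(chunk)


def integrate_all(downloads, stats, today):
    """Run chunked passes until no record older than `today` remains."""
    passes = 0
    while integrate_chunk(downloads, stats, today):
        passes += 1
    return passes
```

In this sketch a Download record that shows up late for an already-integrated day is simply added to that day's existing count. That is a choice made for the sketch, not how the mirror behaved or how it was fixed.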
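
The fix for deleted files can be pictured with a similar sketch. Again this is only a model under assumptions: dicts and sets stand in for the blob store and the list of pending files, the function names and the `deleted` parameter are hypothetical, and "null-byte file" is taken here to mean an empty body. The point is that a deletion marker removes the file from the pending set, so the cron job stops retrying it.

```python
def handle_upload(blob_store, pending, filename, body, deleted=False):
    """Process one upload from the master (simplified model).

    blob_store: dict of filename -> bytes, standing in for the blob store
    pending: set of filenames the mirror still wants to receive
    deleted: hypothetical flag; True means the master no longer has the file
    """
    if deleted:
        blob_store.pop(filename, None)  # drop any stale copy
    else:
        blob_store[filename] = body
    pending.discard(filename)           # either way, stop retrying this file
    return "deleted" if deleted else "stored"


def cron_retry(pending, master_files, blob_store):
    """One cron pass: request every pending file from the master."""
    for filename in sorted(pending):    # sorted() copies, so pending may shrink
        if filename in master_files:
            handle_upload(blob_store, pending, filename, master_files[filename])
        else:
            # The master answers with an empty body plus the deletion flag
            # instead of silently ignoring the request.
            handle_upload(blob_store, pending, filename, b"", deleted=True)
```

Without the deletion branch, a file missing on the master would stay in the pending set forever, which matches the every-5-minutes retry loop described above.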