[Catalog-sig] PyPI outage

"Martin v. Löwis" martin at v.loewis.de
Thu Aug 2 22:38:54 CEST 2007


I think I now understand what caused the PyPI outage
yesterday and today.

As Thomas found, somebody was crawling the wiki with multiple
requests per second, following every link, e.g. in a series
such as

/moin/PyConFrancescAlted?action=AttachFile
/moin/PyConFrancescAlted?action=diff
/moin/PyConFrancescAlted?action=info
/moin/PyConFrancescAlted?action=edit
/moin/PyConFrancescAlted?action=LocalSiteMap
/moin/PyConFrancescAlted?action=print
/moin/PyConFrancescAlted?action=refresh

and so on, for every page. That caused considerable load on
the machine (a load average of 17).
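
For reference, this kind of crawl is easy to spot by counting
requests per client address in the access log, e.g. with a few
lines of Python (a rough sketch; the log path and the
common-log-format layout are assumptions, not our actual
setup):

from collections import defaultdict

counts = defaultdict(int)
for line in open('/var/log/apache2/access.log'):
    # The first field of the common log format is the client IP.
    counts[line.split(' ', 1)[0]] += 1

# Show the ten busiest clients.
busiest = sorted(counts.items(), key=lambda i: i[1], reverse=True)
for ip, n in busiest[:10]:
    print('%s %d' % (ip, n))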

In turn, PyPI began to respond more slowly; in some cases, it
would not respond within the 60-second timeout I had
configured for FastCGI. As a result, mod_fastcgi would close
the connection for the request (and log an error). thfcgi.py
then found that it could no longer write to the pipe (EPIPE),
and therefore decided to terminate the FCGI server.
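
In outline, that failure mode looks like this (a simplified
sketch of the mechanism, not the actual thfcgi.py code; the
function and names are made up):

import errno
import socket

def send_response(conn, data):
    # conn is the socket back to the web server. If mod_fastcgi
    # has already closed its end because the request timed out,
    # the write fails with EPIPE.
    try:
        conn.sendall(data)
    except socket.error as e:
        if e.errno == errno.EPIPE:
            # Treat a broken pipe as fatal and shut the whole
            # FCGI server down instead of retrying.
            raise SystemExit(1)
        raise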

mod_fastcgi then attempted to restart the server for some
time, and eventually began throttling the restarts, making
all PyPI server processes go away (i.e. they would quit, and
then not get restarted for some time).

At that point, my maintenance script would detect that all
PyPI instances had gone away, and initiate a graceful restart
of Apache.
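
The script is essentially a watchdog along these lines (a
sketch; the probe target and the details of the real script
differ):

import os
import socket
import time

def pypi_alive():
    # Crude liveness probe: connect to the local HTTP port,
    # issue a request, and see whether any reply comes back.
    try:
        s = socket.create_connection(('localhost', 80), timeout=10)
    except socket.error:
        return False
    try:
        s.sendall(b'GET /pypi HTTP/1.0\r\n\r\n')
        return bool(s.recv(1))
    except socket.error:
        return False
    finally:
        s.close()

while True:
    if not pypi_alive():
        # All instances gone: restart Apache gracefully.
        os.system('apachectl graceful')
    time.sleep(60)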

The crawler comes from the same ISP as yesterday, but today
with a different IP address; I have blocked that address as
well.

Can anybody suggest a more reliable way to prevent crawlers
from hitting the wiki so hard?
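
For illustration, the kind of throttle I have in mind is a
per-client limit in front of the wiki, along these lines (a
sketch only; nothing like this is deployed):

import time

WINDOW = 10.0   # seconds
LIMIT = 20      # maximum requests per client per window

_recent = {}    # client IP -> timestamps of recent requests

def allow(ip):
    # Sliding-window rate limiter: remember when each client
    # made its recent requests, and reject once the window
    # fills up.
    now = time.time()
    stamps = [t for t in _recent.get(ip, []) if now - t < WINDOW]
    stamps.append(now)
    _recent[ip] = stamps
    return len(stamps) <= LIMIT

A request handler would call allow() with the client address
and answer 403 (or just sleep) when it returns False. But
perhaps an existing Apache module or MoinMoin feature already
does this better?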

Regards,
Martin

