[Catalog-sig] PyPI and Wiki crawling
"Martin v. Löwis"
martin at v.loewis.de
Tue Aug 7 23:06:47 CEST 2007
I hope I have now solved the overload problem that massive
crawling caused on the wiki and that, in consequence,
caused the PyPI outage.
Following Laura's advice, I added a Crawl-delay directive to
robots.txt. Several robots have picked that up: not just msnbot
and slurp, but also, e.g., MJ12bot.
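
As an illustration of what a cooperating robot sees, here is a
minimal sketch that parses a robots.txt containing a Crawl-delay
line with Python's standard urllib.robotparser. The robots.txt
content and the delay of 30 seconds are assumptions made up for
the example; the actual value on the server is not stated here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real file and its delay
# value are not given in this message.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks for the delay that applies to its user agent
# and sleeps that many seconds between requests.
delay = parser.crawl_delay("MJ12bot")
print(delay)
```

Crawl-delay is a non-standard extension, so robots that do not
implement it simply ignore the line, which is why server-side
throttling is still needed for them.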
For the others, I had to fine-tune my throttling code after
observing that the expensive URLs are those with a query string.
Each such request now counts as 3 regular queries (I might have
to bump this to 5), so a client can issue only one of them every 6s.
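
The weighting described above could be sketched roughly as
follows. This is not the actual throttling code, only a minimal
illustration. The Throttle class, its method names, and the
per-client bookkeeping are hypothetical, and the 2-second
interval for a regular query is an assumption inferred from
"3 regular queries" corresponding to 6s.

```python
from urllib.parse import urlsplit

REGULAR_INTERVAL = 2.0  # seconds per regular query (assumed: 6s / 3)
QUERY_WEIGHT = 3        # a URL with a query string counts as 3 queries


class Throttle:
    """Hypothetical per-client throttle with weighted request costs."""

    def __init__(self):
        # client id -> earliest time the next request is accepted
        self.next_allowed = {}

    def cost(self, url):
        # URLs with a query string are the expensive ones.
        return QUERY_WEIGHT if urlsplit(url).query else 1

    def allow(self, client, url, now):
        # Reject the request if the client is still paying for an earlier one.
        if now < self.next_allowed.get(client, 0.0):
            return False
        self.next_allowed[client] = now + self.cost(url) * REGULAR_INTERVAL
        return True


throttle = Throttle()
first = throttle.allow("bot", "/moin/FrontPage?action=diff", now=0.0)
too_soon = throttle.allow("bot", "/moin/FrontPage?action=diff", now=3.0)
after_wait = throttle.allow("bot", "/moin/FrontPage?action=diff", now=6.0)
```

Raising QUERY_WEIGHT to 5 in this sketch would stretch the gap
between two query-string requests from 6s to 10s without
changing the rate for plain page views.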
For statistics of the load, see
I added accounting of moin.fcgi run times, which shows that
Moin produced 15% CPU load on average (PyPI 3%, Postgres 2%).