[Catalog-sig] PyPI overloaded(?)

Richard Jones richardjones at optushome.com.au
Wed Oct 18 06:43:49 CEST 2006

Sorry I didn't respond sooner - I'm quite busy with work 
and organising papers for OSDC '06.

On Wednesday 18 October 2006 04:08, Martin v. Löwis wrote:
> I'm not so sure this is the definite answer. If the system is
> overloaded, it might be because it consumes too many resources itself
> (in which case mirroring wouldn't help), or because something else
> on the machine is consuming too many resources (in which case the
> installation should be moved elsewhere entirely).

We still have the problem that the PyPI browse interface is quite 
CPU-intensive, and if it's hit by a bot it'll definitely hurt overall 
system performance.

We have a check in the browse code to see if the user agent matches:

botre = re.compile(
    r'^$|brains|yeti|myie2|findlinks|ia_archiver|psycheclone|'
    r'badass|crawler|slurp|spider|bot|scooter|infoseek|looksmart|jeeves', re.I)

and if it does then the browse returns an empty page. This RE is pretty 
complete - I use it to redirect bots to a dedicated ZEO client at work.
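
For illustration, here is a minimal sketch of how such a check might be wired 
up. The pattern is the one quoted above; the names is_bot and browse are 
hypothetical and not taken from the PyPI code, and using search() rather than 
match() is an assumption on my part.

```python
import re

# Pattern copied from the message; the '^$' alternative also flags an
# empty User-Agent header as a bot.
BOT_RE = re.compile(
    r'^$|brains|yeti|myie2|findlinks|ia_archiver|psycheclone|'
    r'badass|crawler|slurp|spider|bot|scooter|infoseek|looksmart|jeeves',
    re.I)


def is_bot(user_agent):
    """Hypothetical helper: True if the User-Agent looks like a crawler."""
    # A missing header (None) is treated the same as an empty one.
    return BOT_RE.search(user_agent or '') is not None


def browse(user_agent, render):
    """Hypothetical handler: skip the expensive render for bots."""
    if is_bot(user_agent):
        return ''  # empty page, as described in the message
    return render()
```

The point of the design is that the cheap regex test runs before any of the 
CPU-intensive browse work, so a crawler costs almost nothing.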

I've added a robots.txt to http://cheeseshop.python.org (I always meant to, 
but never got around to it). Unfortunately, I'm not sure 
"Disallow: /pypi?:action=browse" will be handled properly. We'll see.
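
One way to get a rough idea of how a parser might treat that rule is to feed 
it to the robots.txt parser in Python's standard library. This is only a 
sketch: the "User-agent: *" line, the "ExampleBot" agent name and the 
/pypi/SomePackage URL are assumptions for the example, and real crawlers may 
well interpret the query string differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Assumed file contents; only the Disallow line comes from the message.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /pypi?:action=browse",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)


def allowed(url):
    """Ask the parser whether a hypothetical crawler may fetch the URL."""
    return parser.can_fetch("ExampleBot", url)


browse_allowed = allowed("http://cheeseshop.python.org/pypi?:action=browse")
package_allowed = allowed("http://cheeseshop.python.org/pypi/SomePackage")
print("browse allowed: ", browse_allowed)
print("package allowed:", package_allowed)
```

This only says what one parser does with the rule; it doesn't settle how the 
bots that actually hit the site will behave.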

