[Python-Dev] Googlebot and the mail.python.org python-dev archive

A.M. Kuchling amk at amk.ca
Sat Feb 28 18:36:20 CET 2009


On Sat, Feb 28, 2009 at 09:53:10PM +1000, Nick Coghlan wrote:
> Is pydotorg-www still the place for website questions? If so, I should
> probably take this over there...

Just 'pydotorg' is the current list
(http://mail.python.org/mailman/listinfo/pydotorg).

Looking at the access logs, mail.python.org is being actively
crawled:

66.249.71.119 - - [28/Feb/2009:18:32:51 +0100] "GET /pipermail/python-list/2004-June/265194.html HTTP/1.1" 304 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
72.30.79.38 - - [28/Feb/2009:18:32:51 +0100] "GET /pipermail/csv/2003-February/000368.html HTTP/1.0" 200 3929 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"
65.55.211.30 - - [28/Feb/2009:18:32:51 +0100] "GET /pipermail/python-list/2006-May/382528.html HTTP/1.1" 200 4028 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
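To see which archives the crawlers are spending their time on, the
log could be tallied per crawler and per list.  A rough sketch,
assuming the lines are in Apache's "combined" log format (as the
sample above appears to be); the regex, the CRAWLERS tuple, and the
function name count_crawler_hits are illustrative, not anything
running on the server:

```python
import re
from collections import Counter

# Apache "combined" log format, assumed from the sample lines above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Substrings identifying the three crawlers seen in the sample
# (an assumption; other crawlers would need to be added here).
CRAWLERS = ("Googlebot", "Yahoo! Slurp", "msnbot")


def count_crawler_hits(lines):
    """Return a Counter mapping (crawler, archive name) to hit count."""
    hits = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that are not in the assumed format
        agent = m.group("agent")
        crawler = next((c for c in CRAWLERS if c in agent), None)
        if crawler is None:
            continue  # not one of the crawlers we are tracking
        # /pipermail/<list>/... splits into ['', 'pipermail', '<list>', ...]
        parts = m.group("path").split("/")
        if len(parts) > 2 and parts[1] == "pipermail":
            archive = parts[2]
        else:
            archive = "(other)"
        hits[(crawler, archive)] += 1
    return hits
```

Feeding it an open log file, e.g. count_crawler_hits(open(path)) with
path pointing at the server's access log, and calling most_common() on
the result would show whether python-list dominates the crawl traffic.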

Could it be that the site is simply too large, so the search engines
index only 10,000 messages from it?  One possible solution would be
to block crawling of the python-list archive; it's enormous, and it's
already available through Google's Usenet search.
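Blocking the python-list archive would presumably be done in
robots.txt.  Here is a minimal sketch that checks such a rule with the
standard library's urllib.robotparser.  The robots.txt content is
hypothetical, with the Disallow path inferred from the URLs in the log
sample, and the helper name allowed is illustrative.  Whether this
would actually improve coverage of the other archives is only a guess.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for mail.python.org; not the site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pipermail/python-list/
"""


def allowed(url_path, agent="Googlebot"):
    """Return True if `agent` may fetch `url_path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url_path)
```

With these rules, allowed("/pipermail/python-list/2004-June/265194.html")
is False, while paths under the other archives, such as
"/pipermail/python-dev/", remain crawlable.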

--amk

