[Catalog-sig] why is the wiki being hit so hard?
lac at openend.se
Sun Aug 5 07:59:20 CEST 2007
In a message of Sat, 04 Aug 2007 09:42:45 +0200, "Martin v. Löwis" writes:
>> If they do not respect them, then you can use this program:
>> http://danielwebb.us/software/bot-trap/ to catch them.
>> If you are doing this, Martin, use the German version instead,
>> because it has a few useful additions; I forget what now.
>> Most scrapers, these days, respect robots.txt, which will make this
>> program useless for catching them. But some days you can get lucky.
>That would also be an idea. I'll see how the throttling works out;
>if it fails (either because it still gets overloaded - which shouldn't
>happen - or because legitimate users complain), I'll try that one.
pardon for this completely useless quoting of irrelevant text,
but I tried just telling catalog-sig to go read this url
and check the 'MSNbot is crawling my site too frequently' thread,
and I got 'suspicious header', which is what all the python.org
groups say when they think you are sending them spam, even though
the problem was not actually in the header. So if your text is
basically a url and you want to send it to a python.org group,
you are screwed. So I find an article and reply.
Go read that.
I think it says that we could set our crawl delay to some number
-- why 120 I have no clue -- and their spider will be made to
behave. Or possibly we can hack in the bot trap for those that
do not; at any rate it seems relevant to our problem.
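
For reference, a minimal sketch of the sort of robots.txt entry
that thread seems to describe (the 120 comes from the thread, not
from me; Crawl-delay is a non-standard robots.txt extension,
measured in seconds between requests, that msnbot honours):

    # sketch only -- assumes msnbot reads the Crawl-delay extension
    User-agent: msnbot
    Crawl-delay: 120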