[Mailman-Developers] Proposed: remove address-obfuscation code from Mailman 3
CNulk at scu.edu
Mon Aug 31 22:54:06 CEST 2009
Mark Sapiro wrote:
> Barry Warsaw wrote:
>> On Aug 31, 2009, at 1:15 PM, C Nulk wrote:
>>> As for using robots.txt, hmm, it is not the legitimate search engines I
>>> care about, it is the search engines/crawlers that do not respect my
>>> robots.txt file that I care about. If I had an effective way to
>>> consistently identify those non-legitimate crawlers, I would add what I
>>> needed to drop them into my firewall as I recognized them.
> The point in the original post about robots.txt was that if you think
> obfuscation is undesirable and don't do it, but you get complaints
> from people who find their unobfuscated addresses on your pages via
> legitimate search engines, you can use robots.txt to keep the search
> engines out.
I understood the original post and I agree.
> However, robots.txt is not completely effective in this. You can use it
> to prevent Google from crawling your site or portions thereof, but it
> won't prevent Google from indexing your pages that it finds via
> external links. To prevent this, you need a <meta name="robots"
> content="noindex"> tag on the pages themselves.
I agree with you here.
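As a rough illustration of the crawling side of this, here is a minimal Python sketch of the check a cooperative crawler performs before fetching a page. It uses the standard library's urllib.robotparser; the robots.txt contents, the host lists.example.org, and the function name polite_may_fetch are assumptions made up for this example, not anything taken from a real Mailman installation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks all crawlers to stay out of the archives.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pipermail/
"""


def polite_may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url.

    This is the test a well-behaved crawler runs on itself before
    requesting a page. The server does not enforce the result.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    archive_page = "https://lists.example.org/pipermail/mylist/2009-August/000001.html"
    info_page = "https://lists.example.org/listinfo/mylist"
    print(polite_may_fetch(ROBOTS_TXT, "ExampleBot", archive_page))
    print(polite_may_fetch(ROBOTS_TXT, "ExampleBot", info_page))
```

Nothing forces a client to run this check; the test lives entirely in the crawler's own code.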
The robots.txt file and the "<meta" HTML header work great for search engines
that respect those conventions. My point is that neither of them is
effective against crawlers that do not respect the conventions. Putting
raw email addresses in the archives without a means to obfuscate them
simply hands the addresses over to those disreputable crawlers. And if
I were writing a web crawler to harvest email addresses, I am pretty sure
I would ignore any convention that stops me from getting what I want. BTW,
I DON'T WRITE WEB CRAWLERS so no yelling at me. :)
It is those disreputable crawlers I was addressing in my comment -
robots.txt and the "<meta" header are insufficient in that particular case.
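For reference, the kind of obfuscation under discussion can be sketched in a few lines of Python. This is a minimal sketch, assuming a plain " at " substitution like the one visible in the author line of this post; the regular expression and the function name obscure_addresses are hypothetical and are not Mailman's actual code.

```python
import re

# Hypothetical pattern: deliberately simple, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")


def obscure_addresses(text):
    """Replace user@host with "user at host" wherever an address appears."""
    return EMAIL_RE.sub(r"\1 at \2", text)


if __name__ == "__main__":
    print(obscure_addresses("Reply to jane@example.org for details."))
```

A substitution like this is easy for a determined harvester to reverse, so at best it raises the cost for crawlers that only match the raw user@host form.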