[Mailman-Users] Harvesting of email addresses for spam from archives

Mark Sapiro mark at msapiro.net
Mon Sep 8 22:03:49 CEST 2008


David Beaumont wrote:

>I've just checked myself and the HTML source still seems to allow robots:
><META NAME="robots" CONTENT="index,nofollow"> on each message and <META
>NAME="robots" CONTENT="noindex,follow"> on the index page.  I would want
>noindex and nofollow on both pages.  


The META tags say don't 'index' the index pages themselves, but follow
the links to the messages, and on the message pages 'index' the
contents, but don't follow links.
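
To make the reading of those two tags concrete, here is a minimal
sketch of how a well-behaved robot might interpret them. It is not
taken from any real crawler; the class and function names are made up
for illustration, and it assumes a directive that is absent defaults to
"allowed".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <META NAME="robots" CONTENT="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() != "robots":
            return
        content = attrs.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_policy(html):
    """Return (may_index, may_follow) for a page (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Assumption: anything not explicitly forbidden is permitted.
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


# The two tags quoted above.
message_page = '<META NAME="robots" CONTENT="index,nofollow">'
index_page = '<META NAME="robots" CONTENT="noindex,follow">'
```

With these inputs, the message page comes out as "index, don't
follow" and the index page as "don't index, follow", which is the
behavior described above.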

This is appropriate for a public archive. Presumably you want people to
be able to search for and find stuff in a public archive.


>Changing to private archives doesn't seem to make any difference to that,
>does it only apply to new archiving?  The help is a bit vague here, does
>public mean the data is prepared for public posting (emails obfuscated)
>and private mean they are not, or does private mean they are not put on the
>web? i.e. which of private and public is actually the most secure?


If the archive is private, it can only be accessed by logging in. The
'pipermail' URL doesn't work and access is via the 'private' CGI which
requires login with list member email address and password.

It won't change the META tags, but they will be irrelevant since robots
can't log in and access the pages.

Email obfuscation has nothing to do with public/private. It depends
solely on the setting of ARCHIVER_OBSCURES_EMAILADDRS in
Defaults.py/mm_cfg.py.
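
As a rough illustration of what that setting controls, the sketch
below rewrites addresses in the "user at example.com" style seen in the
header of this post. It is not Mailman's archiver code: the regular
expression and the obscure() helper are invented for this example, and
a plain Python bool stands in for the real setting, which you would
override in mm_cfg.py rather than edit in Defaults.py.

```python
import re

# Stand-in for the real setting. In mm_cfg.py the override would look
# something like:
#   ARCHIVER_OBSCURES_EMAILADDRS = Yes
ARCHIVER_OBSCURES_EMAILADDRS = True

# Deliberately simple address pattern, an assumption for this sketch only.
_ADDR = re.compile(r"([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)")


def obscure(text, enabled=ARCHIVER_OBSCURES_EMAILADDRS):
    """Replace '@' in addresses with ' at ' when obscuring is enabled."""
    if not enabled:
        return text
    return _ADDR.sub(r"\1 at \2", text)
```

Note that the result is the same whether the archive is public or
private; only the flag matters.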

If an archive is public, archive URLs are of the form
http://example.com/pipermail/list/ and anyone can access them. If an
archive is private, the URL form becomes
http://example.com/mailman/private/list/ and login is required to
access the archive. The 'private' URL will work with a public archive,
but still requires login. The 'pipermail' URL will not work with a
private archive.
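
The four combinations can be summarized in a small sketch. The
function names are hypothetical and the code only encodes the rules
stated above; it does not query a real Mailman installation.

```python
def archive_url(host, listname, private):
    """Build the archive URL in the form described above."""
    path = "mailman/private" if private else "pipermail"
    return "http://%s/%s/%s/" % (host, path, listname)


def access(url_kind, archive_is_private):
    """Describe what a visitor gets for a URL kind and archive type.

    url_kind is 'pipermail' or 'private' (the two URL forms above).
    """
    if url_kind == "private":
        # Works for public and private archives, always behind a login.
        return "login required"
    if url_kind == "pipermail":
        return "not available" if archive_is_private else "open to anyone"
    raise ValueError("unknown URL kind: %r" % (url_kind,))
```

For example, archive_url("example.com", "list", private=True) gives
http://example.com/mailman/private/list/.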

Switching an archive from public to private changes the archive URL on
the listinfo page and removes the list's symlink(s) from
archives/public/ (which the pipermail alias depends on). Switching
from private to public does the reverse.
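
The symlink part of that switch can be pictured with the sketch below.
It is an illustration only, run against a throwaway temporary
directory, and it assumes the link in archives/public/ points at a
directory of the same name under archives/private/. The function names
are made up; Mailman does this itself when you change the setting, so
there is no need to do it by hand.

```python
import os
import tempfile


def make_public(archives_dir, listname):
    """Add the list's symlink under public/ (assumed layout)."""
    target = os.path.join(archives_dir, "private", listname)
    link = os.path.join(archives_dir, "public", listname)
    if not os.path.islink(link):
        os.symlink(target, link)


def make_private(archives_dir, listname):
    """Remove the list's symlink from public/, leaving the data in place."""
    link = os.path.join(archives_dir, "public", listname)
    if os.path.islink(link):
        os.unlink(link)


def is_public(archives_dir, listname):
    return os.path.islink(os.path.join(archives_dir, "public", listname))


def demo():
    """Switch a dummy list to public and back inside a temp directory."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "private", "list"))
        os.makedirs(os.path.join(root, "public"))
        make_public(root, "list")
        was_public = is_public(root, "list")
        make_private(root, "list")
        data_kept = os.path.isdir(os.path.join(root, "private", "list"))
        return was_public, is_public(root, "list"), data_kept
```

The point of the sketch is that going private removes only the link;
the archived messages themselves stay where they were.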

Private archives are accessible only to list members by login with
their email address and list password, or to list admins/moderators
with the list admin/moderator password. They are as secure as the
passwords. They will not be accessed by spambots or search engine web
crawlers because those can't log in.


>Also search engines still seem to be able to see the data e.g. type 
>"neoprene site:lists.shire.net/pipermail/dbamain/ " into Google, maybe this
>will go in a few days?


Requiring web server authentication to access the public archive will
keep all the robots out, so they won't be getting any new posts in the
meantime. It will be much longer than a few days, though, before their
existing index entries and cached pages for your archive go away.

-- 
Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


