[Mailman-Developers] Protecting email addresses from spam harvesters

Jay R. Ashworth jra@baylink.com
Mon, 25 Feb 2002 13:14:56 -0500

On Mon, Feb 25, 2002 at 12:27:23PM -0500, Barry A. Warsaw wrote:
> I /think/ I've caught up on this thread, but I'm sure I've missed a
> bunch.  As I see it there are really these issues to protecting email
> addresses in Mailman:
> 1) list admin addresses
> 2) public archives
> 3) private archives
> 4) raw archive
> 5) list rosters

I believe you've synopsized it correctly, yes.

> For #1, MM2.1 changes what gets included at the bottom of list pages.
> The admin's personal address is no longer included in the link's text
> or in mailto: href.  In the mailto: you'll see something like
> mylist-owner@dom.ain and in the text you'll see something like "barry
> at zope.com".  I see no point in trying to obscure the former -- or
> put it behind a web form -- because it's easily guessed given a probe
> of existing lists, as is every other list-related email address.  More
> on protecting the -owner from spam below.  I claim that the
> guessability is a feature, btw.


> You can argue that "barry at zope.com" isn't obfuscated enough, and
> you might be right.  I'm against any image or JavaScript approach to
> protecting these because I really do want to keep Mailman's web
> interface as pedestrian as possible.  In principle I don't mind if
> JavaScript or images are used, but they should never be the only way
> to navigate a Mailman site.  Mailman must degrade gracefully for
> browsers that either don't support these features or have them
> disabled.  I'd do the same with cookies if I could figure out how to
> do low-frustration-factor authentication without them.

And, of course, if it *will* degrade, then address-snarfers will figure
out how to *make* it degrade, so it's not worth doing in the first
place, at least not for *that* reason.

> (Aside: I really really hate websites that are only viewable with
> JavaScript on, and I often send a friendly ADA-ish noodge to webmaster
> when I find such beasts, although it rarely does any good).

Hear hear!

> MM3 will likely integrate admin addresses and list memberships into an
> object called a "roster" (essentially just a list of email addresses).
> This will let us define a pipeline for each roster, which could
> include a spam filter that performs an action based on some criteria
> (e.g. drop it, reject it, mark a header, etc.).  So we can do more
> protection on the -owner address than we can do now (without
> hacking).

I do see one problem here, and I don't know if you already address it
below.  [ looks ]  You don't; it's this: if the list-owner addresses go
through the MM machinery, as well, then they too can die if MM crashes
the wrong way.

This implies, as I believe has already been discussed, that the
*server* admin address must be publicly accessible, must not be piped
into Mailman at all, and preferably should not even be handled by the
same machine...  ("Single point of failure")
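
To make the roster idea concrete, here is a rough sketch of a roster
carrying its own filter pipeline.  Every name in it (Roster, Message,
the filter functions, the action strings) is hypothetical and made up
for illustration; nothing here is taken from an actual MM3 design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    """Hypothetical stand-in for an incoming message."""
    sender: str
    subject: str
    headers: Dict[str, str] = field(default_factory=dict)


# A filter returns an action name to stop the pipeline, or None to continue.
Filter = Callable[[Message], Optional[str]]


@dataclass
class Roster:
    """Hypothetical roster: a list of addresses plus a filter pipeline."""
    addresses: List[str]
    pipeline: List[Filter] = field(default_factory=list)

    def process(self, msg: Message) -> str:
        # Run each filter in order; the first one that returns an
        # action decides the message's fate.
        for check in self.pipeline:
            action = check(msg)
            if action is not None:
                return action
        return "deliver"


def drop_empty_subject(msg: Message) -> Optional[str]:
    # Illustrative criterion only: drop messages with a blank subject.
    return "drop" if not msg.subject.strip() else None


def mark_suspect_sender(msg: Message) -> Optional[str]:
    # Illustrative criterion only: tag a header but let the message through.
    if msg.sender.endswith("@example.invalid"):
        msg.headers["X-Suspect"] = "yes"
    return None


owners = Roster(["mylist-owner@dom.ain"],
                [drop_empty_subject, mark_suspect_sender])
```

Note that in a layout like this the server admin address would simply
not belong to any roster, which is the point above: it should keep
working even when the pipeline code doesn't.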

>             Rosters and the improved user database will allow us to
> actually equate admin email addresses with Real Names, so you could
> conceivably see something like
>     List run by <a href="mailto:mylist-owner@dom.ain">Barry Warsaw</a>
> at the bottom of the pages.  You'd be within your rights to argue that
> end users never even need know who admins the list, but I think it
> helps to avoid the "faceless droid" syndrome.

Concur *strongly*.

> Mailman should avoid getting deeply into the spam detection and
> prevention business, except for some really really basic stuff
> (probably not much more or less than it does now).  It should
> integrate well with external spam detection programs like SpamAssassin
> or commercial equivalents.  E.g. if we always send the message through
> SA, and the message gets some score, we could decide to hold messages
> below say 5.0 on the Spamster Scale, discard anything above 5.0, etc.

That sounds good, and if there isn't already a "plugin" API for that,
we ought to give some thought to that...
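
As a starting point for that thought, a minimal sketch of the
hold/discard decision, assuming the external filter has already tagged
the message with an X-Spam-Status header containing "score=" (or the
older "hits=").  The function names are hypothetical, the lower
hold_at threshold is my own assumption (Barry only names 5.0), and
passing untagged mail through is likewise just one possible policy.

```python
import re
from email import message_from_string

# Matches "score=7.2" or the older "hits=7.2" inside X-Spam-Status.
_SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(msg):
    """Return the score from X-Spam-Status, or None if it is absent."""
    match = _SCORE_RE.search(msg.get("X-Spam-Status", ""))
    return float(match.group(1)) if match else None


def disposition(score, hold_at=2.0, discard_at=5.0):
    """Map a score to 'accept', 'hold' or 'discard'.

    hold_at is an assumed lower bound, not something from the thread;
    discard_at=5.0 is the example figure quoted above.
    """
    if score is None:
        return "accept"      # assumption: untagged mail passes through
    if score > discard_at:
        return "discard"
    if score >= hold_at:
        return "hold"
    return "accept"


raw = ("From: someone@example.com\n"
       "X-Spam-Status: Yes, score=7.2 required=5.0\n"
       "\n"
       "body\n")
msg = message_from_string(raw)
decision = disposition(spam_score(msg))
```

Keeping the parsing and the policy in separate functions is what would
let a "plugin" swap in a different scorer without touching the
hold/discard logic.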

> As for #2, I'd go for the low-tech approach of simply discarding the
> hostname part of the email address in all public archives.  Certainly
> this is easy in the headers, and we'll have to decide whether we're
> going to expend the resources to do body searches for email addresses,
> and obfuscate those as well.  If people want to make contacts based on
> some public archive message, they can email the list.  Until we've got
> web-posting, I don't think it matters if they lose the full email
> address in the public archives.

Well, personally, I don't ever assume that someone who posted a message
a year ago with 95% of the answer to my question is even *on* the list
anymore -- a situation I don't think you thought of -- but...
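
For reference, the low-tech approach is about this much code.  This is
only a sketch under my own assumptions: the function names are
hypothetical, it is not how Pipermail does anything, and the body
regex is deliberately simple and will miss unusual address forms.

```python
import re
from email.utils import parseaddr

# Deliberately simple pattern for addresses in message bodies; group 1
# is the local part, which is all we keep.
_ADDR_RE = re.compile(
    r"\b([A-Za-z0-9._%+-]+)@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def strip_host(header_value):
    """Drop the hostname from a single-address header such as From:."""
    name, addr = parseaddr(header_value)
    local = addr.split("@", 1)[0]
    return f"{name} <{local}>" if name else local


def strip_hosts_in_body(text):
    """Replace every address found in body text with its local part."""
    return _ADDR_RE.sub(r"\1", text)


header = strip_host("Barry Warsaw <barry@zope.com>")
body = strip_hosts_in_body("mail jra@baylink.com now")
```

The header case is cheap; the body case is the one that costs a regex
pass over every message, which is the resource question raised above.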

> As for #3, I don't mind not obscuring the email addresses since a
> login will be required.  If we think we don't trust the current
> private archive login procedures to be secure against bots, then we
> can fix that, but I don't see it as a high priority.


> #4 is interesting too.  I'm not against putting the raw archive behind
> a turing-test, since I suspect that very few people will ever want
> it.  It means that we won't be able to write an automated wget-ish
> script to do off-site backups, but so be it.

Is there a difference between raw and private that I'm missing?  Do you
mean the mbox format files?

> Things to note for #'s 2-4:
> - The Pipermail implementation has lots of well-known problems.  I'm
>   personally not willing to spend a lot of time fixing them, and I
>   still recommend Real Sites use a Real Archiver.  I've just thrown
>   the majority of the email obfuscation problems over the fence into
>   someone else's back yard <wink>.


> - Adding public archive obfuscation is fine and dandy for new messages
>   added to the archives but what about all the existing archived
>   messages?  Re-running Pipermail (i.e. bin/arch) to regenerate from
>   scratch has two significant drawbacks.  1) Message URLs can change,
>   especially if you also fix broken From_ delimiters, and that in turn
>   breaks bookmarks, 2) on large mboxes, you simply can't do bin/arch
>   because of memory problems.

See above.  :-)

> - Someone needs to step up and "own" Pipermail if any of these
>   problems are going to be fixed, or if the obfuscation is going to
>   happen.

Not much danger of that, is there?

> - Remember that Pipermail itself is completely optional.  An API is
>   defined between Mailman and the archiver and that's all the
>   interaction they have.  Maybe the API needs to be more elaborate to
>   support obfuscation.  It definitely needs some changes if we ever
>   want to add some of the features I'd like to add (but that's
>   off-topic here).

Well, that's probably the best point yet: this isn't *Mailman's*
problem, except to the extent that we "recommend" Piper as out

> - I'll note that one of the early design decisions for Pipermail was
>   that public archives should be vended directly from the file system
>   for performance reasons.  That decision may not be appropriate for
>   today's operations.  Certainly maintaining two static versions of
>   the pages isn't feasible, so I think you have to vend one or the
>   other (probably the obfuscated version) from a cgi.

No, but the performance reasons aren't as much of an issue now...

> Nobody's even mentioned #5, which are available publicly via the
> "Visit Subscriber List" button, or the email command "who" to the
> -request address.  If I were a spam harvester, I wouldn't even bother
> with scanning the archives if either of these were publicly
> enabled.  When you turn them off, especially the former, just remember
> that you've now made it much harder for Joe User to unsubscribe
> themselves.  Catch 22.


Not enough experience in the field, or I'd probably have mentioned that.

-- jra
Jay R. Ashworth                                                jra@baylink.com
Member of the Technical Staff     Baylink                             RFC 2100
The Suncoast Freenet         The Things I Think
Tampa Bay, Florida        http://baylink.pitas.com             +1 727 647 1274

   "If you don't have a dream; how're you gonna have a dream come true?"
     -- Captain Sensible, The Damned (from South Pacific's "Happy Talk")