Re: [Mailman-Developers] Protecting email addresses from spam harvesters

25 Feb 2002

      On Mon, Feb 25, 2002 at 12:27:23PM -0500, Barry A. Warsaw wrote:
...
I /think/ I've caught up on this thread, but I'm sure I've missed a
bunch.  As I see it there are really these issues to protecting email
addresses in Mailman:

1. list admin addresses
2. public archives
3. private archives
4. raw archive
5. list rosters

I believe you've synopsized it correctly, yes.
...
For #1, MM2.1 changes what gets included at the bottom of list pages.
The admin's personal address is no longer included in the link's text
or in mailto: href.  In the mailto: you'll see something like
mylist-owner@dom.ain and in the text you'll see something like "barry
at zope.com".  I see no point in trying to obscure the former -- or
put it behind a web form -- because it's easily guessed given a probe
of existing lists, as is every other list-related email address.  More
on protecting the -owner from spam below.  I claim that the
guessability is a feature, btw.
Concur.
...
You can argue that "barry at zope.com" isn't obfuscated enough, and
you might be right.  I'm against any image or JavaScript approach to
protecting these because I really do want to keep Mailman's web
interface as pedestrian as possible.  In principle I don't mind if
JavaScript or images are used, but they should never be the only way
to navigate a Mailman site.  Mailman must degrade gracefully for
browsers that either don't support these features or have them
disabled.  I'd do the same with cookies if I could figure out how to
do low-frustration-factor authentication without them.
And, of course, if it *will* degrade, then address-snarfers will figure
out how to *make* it degrade, so it's not worth doing in the first
place, at least not for *that* reason.
...
(Aside: I really really hate websites that are only viewable with
JavaScript on, and I often send a friendly ADA-ish noodge to webmaster
when I find such beasts, although it rarely does any good).
Hear hear!
...
MM3 will likely integrate admin addresses and list memberships into an
object called a "roster" (essentially just a list of email addresses).
This will let us define a pipeline for each roster, which could
include a spam filter that performs an action based on some criteria
(e.g. drop it, reject it, mark a header, etc.).  So we can do more
protection on the -owner address than we can do now (without
hacking).
I do see one problem here, and I don't know if you already address it
below.  [ looks ]  You don't; it's this: if the list-owner addresses go
through the MM machinery, as well, then they too can die if MM crashes
the wrong way.
This implies, as I believe has already been discussed, that the
*server* admin address must be publicly accessible, not be piped into
MailMan at all, and preferably, should actually not even be handled by
the same machine...  ("Single point of failure")
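
To make the per-roster pipeline idea above a bit more concrete, here is a minimal Python sketch. Nothing in it is Mailman code: Roster, Handler, process and drop_if_flagged are hypothetical names, the message is a plain dict, and the "spam" flag stands in for whatever a real filter would compute.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical types for illustration only; not a Mailman API.
# A handler looks at a message and returns an action name,
# or None to let the next handler decide.
Handler = Callable[[dict], Optional[str]]


@dataclass
class Roster:
    """A named list of addresses with its own processing pipeline."""
    name: str
    addresses: List[str]
    pipeline: List[Handler] = field(default_factory=list)

    def process(self, msg: dict) -> str:
        # The first handler that returns an action wins.
        for handler in self.pipeline:
            action = handler(msg)
            if action is not None:
                return action
        # No handler objected, so the message goes to the roster.
        return "deliver"


def drop_if_flagged(msg: dict) -> Optional[str]:
    # Stand-in for a real spam check: trusts a precomputed flag.
    return "drop" if msg.get("spam") else None


owners = Roster("mylist-owner", ["owner@example.com"], [drop_if_flagged])
```

With a layout like this, the -owner roster could carry a stricter pipeline than the membership roster without any special-casing, though it does nothing about the single-point-of-failure concern raised above.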
...
Rosters and the improved user database will allow us to
actually equate admin email addresses with Real Names, so you could
conceivably see something like
List run by <a href="mailto:mylist-owner@dom.ain">Barry Warsaw</a>
at the bottom of the pages.  You'd be within your rights to argue that
end users never even need know who admins the list, but I think it
helps to avoid the "faceless droid" syndrome.
Concur *strongly*.
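
As a rough illustration of the footer described above, the following Python sketch builds the link from the list name and the admin's Real Name, so the personal address never appears in the page. The function name owner_footer and its parameters are assumptions made for this example, not anything in Mailman.

```python
from html import escape


def owner_footer(listname: str, domain: str, real_name: str) -> str:
    """Build the footer link using only the guessable -owner address."""
    owner = f"{listname}-owner@{domain}"
    # Escape both pieces so odd characters in a name cannot break the HTML.
    return (
        f'List run by <a href="mailto:{escape(owner, quote=True)}">'
        f"{escape(real_name)}</a>"
    )


print(owner_footer("mylist", "dom.ain", "Barry Warsaw"))
# prints: List run by <a href="mailto:mylist-owner@dom.ain">Barry Warsaw</a>
```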
...
Mailman should avoid getting deeply into the spam detection and
prevention business, except for some really really basic stuff
(probably not much more or less than it does now).  It should
integrate well with external spam detection programs like SpamAssassin
or commercial equivalents.  E.g. if we always send the message through
SA, and the message gets some score, we could decide to hold messages
below say 5.0 on the Spamster Scale, discard anything above 5.0, etc.
That sounds good, and if there isn't already a "plugin" API for that,
we ought to give some thought to that...
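
A sketch of what such a hook might look like, in Python. It assumes SpamAssassin has already run and left an X-Spam-Status header containing "hits=N" or "score=N"; the function names, the three outcomes, and the lower hold_at threshold are all assumptions for illustration (only the 5.0 figure comes from the message above).

```python
import re
from email import message_from_string
from typing import Optional

# Matches "hits=7.3" or "score=7.3" inside an X-Spam-Status header value.
_SCORE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(raw_message: str) -> Optional[float]:
    """Return the score from X-Spam-Status, or None if there is none."""
    msg = message_from_string(raw_message)
    match = _SCORE.search(msg.get("X-Spam-Status", ""))
    return float(match.group(1)) if match else None


def disposition(score: Optional[float],
                hold_at: float = 2.0,       # assumed lower bound for holding
                discard_at: float = 5.0) -> str:
    """Map a score to accept / hold / discard."""
    if score is None or score < hold_at:
        return "accept"
    if score > discard_at:
        return "discard"
    return "hold"


raw = "Subject: hi\nX-Spam-Status: Yes, hits=7.3 required=5.0\n\nbody\n"
print(disposition(spam_score(raw)))
```

Keeping the score extraction separate from the decision means a different scanner could be swapped in by replacing spam_score alone, which is roughly what a "plugin" API would need.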
...
As for #2, I'd go for the low-tech approach of simply discarding the
hostname part of the email address in all public archives.  Certainly
this is easy in the headers, and we'll have to decide whether we're
going to expend the resources to do body searches for email addresses,
and obfuscate those as well.  If people want to make contacts based on
some public archive message, they can email the list.  Until we've got
web-posting, I don't think it matters if they lose the full email
address in the public archives.
Well, personally, I don't ever assume that someone who posted a message
a year ago with 95% of the answer to my question is even *on* the list
anymore -- a situation I don't think you thought of -- but...
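
For reference, the low-tech approach of dropping the hostname part could be as small as the following Python sketch. The pattern is deliberately loose and is an assumption of this example, not a full address parser; strip_hostnames is a hypothetical name.

```python
import re

# Loose match for things that look like user@host.tld.
# Only the local part (group 1) is kept.
_ADDR = re.compile(r"([A-Za-z0-9._%+-]+)@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def strip_hostnames(text: str) -> str:
    """Replace each user@host.tld with just 'user'."""
    return _ADDR.sub(r"\1", text)


print(strip_hostnames("From: barry@zope.com (Barry)"))
# prints: From: barry (Barry)
```

The same function works on headers and bodies, but in bodies it would also rewrite anything else shaped like an address (list addresses, Message-IDs quoted in text), which is part of the cost of doing body searches at all.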
...
As for #3, I don't mind not obscuring the email addresses since a
login will be required.  If we think we don't trust the current
private archive login procedures to be secure against bots, then we
can fix that, but I don't see it as a high priority.
Concur.
...
#4 is interesting too.  I'm not against putting the raw archive behind
a turing-test, since I suspect that very few people will ever want
it.  It means that we won't be able to write an automated wget-ish
script to do off-site backups, but so be it.
Is there a difference between raw and private that I'm missing?  Do you
mean the mbox format files?
...
Things to note for #'s 2-4:

The Pipermail implementation has lots of well-known problems.  I'm
personally not willing to spend a lot of time fixing them, and I
still recommend Real Sites use a Real Archiver.  I've just thrown
the majority of the email obfuscation problems over the fence into
someone else's back yard <wink>.

:-)
...

Adding public archive obfuscation is fine and dandy for new messages
added to the archives but what about all the existing archived
messages?  Re-running Pipermail (i.e. bin/arch) to regenerate from
scratch has two significant drawbacks.  1) Message URLs can change,
especially if you also fix broken From_ delimiters, and that in turn
breaks bookmarks, 2) on large mboxes, you simply can't do bin/arch
because of memory problems.

See above.  :-)
...

Someone needs to step up and "own" Pipermail if any of these
problems are going to be fixed, or if the obfuscation is going to
happen.

Not much danger of that, is there?
...

Remember that Pipermail itself is completely optional.  An API is
defined between Mailman and the archiver and that's all the
interaction they have.  Maybe the API needs to be more elaborate to
support obfuscation.  It definitely needs some changes if we ever
want to add some of the features I'd like to add (but that's
off-topic here).

Well, that's probably the best point yet: this isn't *MailMan's*
problem, except to the extent that we "recommend" Pipermail as our
archiver.
...

I'll note that one of the early design decisions for Pipermail was
that public archives should be vended directly from the file system
for performance reasons.  That decision may not be appropriate for
today's operations.  Certainly maintaining two static versions of
the pages isn't feasible, so I think you have to vend one or the
other (probably the obfuscated version) from a cgi.

No, but the performance reasons aren't as much of an issue now...
...
Nobody's even mentioned #5, the rosters, which are available publicly
via the "Visit Subscriber List" button, or the email command "who" to
the -request address.  If I were a spam harvester, I wouldn't even
bother with scanning the archives if either of these were publicly
enabled.  When you turn them off, especially the former, just remember
that you've now made it much harder for Joe User to unsubscribe
themselves.  Catch 22.
<chuckle>
Not enough experience in the field, or I'd probably have mentioned that
already.
Cheers,
-- jra
Jay R. Ashworth                                                jra@baylink.com
Member of the Technical Staff     Baylink                             RFC 2100
The Suncoast Freenet         The Things I Think
Tampa Bay, Florida        http://baylink.pitas.com             +1 727 647 1274
"If you don't have a dream; how're you gonna have a dream come true?"
-- Captain Sensible, The Damned (from South Pacific's "Happy Talk")
