[Mailman-Developers] Protecting email addresses from spam harvesters

Jay R. Ashworth jra@baylink.com
Tue, 26 Feb 2002 12:36:49 -0500

On Tue, Feb 26, 2002 at 12:56:45AM -0500, Barry A. Warsaw wrote:
>     JRA> I do see one problem here, and I don't know if you already
>     JRA> address it below.  [ looks ] You don't; it's this: if the
>     JRA> list-owner addresses go through the MM machinery, as well,
>     JRA> then they too can die if MM crashes the wrong way.
>     JRA> This implies, as I believe has already been discussed, that
>     JRA> the *server* admin address must be publicly accessible, not
>     JRA> be piped into MailMan at all, and preferably, should actually
>     JRA> not even be handled by the same machine...  ("Single point of
>     JRA> failure")
> Well, what machine it's handled by isn't Mailman's business, but you
> do have a point.  Until recently, I recommended that you install
> aliases `mailman' and `mailman-owner', but now I recommend that
> `mailman' be an actual list, and it is from this list that things like
> password reminders appear to come from.  Also, if the site list gets a
> bounce, it'll check all the existing lists for a match against the
> bouncing address.


> You make the valid point that if the Mailman system were to break,
> you'd have no way to contact the site administrator, save for typical
> aliases like postmaster.  It seems like you want:
> - A non-list, plain alias to contact the human in case of emergency

Yep; and it's fine if this is an alias; I agree with Chuq's opinion
about "Real people", but I don't mind *sending* to a role account, as
long as the *reply* comes from a human, with a .sig file.

> - Some place that password reminders come from.  Since this will be
>   receiving bounces, it ought to be a real list.

Yeah, probably.

> - A site-wide list of maintainers of the site who can take care of
>   normal operations (i.e. panicky unsubscription requests).
> Perhaps #3 can be the same as #1 for those sites that have a
> collaborative management arrangement.  So the question is, what do we
> call the alias and what do we call the list?  I have definitely seen
> people try to send mail commands to `mailman@python.org' and from my
> Majordomo days, this seems like a reasonable thing to (eventually)
> implement.  Is it sufficient to recommend that postmaster@ point to a
> real human, not a list, and leave mailman@dom.ain a normal list?

Hmmm...  I see the problem: mailman is the obvious alias for the server
admin, but I also see why you want to leave it a list.

*I* think that postmaster@ the mailing list machine (or domain) is a
good enough answer, but I think Chuq will accuse me of geeking out
again, and on this one, I'm afraid I'd agree with him.

The number of people on the net with *no* indoctrination at all is
truly stunning.

> If not, I'd still opt for `mailman' to be the site list, and add
> something like mailman-panic to be a human address.  Or perhaps make
> mailman-owner pipe both to the wrapper and to postmaster.  I dunno,
> I'm open to suggestions.

Well, here's the problem: it has to be predictable, because

1) you can't put it in every mail footer, because you don't *want* people
using it unless something goes Horribly Wrong, but

2) if something *does* go Horribly Wrong, they won't be *able* to get
it from the website...
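
As a rough sketch of the split under discussion (a plain alias for
emergencies, plus addresses that go through Mailman), the Python below
renders sendmail-style alias lines.  The name `mailman-panic` comes from
Barry's suggestion above; the wrapper path, the wrapper command words
(`post`, `mailowner`) and the `root` target are assumptions for
illustration, not a recommended configuration.

```python
def render_aliases(human, wrapper="/usr/local/mailman/mail/wrapper"):
    """Return sendmail-style alias lines as a list of strings.

    The default `wrapper` path and the command words `post` and
    `mailowner` are assumptions; check your own installation.
    """
    return [
        # Plain alias: never touches Mailman, so it survives a Mailman outage.
        f"mailman-panic: {human}",
        # Site list: delivered through the Mailman machinery.
        f'mailman: "|{wrapper} post mailman"',
        # Owner address delivered both to Mailman and to a human role account.
        f'mailman-owner: "|{wrapper} mailowner mailman", postmaster',
    ]


for line in render_aliases("root"):
    print(line)
```

Only the first line stays reachable if the wrapper itself is broken,
which is the property the emergency address needs.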

> > Mailman should avoid getting deeply into the spam detection and
> > prevention business, except for some really really basic stuff
> > (probably not much more or less than it does now).  It should
> > integrate well with external spam detection programs like SpamAssassin
> > or commercial equivalents.  E.g. if we always send the message through
> > SA, and the message gets some score, we could decide to hold messages
> > below say 5.0 on the Spamster Scale, discard anything above 5.0, etc.
>     JRA> That sounds good, and if there isn't already a "plugin" API
>     JRA> for that, we ought to give some thought to that...
> Agreed.  I just have no idea what a reasonable API would be, although
> we're planning on doing some experiments with SA on {python,zope}.org
> to see what might make sense.
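
The score-based decision quoted above could be sketched as follows.  The
5.0 discard line is the figure from the message; the function name, the
return strings, and the lower `hold_at` cutoff are assumptions added so
that clean mail passes straight through.

```python
def disposition(score, hold_at=2.0, discard_at=5.0):
    """Map a spam score to 'accept', 'hold' or 'discard'.

    `hold_at` is a hypothetical lower cutoff; set it very low to hold
    everything below the discard line.
    """
    if score >= discard_at:
        return "discard"
    if score >= hold_at:
        return "hold"
    return "accept"


for s in (0.3, 3.1, 7.4):
    print(s, disposition(s))
```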

I suspect that, at least for Unixy installs, a system call to the
appropriate binary, with percentized arguments, will fill the bill
nicely; you can catch the exit value -- and if your package doesn't do
it that way, you can write a script to parse the output and send a
return value.
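
A minimal sketch of that idea, assuming a command template with
`%(file)s` and `%(list)s` fields (the field names and the `run_filter`
helper are hypothetical, not an existing Mailman API):

```python
import shlex
import subprocess
import sys


def run_filter(template, msg_path, listname):
    """Expand the %(...)s fields in `template`, run it, return the exit status."""
    fields = {"file": msg_path, "list": listname}
    # Split first, then substitute per argument, so a path containing
    # spaces still arrives as a single argument.
    argv = [arg % fields for arg in shlex.split(template)]
    return subprocess.run(argv).returncode


if __name__ == "__main__":
    # Stand-in for a real scanner: a child Python that exits with status 1.
    cmd = shlex.quote(sys.executable) + ' -c "import sys; sys.exit(1)" %(file)s'
    print(run_filter(cmd, "/tmp/msg.txt", "test-list"))
```

A package that reports its verdict on stdout instead would need a small
wrapper script that turns the output into an exit status.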

I, personally, would re-read the message from the file I put it in, in
case someone's package (wants to) rewrite the MIME to remove and
quarantine suspicious attachments.
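
Roughly like this, as a sketch: `filter_via_file` and the `rewrite`
callback are hypothetical names, and the callback stands in for whatever
external package gets run on the file.

```python
import email
import os
import tempfile


def filter_via_file(msg, rewrite):
    """Write `msg` to a temp file, let `rewrite(path)` act on it, then re-read it."""
    fd, path = tempfile.mkstemp(suffix=".eml")
    try:
        with os.fdopen(fd, "wb") as fp:
            fp.write(msg.as_bytes())
        rewrite(path)  # stand-in for the external scanner
        # Parse the file again rather than trusting the in-memory copy,
        # in case the scanner changed headers or stripped attachments.
        with open(path, "rb") as fp:
            return email.message_from_binary_file(fp)
    finally:
        os.unlink(path)
```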

> > #4 is interesting too.  I'm not against putting the raw archive behind
> > a turing-test, since I suspect that very few people will ever want
> > it.  It means that we won't be able to write an automated wget-ish
> > script to do off-site backups, but so be it.
>     JRA> Is there a difference between raw and private that I'm
>     JRA> missing?  Do you mean the mbox format files?
> Yup.  raw == mbox.

Ok.  I've often found it quite useful to snarf those down for lists I'm
not on (yet); I wouldn't mind having to prove I was human, though.

My real problem was just that the obfuscation breaks Google, and since
"Get the glue right" is one of my loudest systems-design mantras...

>     JRA> Well, that's probably the best point yet: this isn't
>     JRA> *MailMan's* problem, except to the extent that we "recommend"
>     JRA> Pipermail as our archiver.
> I don't know if I recommend it, in fact I try to dis-recommend it.

Sounds like a good call to me...

> Still, I think we do more good than harm in distributing an archiver
> that works out of the box.  And the advantage of Pipermail is that for
> really really critical problems, we /can/ go in and hack on it.  I'm
> torn, but still come down on the side of including Pipermail, even
> with all its warts.

Until Zest is a solution...

> > - I'll note that one of the early design decisions for Pipermail was
> >   that public archives should be vended directly from the file system
> >   for performance reasons.  That decision may not be appropriate for
> >   today's operations.  Certainly maintaining two static versions of
> >   the pages isn't feasible, so I think you have to vend one or the
> >   other (probably the obfuscated version) from a cgi.
>     JRA> No, but the performance reasons aren't as much of an issue
>     JRA> now...
> Nope.
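
If the pages are vended from a CGI, one stored copy can be obfuscated at
serve time instead of keeping two static versions.  A minimal sketch,
assuming a deliberately simplified address pattern and an " at "
replacement (both are illustrative choices, not a full address grammar):

```python
import re

# Simplified pattern for illustration; it does not cover every valid address.
ADDR = re.compile(r"\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")


def obfuscate(text):
    """Replace user@host with 'user at host' throughout `text`."""
    return ADDR.sub(r"\1 at \2", text)


print(obfuscate("Contact jra@baylink.com for details."))
```

The raw text stays untouched on disk, so a clean version can still be
handed to anyone who passes whatever check guards it.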

Optimizing for performance in the core design of a system is nearly
always a bad idea, at least on this end of the performance curve.

If you're redesigning Amadeus or SABRE, perhaps not.

-- jra
Jay R. Ashworth                                                jra@baylink.com
Member of the Technical Staff     Baylink                             RFC 2100
The Suncoast Freenet         The Things I Think
Tampa Bay, Florida        http://baylink.pitas.com             +1 727 647 1274

   "If you don't have a dream; how're you gonna have a dream come true?"
     -- Captain Sensible, The Damned (from South Pacific's "Happy Talk")