[Spambayes] deployment for mailman lists

Mon Nov 4 22:54:24 2002

>     Guido> But the key is that *you* are the list's main
>     Guido> administrator and in charge of the initial setup.  So
>     Guido> *you* should set it up to minimize your pain (which
>     Guido> includes constant worries about lost mail due to false
>     Guido> positives in the spam filter).
> 
> Correct, but regardless of my abilities in this particular case, the
> *default* for new mailing lists - those created by
> ~mailman/bin/newlist - should be to not delete the spam.  The
> administrator of the site has to run that.  The moderator of the
> list (who generally won't have shell access to the machine running
> Mailman) will then get her chance to go through and fiddle the bits.

The default should be not to enable spambayes filtering at all, since
there's no way to set up the training data to begin with.

>     Guido> I believe that while Mailman is relatively easy to set
>     Guido> up, it requires (at least) typical mail admin skills, and
>     Guido> a mail admin already has in his/her head ideas about the
>     Guido> cost of lost mail.  You seem to have been burned by this,
>     Guido> and as a consequence I believe you're on the conservative
>     Guido> side.  As long as the consequences are clear when a list
>     Guido> admin chooses to enable spam filtering, I think the
>     Guido> default should be for convenience, not for liability.
> 
> It has nothing to do with getting burned, I just have relevant
> current experience dealing with less technical lists.  There are
> tons of non-technical folks out there running Mailman-managed
> mailing lists.  Consider that many hosting companies like Hostway
> make this available to their customers.  Every other mail-handling
> tool I've ever seen (sendmail, fetchmail, procmail, etc) goes to
> great lengths to avoid losing mail.  Why shouldn't Mailman?

See above.  Enabling spam filtering should be an explicit step.  The
UI should clarify the consequences and show the configuration
settings.  But the default configuration settings *once spam filtering
is enabled* should be to bounce (not drop) spam scoring higher than
the top of the "uncertain" region.  Example UI:

   [ ] Enable Bayesian spam filtering [help link]

       [ 95 ] Spam cutoff score
       [  5 ] Ham cutoff score

       Disposition for messages scoring at least spam cutoff:
       (x)  Bounce
       ( )  Discard
       ( )  Moderate

       Disposition for messages scoring between ham and spam cutoff:
       ( )  Moderate
       (x)  Approve

       <more config options, in particular where to get the ham
       training data>
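
A minimal sketch of how the settings in the example UI above might map
to a per-message decision.  The names SpamFilterConfig and disposition
are hypothetical and not part of Mailman or Spambayes.  It assumes
scores are on the same 0-100 scale as the cutoffs, and that a message
scoring at or below the ham cutoff is simply approved:

```python
from dataclasses import dataclass


@dataclass
class SpamFilterConfig:
    """Hypothetical per-list settings mirroring the example UI."""
    enabled: bool = False           # filtering is off until explicitly enabled
    spam_cutoff: int = 95           # scores at or above this count as spam
    ham_cutoff: int = 5             # scores at or below this count as ham (assumption)
    spam_action: str = "bounce"     # "bounce", "discard" or "moderate"
    unsure_action: str = "approve"  # "approve" or "moderate"


def disposition(score: float, config: SpamFilterConfig) -> str:
    """Return what to do with a message that received the given score."""
    if not config.enabled:
        # No filtering configured: the message passes through untouched.
        return "approve"
    if score >= config.spam_cutoff:
        return config.spam_action
    if score > config.ham_cutoff:
        # The "uncertain" region between the two cutoffs.
        return config.unsure_action
    return "approve"


if __name__ == "__main__":
    cfg = SpamFilterConfig(enabled=True)
    for s in (2, 50, 97):
        print(s, disposition(s, cfg))
```

With these defaults nothing is filtered until the list admin turns the
feature on, and once it is on, high-scoring spam is bounced rather than
silently dropped.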

>     Guido> There's no way you can design a web moderation interface
>     Guido> to deal well with manually moderating 200 spams per day.
>     Guido> IMO if you show *all* spam in the moderation interface,
>     Guido> the kind of non-techie moderator that you describe is
>     Guido> *more* likely to make mistakes (rejecting ham or
>     Guido> approving spam) than in the default that I propose.
> 
> I'm not saying that you have to design an interface to deal with
> moderating 200 spams a day.  I'm also not saying it's a
> one-time-only setting.  Still, by making the default for held spam
> messages be "discard" instead of "defer", Mailman could make it a
> one-click operation to delete all 200 with one "Submit All Data"
> click from the moderation interface.  I haven't used Mailman 2.1
> yet, but I think that was something Barry had hoped to make a
> configuration option as well.

And that's exactly what I fear -- mixing the spam and unsure messages
in a single moderation queue will increase mistakes.

>     Guido> You've made this same (or a very similar) point many
>     Guido> times, and while I agree with you that it's bad to delete
>     Guido> spam in many setups, I strongly disagree in this case.
> 
> Only because you seem to continually misunderstand what I'm saying.
> I am *only* saying it's bad to delete spam by default when the list
> is first created.  Let the list moderator decide, "I can't handle
> all this crap, please delete it for me".

OK, then we agree.  I say spam filtering shouldn't be enabled at all
when the list is created -- after all you have no ham training data!

> I see two scenarios:
> 
>     1.  An existing mailing list is converted to a new
>         Mailman+Spambayes setup.  The moderator is either (a)
>         thankful that all the spam which had previously shown up on
>         the list is now somewhere he can deal with it, or (b) he was
>         already doing something to deflect most/all the spam, so
>         doesn't see much of it in the moderation interface.

Depends on whether whatever he was doing before can be ported to the
Mailman setup.

>     2. A brand new mailing list is set up with Mailman+Spambayes.  As
>        a new list, it should not be getting 200 spams per day.  The
>        moderator will have time to figure out how to change the
>        settings on the list to delete spam instead of hold it.
> 
> I just don't understand why you have a hard time understanding that
> out-of-the-box Mailman+Spambayes should not delete spam.  It's a
> one-click change for Greg or Barry, or whoever controls python-list.
> Why not err on the side of caution?

It was all a big misunderstanding.

--Guido van Rossum (home page: http://www.python.org/~guido/)