[Spambayes] SpamBayes now filters less than 50% of my spam.

Tim Peters tim.one at comcast.net
Sun Nov 16 00:12:01 EST 2003


[Skip Montanaro]
> ...
> I've developed a few seat-of-the-pants training maxims, both from
> personal experience and from reading what others have done:
>
>     * Don't be afraid to retrain from scratch.  The system learns
>       quickly. Retraining from scratch is often the quickest way to
>       recover from training mistakes.
>
>     * Bigger is not always better, no matter what all those
>       enlargement messages would have you believe.  A larger database
>       is harder to examine for mistakes, and a few mistakes skewed in
>       the same direction may be hard to overcome with correct
>       training.  You'll also reach a point where you want to just
>       delete all that spam.  Once you do that, you've completely lost
>       the ability to find mistakes.  If you only have a few messages
>       in your training database things will be easier to manage.
>
>     * Never train on the same message twice.  Using iterative
>       reasoning it's easy to see you should never train on the same 100
>       or 1000 times either. ;-)
>
>     * Seek balance in your training database.  Similar numbers of ham
>       and spam are good.
>
>     * Don't automatically train on all incoming messages.  If you get
>       swamped with spam, you will quickly wind up with a training
>       database which is wildly out-of-balance.
>
>     * Don't worry about training on every unsure message either.  Some
>       messages just aren't amenable to a strict classification.  For
>       example, a bounce message from a mail server containing an
>       attached spam may be best left untrained.  It contains both
>       strong ham clues (all the postmaster gibberish which you would
>       get in a bounce of an otherwise valid message) and strong spam
>       clues (the spam message itself).  Calling that message as ham
>       or spam is likely to worsen the classification of future mail
>       bounces or future similar spam.
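
Three of the maxims above (never train on the same message twice, seek balance, don't train on everything that arrives) amount to a small gatekeeping policy. The following is a minimal Python sketch of that idea. It assumes a hypothetical `TrainingPolicy` helper that is not part of SpamBayes or its Outlook addin, and an arbitrary 2:1 tolerance for imbalance. It only decides whether a message *should* be trained and leaves the actual training to whatever tool you use.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingPolicy:
    """Hypothetical training gatekeeper (NOT a SpamBayes API).

    Refuses duplicate training and refuses training that would push
    the ham/spam counts too far out of balance.
    """
    max_ratio: float = 2.0  # assumed tolerance: larger class <= 2x smaller
    trained_ids: set = field(default_factory=set)
    counts: dict = field(default_factory=lambda: {"ham": 0, "spam": 0})

    def should_train(self, msg_id: str, label: str) -> bool:
        if label not in self.counts:
            raise ValueError(f"unknown label: {label!r}")
        # Maxim: never train on the same message twice.
        if msg_id in self.trained_ids:
            return False
        # Maxim: seek balance.  Refuse if training this message would make
        # its class exceed max_ratio times the other class (treating an
        # empty other class as 1 so the first few messages can get in).
        other = "spam" if label == "ham" else "ham"
        after = self.counts[label] + 1
        return after <= self.max_ratio * max(self.counts[other], 1)

    def record(self, msg_id: str, label: str) -> None:
        # Call this only after you have actually trained the message.
        self.trained_ids.add(msg_id)
        self.counts[label] += 1


if __name__ == "__main__":
    policy = TrainingPolicy()
    # A burst of spam: only the first two get trained until some ham arrives.
    for msg_id in ["s1", "s2", "s3"]:
        if policy.should_train(msg_id, "spam"):
            policy.record(msg_id, "spam")
    print(policy.counts)
```

Run as-is, the demo prints `{'ham': 0, 'spam': 2}`: the third spam is held back until ham catches up, which is the "swamped with spam" situation Skip warns about. The ratio test is a deliberately crude stand-in for human judgment; it says nothing about which unsure messages (such as bounces carrying spam) are worth training at all.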

Those are excellent suggestions, Skip!  How about immortalizing them on
Richie's SpamBayes Wiki?  (Good info in email msgs is rarely found again.)

> My environment is much different than yours, so I don't know how
> you'd get the Outlook plugin to score messages again, but if it can
> do that, a little judicious checking will probably avoid the need to
> over-train.

In current incarnations of the addin it's easy, although the command has
moved from where it was in earlier versions of the addin.  It's now under:

    Spambayes ->
        Filter messages ...

That brings up a dialog box with 4 sets of choices:

+ A multi-folder selector to choose the folder(s) you want to filter
  (just the Unsure folder in this application of it).

+ A radio button to choose whether you *just* want msgs rescored, or
  also want them shuffled around to other folders based on your other
  SpamBayes settings.  (You do want them shuffled around in this
  application of it.)

+ Whether to rescore email that's already been scored.  (Yes, in this
  application of it.)

+ Whether to restrict rescoring to unread mail.  (Depends on what
  else is in your Unsure folder, I suppose.)
