[Spambayes] SpamBayes now filters less than 50% of my spam.
Tim Peters
tim.one at comcast.net
Sun Nov 16 00:12:01 EST 2003
[Skip Montanaro]
> ...
> I've developed a few seat-of-the-pants training maxims, both from
> personal experience and from reading what others have done:
>
> * Don't be afraid to retrain from scratch. The system learns
> quickly. Retraining from scratch is often the quickest way to
> recover from training mistakes.
>
> * Bigger is not always better, no matter what all those
> enlargement messages would have you believe. A larger database
> is harder to examine for mistakes, and a few mistakes skewed in
> the same direction may be hard to overcome with correct
> training. You'll also reach a point where you want to just
> delete all that spam. Once you do that, you've completely lost
> the ability to find mistakes. If you only have a few messages
> in your training database things will be easier to manage.
>
> * Never train on the same message twice. Using iterative
> reasoning it's easy to see you should never train on the same 100
> or 1000 times either. ;-)
>
> * Seek balance in your training database. Similar numbers of ham
> and spam are good.
>
> * Don't automatically train on all incoming messages. If you get
> swamped with spam, you will quickly wind up with a training
> database which is wildly out-of-balance.
>
> * Don't worry about training on every unsure message either. Some
> messages just aren't amenable to a strict classification. For
> example, a bounce message from a mail server containing an
> attached spam may be best left untrained. It contains both
> strong ham clues (all the postmaster gibberish which you would
> get in a bounce of an otherwise valid message) and strong spam
> clues (the spam message itself). Training that message as ham
> or spam is likely to worsen the classification of future mail
> bounces or future similar spam.
Those are excellent suggestions, Skip! How about immortalizing them on
Richie's SpamBayes Wiki? (Good info in email msgs is rarely found again.)
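For anyone who scripts their own training, two of Skip's maxims ("never train on the same message twice" and "seek balance") can be enforced mechanically. What follows is a minimal, hypothetical sketch, not SpamBayes code: the `TrainingPolicy` class, its `offer()` method, and the 2:1 `max_ratio` limit are all assumptions made up for illustration. It only decides *whether* to train; the actual training call is left to whatever you already use.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TrainingPolicy:
    """Hypothetical gatekeeper for Skip's maxims; not part of SpamBayes."""

    # Assumed limit: neither class may exceed max_ratio times the other.
    max_ratio: float = 2.0
    trained_ids: Set[str] = field(default_factory=set)
    counts: Dict[str, int] = field(default_factory=lambda: {"ham": 0, "spam": 0})

    def offer(self, msg_id: str, label: str) -> bool:
        """Return True (and record it) if msg_id should be trained as label."""
        if label not in self.counts:
            raise ValueError("label must be 'ham' or 'spam'")
        # Maxim: never train on the same message twice.
        if msg_id in self.trained_ids:
            return False
        other = "spam" if label == "ham" else "ham"
        # Maxim: seek balance. max(..., 1) lets an empty database get started.
        if self.counts[label] + 1 > self.max_ratio * max(self.counts[other], 1):
            return False
        self.trained_ids.add(msg_id)
        self.counts[label] += 1
        return True


if __name__ == "__main__":
    policy = TrainingPolicy()
    for msg_id, label in [("a", "spam"), ("a", "spam"), ("b", "spam"),
                          ("c", "spam"), ("c", "ham")]:
        print(msg_id, label, policy.offer(msg_id, label))
```

When you're swamped with spam, offers of more spam simply get refused until some ham has been trained, which is the "don't automatically train on all incoming messages" point in code form.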
> My environment is much different than yours, so I don't know how
> you'd get the Outlook plugin to score messages again, but if it can
> do that, a little judicious checking will probably avoid the need to
> over-train.
In current incarnations of the addin it's easy, but the command has moved from
where it was in earlier versions of the addin:
Spambayes ->
Filter messages ...
That brings up a dialog box with 4 sets of choices:
+ A multi-folder selector to choose the folder(s) you want to filter
(just the Unsure folder in this application of it).
+ A radio button to choose whether you *just* want msgs rescored, or
also want them shuffled around to other folders based on your other
SpamBayes settings. (You do want them shuffled around in this
application of it.)
+ Whether to rescore email that's already been scored. (Yes, in this
application of it.)
+ Whether to restrict rescoring to unread mail. (Depends on what
else is in your Unsure folder, I suppose.)
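To make the interplay of those four choices concrete, here's a toy model in Python. It is purely illustrative and is *not* the addin's code. The `Message` class, `filter_messages()`, the folder names "Spam", "Inbox" and "Unsure", and the 0.20/0.90 cutoffs are all assumptions standing in for your real folders and SpamBayes settings (the exact boundary handling is assumed too). `score` is whatever function produces a spam probability.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set

# Assumed thresholds; your real ones come from your SpamBayes settings.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


@dataclass
class Message:
    subject: str
    folder: str
    read: bool = False
    score: Optional[float] = None  # None means "never scored"


def filter_messages(messages: Iterable[Message], folders: Set[str],
                    score: Callable[[Message], float], move: bool = True,
                    rescore_scored: bool = True,
                    unread_only: bool = False) -> List[Message]:
    """Hypothetical model of the four dialog choices; returns rescored msgs."""
    done = []
    for msg in messages:
        if msg.folder not in folders:            # choice 1: folder selector
            continue
        if unread_only and msg.read:             # choice 4: unread mail only
            continue
        if msg.score is not None and not rescore_scored:  # choice 3
            continue
        msg.score = score(msg)
        if move:                                 # choice 2: also move msgs
            if msg.score >= SPAM_CUTOFF:
                msg.folder = "Spam"
            elif msg.score < HAM_CUTOFF:
                msg.folder = "Inbox"
            else:
                msg.folder = "Unsure"
        done.append(msg)
    return done
```

Filtering just "Unsure" with the defaults shown above (move, rescore already-scored mail, all mail rather than unread only) corresponds to the settings suggested for this application.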