[Spambayes] SpamBayes feedback

Shawn K. Hall shawn at 12pointdesign.com
Sat Oct 21 00:26:29 CEST 2006


Hi Scott,

> ...It has 90/15 for thresholds. I never changed the
> thresholds, nor have I played around with those numbers. I
> just didn't know enough to have any idea what to
> change them to. Do you recommend some other settings?

The thresholds are exactly that: cut-off scores. If you've added the
percentage display as a column in Outlook, you can see the spam score
that SpamBayes assigns to each message. If you look in your spam folder
and the messages you recover generally score 90-95 while the ones that
are definitely spam score 97 or higher, change the 'spam' threshold from
90 to 96 and you'll have a more effective spam filter. Likewise, if the
good messages that end up in your possible folder have mid-range scores
(30-40-ish), you can raise the threshold for the possible folder above
that value so that good mail stays in your inbox.
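A minimal Python sketch of how the two thresholds route a message, assuming scores on a 0-100 scale and assuming that a score exactly equal to a threshold falls into the higher bucket. The function `route` and its parameter names are hypothetical, not part of SpamBayes; `ham_threshold` stands in for the "possible spam" threshold (15) and `spam_threshold` for the "certain spam" threshold (90).

```python
def route(score, ham_threshold=15, spam_threshold=90):
    """Decide where a message goes based on its spam score (0-100).

    Hypothetical helper: the defaults mirror the 15/90 thresholds discussed
    above; ties are assumed to go to the higher bucket.
    """
    if score >= spam_threshold:
        return "spam"       # certain spam
    if score >= ham_threshold:
        return "possible"   # unsure: lands in the possible folder
    return "inbox"          # treated as good mail


scores = [92, 94, 97, 99, 35]

# Default 15/90 thresholds: the recoverable 90-95 messages are filed as spam.
print([route(s) for s in scores])

# Raised thresholds: 90-95 drops to 'possible', and a 35 stays in the inbox.
print([route(s, ham_threshold=45, spam_threshold=96) for s in scores])
```

With the defaults, every score of 90 or more is filed as spam, including the 92 and 94 you would have to recover by hand; raising the spam threshold to 96 sends those to the possible folder instead.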

Also, I think it's a bad idea to treat the spam and possible folders as
the same. They are not. In fact, using the same folder for both is
likely to cause significant classification problems. In the SpamBayes
Manager, on the Training tab, there is an 'Incremental training' section
with two boxes that train messages as spam or ham when you move them to
or from the spam folder. If you use the same folder for spam and
possible, you will NEVER be able to mark messages in your possible
folder as spam, because they are already in the spam folder and never
get moved there. Likewise, a message returned to the inbox from spam (as
opposed to possible) is much stronger evidence that its content is
legitimate, so training on a message returned from a 'mixed' spam folder
would have unreliable value.
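The folder-move problem can be sketched in a few lines of Python. This is a hypothetical model of the two incremental-training options as described above, not the add-in's actual code; the function name `train_on_move` and the folder names are illustrative assumptions.

```python
def train_on_move(src, dst, spam_folder, inbox="Inbox"):
    """Return the training label a folder move would trigger, or None.

    Hypothetical model of the two 'incremental training' options:
    moving a message into the spam folder trains it as spam, and moving
    it from the spam folder back to the inbox trains it as ham.
    """
    if src == dst:
        return None  # nothing moved, so nothing is trained
    if dst == spam_folder:
        return "spam"
    if src == spam_folder and dst == inbox:
        return "ham"
    return None


# Separate folders: marking a possible-spam message as spam is a real move.
print(train_on_move("Possible", "Spam", spam_folder="Spam"))  # spam

# One shared folder: the message is already "in spam", so no training fires.
print(train_on_move("Junk", "Junk", spam_folder="Junk"))      # None
```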

SpamBayes does require training. The more you put into it, the more
effective it will be. If you have a cache of legitimate messages to
train on, you can improve the accuracy of the filtering simply by
running the automated training facility. Training does not work as
effectively when the numbers are grossly out of balance (for example,
when you have more than twice as much spam as ham).
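As a rough check on that balance, here is a small Python sketch that computes how many more legitimate messages you would need to train on to stay within a 2:1 spam-to-ham ratio. The helper `extra_ham_needed` is hypothetical, and the 2:1 cut-off is only the rule of thumb from the paragraph above, not a SpamBayes setting.

```python
import math


def extra_ham_needed(n_spam, n_ham, max_ratio=2.0):
    """How many more ham messages to train so that spam <= max_ratio * ham.

    Hypothetical helper; the 2:1 default is the rule of thumb from the
    text above, not a SpamBayes setting.
    """
    minimum_ham = math.ceil(n_spam / max_ratio)
    return max(0, minimum_ham - n_ham)


print(extra_ham_needed(3000, 1000))  # 500
print(extra_ham_needed(1500, 1000))  # 0
```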

Finally, of the people I've helped configure SpamBayes, most of those
who complain about the possible folder are the ones who tend to leave
mail in it. This folder should be emptied regularly (at least daily):
either recover the mail or mark it as spam. DO NOT leave it in there. If
you do, SpamBayes will never learn from the tokens within those
messages, so future messages containing those terms, which already score
in the mid-range, cannot be classified correctly.
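To illustrate why untrained messages leave their terms stuck in the middle, here is a simplified Python sketch loosely modeled on Gary Robinson's smoothed per-token probability, the approach SpamBayes builds on. The class `TokenModel`, the constants S = 0.45 and X = 0.5, and the omission of the step that combines token probabilities into a message score are all assumptions for illustration; this is not SpamBayes' actual code.

```python
from collections import Counter

S = 0.45  # assumed strength given to the neutral prior
X = 0.5   # assumed probability for a token with no training evidence


class TokenModel:
    """Toy per-token spam probability model (illustrative only)."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.nspam = 0
        self.nham = 0

    def train(self, tokens, is_spam):
        unique = set(tokens)  # count each token once per message
        if is_spam:
            self.spam_counts.update(unique)
            self.nspam += 1
        else:
            self.ham_counts.update(unique)
            self.nham += 1

    def token_prob(self, token):
        spam = self.spam_counts[token]
        ham = self.ham_counts[token]
        n = spam + ham
        if n == 0:
            return X  # never trained on: stays exactly in the middle
        spam_ratio = spam / self.nspam if self.nspam else 0.0
        ham_ratio = ham / self.nham if self.nham else 0.0
        p = spam_ratio / (spam_ratio + ham_ratio)
        # Shrink the raw estimate toward X when there is little evidence.
        return (S * X + n * p) / (S + n)


model = TokenModel()
model.train(["cheap", "pills"], is_spam=True)
model.train(["meeting", "agenda"], is_spam=False)

print(model.token_prob("refinance"))        # 0.5: left untrained
print(round(model.token_prob("pills"), 3))  # 0.845
```

In this toy model, a token seen only in messages left sitting in the possible folder is never counted, so its probability stays at X no matter how often it appears; training those messages either way is what moves it.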


SpamBayes has worked perfectly for me for a couple of years now. I love
it. Only very recently (within the last month or two) have I had
problems, and they are due to the way newer spam messages are written.
Considering the amount of spam I get, the number of spam messages that I
have to handle manually is inconsequential.

Regards,

Shawn K. Hall
http://12PointDesign.com/



