[Spambayes] Training using the Outlook Plug-In

Kenny Pitt kennypitt at hotmail.com
Mon Nov 17 11:57:55 EST 2003


Jeff Jansen wrote:
> My question is about the messages that it successfully identifies as
> spam or ham during use based on the current status of the training
> database. Does SB learn from those messages as well, or does it
> merely move the message into the appropriate folder?

No, SB does not learn automatically from messages that it classifies.
It only trains when you manually identify a message as one or the other.

> ... Maybe the real
> question is, is there anything for SB to learn from such messages?
> I'm thinking that whatever the "pattern" that causes a message to be
> correctly identified as spam or ham, the database would be
> "strengthened" by the knowledge that there is one more message that
> matches that pattern, so that there IS something for SB to learn from
> those messages. So the question is, is that what SB does? 

There is a reasonable argument that training on spam messages that have
already been classified as spam can increase the chances of catching
future spam as the messages mutate.  However, training on *everything*
has the potential to cause your training data to have significantly more
of one type of message than the other.  There are definite accuracy
penalties when your training is out of balance.
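
As a rough illustration of what "out of balance" means, here is a minimal
Python sketch that compares the number of ham and spam messages trained so
far.  The function name and the 2:1 threshold are my own assumptions for the
example; they are not SpamBayes settings or part of its API.

```python
def training_balance(ham_count, spam_count, max_ratio=2.0):
    """Return (ratio, balanced) for the given training counts.

    ratio is the larger count divided by the smaller one.  max_ratio is
    an arbitrary illustrative threshold, not a SpamBayes setting.
    """
    if ham_count < 0 or spam_count < 0:
        raise ValueError("counts must be non-negative")
    larger = max(ham_count, spam_count)
    smaller = min(ham_count, spam_count)
    if larger == 0:
        # Nothing trained yet, so there is no imbalance to report.
        return 1.0, True
    if smaller == 0:
        # Only one type has been trained: as unbalanced as it gets.
        return float("inf"), False
    ratio = larger / smaller
    return ratio, ratio <= max_ratio
```

For example, training_balance(100, 300) reports a ratio of 3.0 and flags the
data as unbalanced under the assumed threshold.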

> The bottom line is, if I'm basically happy with the way SB is
> working, do I need to periodically forcibly retrain it (using the
> "Start Training" button) and make it look at all the messages that
> have arrived since the last forcible training, or are those messages
> already reflected in the training database?

IMHO, this really isn't necessary unless your training data gets out of
balance.  If it does, you can train with the "Rebuild entire database"
option turned off, so that the new messages are added to your existing
training data, to get back in balance.  In this case, you'll need to
choose your training folders carefully so that you only increase the
type of messages that you have too few of.
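
To make the folder choice concrete, here is a small Python sketch that works
out which type is short and by how many messages.  The function name is
hypothetical and it only does the arithmetic; it does not talk to SpamBayes
or Outlook, and aiming for exactly equal counts is just an assumption for the
example.

```python
def messages_to_add(ham_count, spam_count):
    """Return (kind, n): the underrepresented type and how many more
    messages of that type would equalise the two counts.

    kind is "ham", "spam", or None when the counts already match.
    """
    if ham_count < 0 or spam_count < 0:
        raise ValueError("counts must be non-negative")
    if ham_count == spam_count:
        return None, 0
    if ham_count < spam_count:
        return "ham", spam_count - ham_count
    return "spam", ham_count - spam_count
```

With 150 ham and 400 spam trained, messages_to_add(150, 400) says to pick
folders containing only ham and train on roughly 250 more messages.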

-- 
Kenny Pitt



