[Spambayes] Do you need to continue training ham?

Tim Peters tim.one at comcast.net
Thu Sep 18 19:38:13 EDT 2003


[Rob Rosenfeld]
> Hey folks.  I have moved from SpamAssassin to the SpamBayes Outlook
> plug-in. The integration is great.  I'm a bit confused about one
> part.  I had stockpiles of ham and spam to initially train SpamBayes
> with.

Note that spambayes works best if you train on approximately equal numbers
of ham and spam.  It doesn't take millions <wink>, either.  For example, I started
this project, and my home Outlook classifier still hasn't been trained on
2000 messages total (I get about 600 per day, and my classifier database is
going on one year old, so I've trained on less than 1% of the email I've
received in that time).
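
As a rough check of that figure, here is the arithmetic spelled out.  The
365-day span is an assumption (the database is only "going on" one year
old), and 2000 is an upper bound on the training count, so treat the result
as a ballpark rather than an exact number.

```python
# Figures taken from the message above; the 365-day span is an assumption,
# since the database is only "going on" one year old.
messages_per_day = 600
days = 365
trained = 2000  # upper bound: the classifier has seen fewer than this

received = messages_per_day * days
fraction = trained / received
print(f"received about {received} messages, trained on at most {fraction:.2%}")
```

With those numbers the trained fraction comes out a little over 0.9%.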

> If I understand correctly, every time SpamBayes detects and
> moves a spam, it trains on it, kind of giving it "ongoing" spam
> training.   Is that correct?

Nope.

> If it doesn't move it as spam, does it train on it as ham?

Not that either.  It auto-trains on messages for which you explicitly click
the "Recover from Spam" or "Delete as Spam" buttons.  In addition, it *may*
train on messages *you* move to spam or ham folders, depending on which
boxes you've checked in the spambayes Manager's Training tab, section
"Incremental Training".  These aren't necessarily ideal training protocols,
but they're the best we've been able to implement so far that most users
seem able to deal with.  Ideal would be to train on a small random sample of
all the email you get, and expire training messages over time too.  That
seems hard.
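
To make the "ideal" protocol a bit more concrete, here is a minimal sketch of
random sampling plus expiry.  Nothing here is spambayes code: the class name,
the 1% sample rate, and the 120-day lifetime are all hypothetical, and the
hard parts (getting a correct ham/spam label for every sampled message, and
hooking this into a mail client) are left out.

```python
import random
from collections import deque


class SampledTrainingSet:
    """Hypothetical helper, not part of spambayes."""

    def __init__(self, sample_rate=0.01, max_age_days=120, rng=None):
        self.sample_rate = sample_rate      # fraction of mail to train on
        self.max_age_days = max_age_days    # lifetime of a training message
        self.rng = rng or random.Random()
        # (day_received, is_spam, text) tuples, oldest first.
        # Assumes messages are offered in order of arrival.
        self.messages = deque()

    def offer(self, day, is_spam, text):
        """Keep the message for training with probability sample_rate."""
        if self.rng.random() < self.sample_rate:
            self.messages.append((day, is_spam, text))
            return True
        return False

    def expire(self, today):
        """Remove and return training messages older than max_age_days."""
        expired = []
        while self.messages and today - self.messages[0][0] > self.max_age_days:
            expired.append(self.messages.popleft())
        return expired
```

In use, each message that offer() keeps would be fed to the classifier's
training step, and each message that expire() returns would be untrained, so
the database tracks a small, recent, random slice of all mail.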



