[Spambayes] ham message training

Tony Meyer tameyer at ihug.co.nz
Mon Mar 7 04:44:47 CET 2005


> SpamBayes has processed 4603 messages for me so far and I'm 
> wondering if I should be discarding those e-mails I receive
> every day that are good instead of training them as "ham."

Not enough research into training has been done yet to give a definitive
answer.  However, it is fairly certain that SpamBayes works best with roughly
equal numbers of ham and spam, so if training on this ham is causing an
imbalance, then it's worth considering a change.
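
As a rough way to check for that kind of imbalance, here is a minimal sketch
that compares the two trained-message counts.  The function name and the
max_ratio threshold are my own illustrative assumptions, not SpamBayes
settings; you would supply the counts yourself from wherever you see your
training totals.

```python
def training_balance(nham, nspam, max_ratio=2.0):
    """Return (ratio, balanced) for the given trained-message counts.

    ratio is the larger count divided by the smaller one.
    max_ratio is an illustrative threshold, not a SpamBayes option.
    """
    if nham < 0 or nspam < 0:
        raise ValueError("counts must be non-negative")
    if nham == 0 and nspam == 0:
        # Nothing trained yet, so there is no imbalance to report.
        return 1.0, True
    smaller, larger = sorted((nham, nspam))
    if smaller == 0:
        # Only one class has been trained at all.
        return float("inf"), False
    ratio = larger / smaller
    return ratio, ratio <= max_ratio
```

For example, training_balance(300, 100) reports a ratio of 3.0 and flags the
counts as unbalanced under the assumed threshold of 2.0.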

There's a lot of information about training available here:

<http://entrian.com/sbwiki>

The golden rule is really that if things are working, then you don't need to
change anything.

> I'm concerned that the ham database is growing too large 
> unless, of course, the program knows to automatically discard
> new "hams" that match previous hams even though I'm still adding
> daily hams to the training.

sb_server trains what you give it to train - it doesn't do any sort of
analysis about whether it should do the training or not.
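
If you want to be selective, the decision has to happen before you hand a
message over for training.  One possible approach is to train only on
mistakes and unsures.  The sketch below is not something sb_server does; the
function name is hypothetical, and the cutoff values and the handling of
scores exactly on a cutoff are assumptions you should adjust to match your
own configuration.

```python
def should_train(score, is_spam, ham_cutoff=0.20, spam_cutoff=0.90):
    """Return True if a message was a mistake or an unsure.

    score: the spam probability (0.0 to 1.0) the message received
           before any training on it.
    is_spam: the correct label, as decided by you.
    The cutoff defaults are assumed values, not read from any config.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if is_spam:
        # Already confidently classified as spam: skip training.
        return score <= spam_cutoff
    # Already confidently classified as ham: skip training.
    return score >= ham_cutoff
```

With those assumed cutoffs, a ham that scored 0.05 would be skipped, while a
ham that scored 0.5 (an unsure) would still be trained.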

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.


