[Spambayes] Marking message flagged as spam as non spam

Thomas Hruska thruska at cubiclesoft.com
Tue Apr 7 07:10:15 CEST 2009

skip at pobox.com wrote:
>     Keith> How many hams and spams have you trained on? 
>     Keith> -Quite a few, around 350 spam mails, hams around 4500.
> This is way out-of-balance.  Typically SpamBayes works best with roughly
> equal numbers of ham and spam.

While I agree that this is out of balance, Spambayes seriously needs to 
get its act together: it should stop letting users train on imbalanced 
sets or on messages that were already classified correctly, and it 
should let users reset the database periodically (the POP3 proxy server 
seriously needs a feature that lets you do a complete reset of the 
database from within the UI itself).

The rule of thumb I follow is:  Train on only one spam in ham and one 
ham in unsure.  Skip training on messages I plan on filtering using my 
e-mail client (i.e. no point in training on messages I'm going to 
whitelist in the first place).  Once I reach about 300 of each type, 
reset the database and start over.
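
As a rough sketch, the rule of thumb above could be written down like 
this.  It is only one reading of the rule (train on a spam that landed 
in ham, or a ham that landed in unsure), and the class and method names 
are hypothetical bookkeeping, not anything from the SpamBayes API:

```python
from dataclasses import dataclass

# "about 300 of each type" from the rule of thumb; the exact value is a choice.
RESET_THRESHOLD = 300


@dataclass
class TrainingPolicy:
    """Hypothetical helper that tracks training counts; not SpamBayes code."""

    ham_trained: int = 0
    spam_trained: int = 0
    resets: int = 0

    def should_train(self, is_spam: bool, classified_as: str,
                     whitelisted: bool = False) -> bool:
        """Return True only for messages the rule of thumb would train on."""
        if whitelisted:
            # The mail client will filter these anyway, so skip training.
            return False
        if is_spam:
            # A spam that landed in ham.
            return classified_as == "ham"
        # A ham that landed in unsure.
        return classified_as == "unsure"

    def record(self, is_spam: bool, classified_as: str,
               whitelisted: bool = False) -> bool:
        """Count a trained message and reset once both counts reach the limit."""
        if not self.should_train(is_spam, classified_as, whitelisted):
            return False
        if is_spam:
            self.spam_trained += 1
        else:
            self.ham_trained += 1
        if (self.ham_trained >= RESET_THRESHOLD
                and self.spam_trained >= RESET_THRESHOLD):
            # Stand-in for wiping the real database and starting over.
            self.ham_trained = 0
            self.spam_trained = 0
            self.resets += 1
        return True
```

The counters only stand in for the real training database; an actual 
reset would still have to be done in Spambayes itself.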

My problem is that 99.9% of my incoming mail is spam, so there is an 
imbalance by default.  I am forced to delete unsures because the 
imbalance is so great.  IMO, 'unsure' is an inappropriate word choice 
for the category.  It causes many users to feel they need to tell 
Spambayes what is ham and spam.  This, in turn, creates the imbalances 
they then experience.

When was the last update to Spambayes?  Time for a new version!

Thomas Hruska
CubicleSoft President
Ph: 517-803-4197


