[Spambayes] Training Question - Status configuration

Gary Smith Gary at doctorgary.net
Mon May 8 19:28:49 CEST 2006


Question regarding the Ham vs Spam ratio

As I get many messages daily thanks to multiple active 
lists I belong to, I do get far more Ham than Spam. I 
have been reviewing & checking off the spam as it 
arrives and then clicking on train. 

I just checked for new messages and none had 
arrived, so I clicked on Return Home and saw the 
status messages copied below:
--------------------------------

 POP3 conversations this session: 1.
Emails classified this session: 0 spam, 0 ham, 0 unsure.
Total emails trained: Spam: 15 Ham: 211
More statistics...
Warning: you have much more ham than spam - 
SpamBayes works best with approximately even 
numbers of ham and spam.
------------------------------

The "warning" is what I am writing about. I read that 
there should be a more equal ratio of spam/ham, but 
how are we to create that ratio when email continues 
to come in at a skewed ratio (in my case 15:211)? I 
could unsubscribe from lists and then the ratio would 
be more equal, but obviously that's not practical.

I could review all emails one by one, unclick the 
good ones and re-click them as "defer", thus creating a 
1:1 ratio of ham/spam for training, but that would 
leave a lot of messages to keep reviewing each time 
and would be a PITA amount of clicking, as there's no 
global option to mark everything as defer and then 
select the spam and an equal number of ham 
for training purposes.
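
To make the "equal numbers" idea concrete, here is a rough 
Python sketch of the selection I have in mind. It is not a 
SpamBayes feature as far as I know; the function name and 
the message IDs are made up purely for illustration.

```python
import random

def pick_balanced_training_set(spam_ids, ham_ids, seed=None):
    """Hypothetical helper: return equal-sized spam and ham samples."""
    rng = random.Random(seed)
    # Train on as many of each class as the smaller class can supply.
    n = min(len(spam_ids), len(ham_ids))
    spam = rng.sample(list(spam_ids), n)
    ham = rng.sample(list(ham_ids), n)
    return spam, ham

# Made-up IDs matching my current totals: 15 spam, 211 ham.
spam_ids = [f"spam-{i}" for i in range(15)]
ham_ids = [f"ham-{i}" for i in range(211)]

spam, ham = pick_balanced_training_set(spam_ids, ham_ids, seed=0)
print(len(spam), len(ham))  # 15 15
```

With my numbers that would mean training on all 15 spam and 
only 15 of the 211 ham, leaving the other 196 as "defer".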

I could go to the Proxy folder and manually delete 
some of the stored items in the Ham cache but I 
suspect that would not alter the database's findings.

How is one to create the proper ham/spam ratio when 
the incoming Ham greatly outnumbers the Spam as 
regards training?

I must have missed something obvious.

Suggestions?

