[Spambayes] Training Question - Status configuration
Gary Smith
Gary at doctorgary.net
Mon May 8 19:28:49 CEST 2006
Question regarding the Ham vs Spam ratio
As I get many messages daily thanks to multiple active
lists I belong to, I do get far more Ham than Spam. I
have been reviewing & checking off the spam as it
arrives and then clicking on train.
I just reviewed for new messages and none had
arrived so I clicked on return Home and saw the
messages copied below:
--------------------------------
POP3 conversations this session: 1.
Emails classified this session: 0 spam, 0 ham, 0 unsure.
Total emails trained: Spam: 15 Ham: 211
More statistics...
Warning: you have much more ham than spam -
SpamBayes works best with approximately even
numbers of ham and spam.
------------------------------
The "warning" is what I am writing about. I read that
there should be a more equal ratio of spam/ham but
how are we to create that ratio when email continues
to come in at a skewed (in my case 15:211) ratio? I
could unsubscribe from lists and then the ratio would
be more equal but obviously that's not practical.
I could review all emails one by one, unclick the
good ones and re-click them as "defer", thus creating a
1:1 ratio of ham/spam for training, but that would
leave a lot of messages to keep reviewing each time
and would take a PITA amount of clicking, as there's
no global option to mark everything as defer and then
select the Spam and an equal number of Ham for
training purposes.
I could go to the Proxy folder and manually delete
some of the stored items in the Ham cache but I
suspect that would not alter the database's findings.
How is one to create the proper ham/spam ratio for
training when the incoming Ham greatly outnumbers
the Spam?
I must have missed something obvious.
Suggestions?