[Spambayes] Marking message flagged as spam as non spam
thruska at cubiclesoft.com
Tue Apr 7 07:10:15 CEST 2009
skip at pobox.com wrote:
> Keith> How many hams and spams have you trained on?
> Keith> -Quite a few, around 350 spam mails, hams around 4500.
> This is way out-of-balance. Typically SpamBayes works best with roughly
> equal numbers of ham and spam.
While I agree that this is out of balance, Spambayes seriously needs to
get its act together: it should stop allowing users to train on
imbalanced data or on messages it already classified correctly, and it
should let users reset the database periodically (the POP3 proxy server
seriously needs a feature that allows you to do a complete reset of the
database within the UI itself).
The rule of thumb I follow is: train only on a spam that shows up in ham
or a ham that shows up in unsure. Skip training on messages I plan on
filtering using my e-mail client (i.e., there is no point in training on
messages I'm going to whitelist in the first place). Once I reach about
300 of each type, reset the database and start over.
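For anyone who wants to see that rule of thumb spelled out, here is a rough sketch in Python. It is only bookkeeping, not SpamBayes code: `should_train`, `TrainingLog`, and the folder names are all made up for illustration, and it assumes the rule means "train only on spam found in ham and ham found in unsure". Wiring it to a real classifier and a real database reset is left out.

```python
# Hypothetical sketch of the rule of thumb; none of these names exist in SpamBayes.

# (what the message really is, folder it was filed in) pairs worth training on.
TRAINABLE = {("spam", "ham"), ("ham", "unsure")}


def should_train(actual, filed_as, client_filtered=False):
    """Return True if the rule of thumb says to train on this message."""
    if client_filtered:
        # The e-mail client will whitelist/filter it anyway, so skip it.
        return False
    return (actual, filed_as) in TRAINABLE


class TrainingLog:
    """Counts trained messages and signals a reset at the threshold."""

    def __init__(self, reset_at=300):
        self.reset_at = reset_at  # "about 300 of each type"
        self.counts = {"ham": 0, "spam": 0}
        self.resets = 0

    def record(self, actual, filed_as, client_filtered=False):
        """Record one message; return True if it was trained on."""
        if not should_train(actual, filed_as, client_filtered):
            return False
        self.counts[actual] += 1
        # Start over once BOTH ham and spam have reached the threshold.
        if min(self.counts.values()) >= self.reset_at:
            self.counts = {"ham": 0, "spam": 0}
            self.resets += 1  # a real setup would wipe the database here
        return True
```

Calling `log.record("spam", "ham")` on a fresh `TrainingLog()` trains and counts the message, while `log.record("spam", "spam")` is skipped because the message was already classified correctly.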
My problem is that 99.9% of my incoming mail is spam, so there is an
imbalance by default. I am forced to delete unsures because the
imbalance is so great. IMO, 'unsure' is an inappropriate word choice
for the category. It causes many users to feel they need to tell
Spambayes what is ham and spam. This, in turn, creates the imbalances
they then experience.
When was the last update to Spambayes? Time for a new version!