[Spambayes] Re: Trained two times as much spam as ham

Mathew Hendry TJLWBECGSGWU at spammotel.com
Tue Jan 18 22:17:29 CET 2005


Rick Friedman <RickFriedman at vfemail.net> wrote in
<41ED64A2.4010809 at vfemail.net>:

>I was just wondering about the ratio of spam to ham trained.
>
>I've been training on errors & unsures. So far, I've trained 126 spams 
>and 51 hams. I keep hearing that we should strive to keep the training 
>ratio at about 1:1.
>
>Spambayes is working very well with the current training. I can't 
>remember the last time an email was misclassified. However, I do still 
>get many unsures, which inevitably turn out to be spam. I then train
>Spambayes on those unsures.
>
>Obviously, my concern is that Spambayes' effectiveness will diminish as 
>I continue to train on more and more spam. The only time I seem to train as
>ham is when a ham email shows as unsure (which is few & far between).
>
>Am I right to be concerned about this apparent, continually growing
>imbalance in the training ratio? If so, what should I do about it?

I wouldn't worry too much about it, unless it becomes extremely unbalanced
(say 10:1) and/or you start seeing a lot of ham being scored as unsure. At
that point you should probably consider retraining from scratch. You might
also want to try rescoring your existing ham from time to time, to make sure
none of those have crept up into unsure/spam territory.
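For what it's worth, here is a minimal sketch of that rule of thumb in Python (the language SpamBayes itself is written in). It is not part of SpamBayes: the function names, the 5% "drift" threshold standing in for "a lot of ham", and the 0.20 ham cutoff are all assumptions for illustration. Plug in the cutoff from your own configuration and the scores you get by rescoring your ham folder. By this measure, 126 spam to 51 ham is about 2.5:1, nowhere near 10:1.

```python
from typing import Dict, List


def training_ratio(n_spam: int, n_ham: int) -> float:
    """Trained spam per trained ham; infinite if no ham has been trained yet."""
    if n_ham == 0:
        return float("inf")
    return n_spam / n_ham


def drifted_ham(ham_scores: Dict[str, float], ham_cutoff: float = 0.20) -> List[str]:
    """IDs of known-good messages whose rescored value is no longer below
    ham_cutoff, i.e. that have crept into unsure or spam territory.
    The 0.20 default is an assumed placeholder; use your own setting."""
    return sorted(msg_id for msg_id, score in ham_scores.items() if score >= ham_cutoff)


def consider_retraining(n_spam: int, n_ham: int, ham_scores: Dict[str, float],
                        max_ratio: float = 10.0, max_drift_fraction: float = 0.05,
                        ham_cutoff: float = 0.20) -> bool:
    """Hypothetical check: suggest retraining from scratch if the spam:ham
    ratio is extreme OR more than max_drift_fraction of known ham now
    scores at or above the ham cutoff."""
    too_unbalanced = training_ratio(n_spam, n_ham) >= max_ratio
    # Guard against division by zero when no ham has been rescored.
    drift = len(drifted_ham(ham_scores, ham_cutoff)) / max(len(ham_scores), 1)
    return too_unbalanced or drift > max_drift_fraction


print(round(training_ratio(126, 51), 2))                                   # prints 2.47
print(consider_retraining(126, 51, {"m1": 0.02, "m2": 0.08, "m3": 0.01}))  # prints False
```

The "or" in `consider_retraining` mirrors the "and/or" above: either an extreme ratio or a noticeable amount of ham drifting upward is reason enough to think about a fresh start.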

The spammers are doing their best to slip past every filter people can throw
at them, so spam tends to be less predictable and consistent than regular
mail that isn't trying to fool anybody. Naturally you'll get some unsures -
most of mine are "minimalist" spam that consists of only a word or two and a
randomized link; very little for SpamBayes to go on.

-- Mat.
