Kenny Pitt kennypitt at hotmail.com
Wed Feb 11 08:57:36 EST 2004

Nowhere wrote: 
> I currently have 139 Good and 286 Spam trained. I get about 10x more
> spam than ham. I find that my ham is solidly classified at 0-1%
> while spambayes still misses some spam at numbers like 83% (and some
> at 0%).  These are the spam messages with lots of random words
> thrown in to try to defeat the statistical filters.
> Anyway it seems to me that with my HAM being recognized so perfectly
> while the spam is less than perfect that I would need to classify
> more spam, further deviating from the recommended 1:1 ratio.

You're right that, in this case, training only on mistakes and unsures
will tend to increase your imbalance.  Your current spam-to-ham ratio is
only about 2:1, though, which isn't bad compared to many reports we've seen.
Anecdotal evidence seems to indicate you're probably OK up until about
5:1 or so, and some have reported perfectly acceptable results with much
higher ratios.  It's not a bad idea to train on some extra ham now and
then to improve your balance, but pick the ham messages that score
furthest from a perfect 0.00 (even if it isn't by much).
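
As a rough illustration of the arithmetic and of "furthest from 0.00",
here is a small Python sketch.  The function names and the sample
scores are made up for the example and are not part of SpamBayes; it
assumes you already have a score between 0.0 and 1.0 for each ham
message.

```python
# Hypothetical helpers for illustration only; not the SpamBayes API.

def spam_to_ham_ratio(n_spam, n_ham):
    """Return how many spam messages were trained per ham message."""
    return n_spam / n_ham


def ham_worth_training(scored_ham, count=3):
    """Pick the ham messages whose scores are furthest from 0.00.

    scored_ham is a list of (message_id, score) pairs, where score is
    assumed to be in the range 0.0 (certain ham) to 1.0 (certain spam).
    """
    by_score = sorted(scored_ham, key=lambda pair: pair[1], reverse=True)
    return by_score[:count]


# Training counts from the original question: 286 spam, 139 ham.
ratio = spam_to_ham_ratio(286, 139)
print(f"{ratio:.2f}:1")

# Made-up ham scores; the two highest are the best training candidates.
sample = [("msg-a", 0.00), ("msg-b", 0.01), ("msg-c", 0.005)]
print(ham_worth_training(sample, count=2))
```

With the counts quoted above, the ratio comes out a little over 2:1.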

If your ham is classifying consistently near 0% and you are missing some
spam that scores around 80%, then you might want to try reducing your Certain Spam
threshold on the Filtering tab.  I have mine set to 60 currently,
although I wouldn't recommend going quite that low in general.  Around
70-75 should be fine, though.
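
To see why lowering the threshold helps with the 83% messages, here is
a minimal Python sketch of a two-threshold decision.  The function name
and the specific numbers (a starting Certain Spam value of 90 and an
unsure cutoff of 15) are assumptions for the example, not values read
from your configuration, and this is not the actual SpamBayes code.

```python
# Illustrative only: a simple two-threshold classifier on percent scores.

def classify(score_percent, spam_threshold, unsure_threshold=15.0):
    """Map a score (0-100) to 'spam', 'unsure', or 'ham'.

    spam_threshold plays the role of the Certain Spam setting;
    unsure_threshold is an assumed lower cutoff for the unsure range.
    """
    if score_percent >= spam_threshold:
        return "spam"
    if score_percent >= unsure_threshold:
        return "unsure"
    return "ham"


# A spam scoring 83% under an assumed threshold of 90, then under 75.
print(classify(83, spam_threshold=90))
print(classify(83, spam_threshold=75))

# Ham scoring 0-1% is unaffected by the lower threshold.
print(classify(1, spam_threshold=75))
```

Note that a lower threshold does nothing for the spam that scores 0%;
those still land as ham whatever the setting, so they have to be
handled by training.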

