[spambayes-dev] Training Question

Mon May 16 20:31:47 CEST 2005

In message:  <20050516164602.7AC521E400E at bag.python.org>
             "From Concept To Reality, L.L.C." <fctr at nac.net> writes:
>Greetings one and all:
>
>At what point is SPAMBayes sufficiently trained?

Spambayes is sufficiently trained when you are satisfied with its
performance. ;-)  Really, there is no absolute rule, particularly
since everyone's email is different.

[ settings snipped ]
>
>Using these settings, HAM have NEVER gone to UNSURE or SPAM,
>however, if I get 10 e-mails, with 1 as HAM, and 9 as SPAM, 3 SPAM
>end up in SPAM, 3 SPAM end up in UNSURE, and 3 SPAM end up in HAM.

First, spambayes tends to work better when trained with similar
amounts of spam and ham; you've currently got about a 4:1 ratio.
I'd suggest retraining with closer to a 1:1 ratio, and turning off
training while filtering: since most of your incoming mail is spam,
training on everything will tend to drive you towards severely
unbalanced training.
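To see why training on everything tends to unbalance the database, here is a toy model, not SpamBayes code. When most incoming mail is spam, every filtered batch adds far more spam than ham. The starting counts (400 spam, 100 ham, i.e. about 4:1) and the 9-spam/1-ham daily mix, borrowed from the example in the question, are assumptions, as is the `simulate` helper.

```python
# Hypothetical toy model: how training on every filtered message shifts
# the spam:ham ratio when incoming mail is mostly spam. The starting
# counts and the daily 9:1 mix are illustrative assumptions.

def simulate(nspam, nham, days, spam_per_day=9, ham_per_day=1):
    """Return the spam:ham training ratio after each simulated day."""
    ratios = []
    for _ in range(days):
        # Training while filtering adds every message to the database.
        nspam += spam_per_day
        nham += ham_per_day
        ratios.append(nspam / nham)
    return ratios


history = simulate(400, 100, days=30)
print(f"start 4.00:1, after 30 days {history[-1]:.2f}:1")
# prints "start 4.00:1, after 30 days 5.15:1"
```

The ratio keeps creeping toward the 9:1 mix of the incoming mail, which is the drift that turning off training while filtering avoids.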

Second, you may want to lower both your ham and spam thresholds;
if all your ham is being solidly classified as such, you may be
able to get by with a ham threshold of .1, or even .05.  Similarly,
you may be able to drop the spam threshold to .51, though dropping
it further runs into the problem that a mail containing only novel
tokens (which scores .5, since spambayes knows nothing about them)
will end up in the spam bucket.
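To make the threshold discussion concrete, here is a minimal Python sketch of a three-way cutoff classifier, plus a per-token estimate in the style of Gary Robinson's formula showing why a never-seen token sits at .5. The function names `token_prob` and `classify`, the default values (strength 0.45, unknown-word probability 0.5, cutoffs 0.20/0.90), and the use of a strict `>` at the spam cutoff are assumptions for illustration, not the SpamBayes API.

```python
# Hypothetical sketch: Robinson-style per-token spam probability and a
# three-way cutoff classifier. All defaults are assumed, not read from
# a real SpamBayes install.

def token_prob(spamcount, hamcount, nspam, nham, s=0.45, x=0.5):
    """Estimate P(spam | token) from training counts."""
    n = spamcount + hamcount
    if n == 0:
        # Never-seen token: no evidence either way, so fall back to x.
        return x
    # Normalise by the number of trained spam/ham messages.
    spamratio = spamcount / max(nspam, 1)
    hamratio = hamcount / max(nham, 1)
    raw = spamratio / (spamratio + hamratio)
    # Shrink toward x; rarely seen tokens stay close to neutral.
    return (s * x + n * raw) / (s + n)


def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a message score in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:  # assumed strict comparison
        return "spam"
    return "unsure"


novel_score = 0.5  # a message made only of novel tokens, per the post
print(token_prob(0, 0, nspam=400, nham=100))  # 0.5
print(classify(novel_score, 0.10, 0.51))      # unsure
print(classify(novel_score, 0.10, 0.49))      # spam
```

With the spam cutoff just above .5, an all-novel message lands in unsure; once the cutoff drops below .5, the same message is filed as spam.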

>What's going on, here? Do I need to adjust my settings more, or do
>I need to train more?

Oddly enough, you may need to train _less_, and preserve a better
training balance between spam and ham.
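One way to "train less" while keeping the balance is to train only on mistakes and unsures, and to skip a message when adding it would push the ratio too far. The sketch below is a hypothetical policy, not a SpamBayes feature; `should_train`, the labels, and the 1.5 ratio limit are all illustrative assumptions.

```python
# Hypothetical "train less" policy: only train on messages the filter got
# wrong or was unsure about, and skip one if adding it would push the
# spam:ham ratio past max_ratio. Names and the limit are assumptions.

def should_train(true_label, predicted, nspam, nham, max_ratio=1.5):
    """Decide whether to add one message to the training database."""
    if predicted == true_label:
        # Correctly classified mail teaches the filter little; skip it.
        return False
    # Mistakes and "unsure" verdicts are candidates, subject to balance.
    if true_label == "spam":
        return (nspam + 1) / max(nham, 1) <= max_ratio
    return (nham + 1) / max(nspam, 1) <= max_ratio


nspam, nham = 100, 100  # a freshly retrained, balanced database
# (true label, filter's verdict) for a hypothetical day of mail
day = [("spam", "ham"), ("spam", "unsure"), ("spam", "spam"), ("ham", "ham")]
for true_label, predicted in day:
    if should_train(true_label, predicted, nspam, nham):
        if true_label == "spam":
            nspam += 1
        else:
            nham += 1
print(nspam, nham)  # 102 100
```

Only the two misfiled spams are trained; correctly classified mail is left alone, so the database grows slowly and stays close to 1:1.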

- Alex