[spambayes-dev] Training Question
T. Alexander Popiel
popiel at wolfskeep.com
Mon May 16 20:31:47 CEST 2005
In message: <20050516164602.7AC521E400E at bag.python.org>
"From Concept To Reality, L.L.C." <fctr at nac.net> writes:
>Greetings one and all:
>
>At what point is SPAMBayes sufficiently trained?
Spambayes is sufficiently trained when you are satisfied with its
performance. ;-) Really, there is no absolute rule, particularly
since everyone's email is different.
[ settings snipped ]
>
>Using these settings, HAM have NEVER gone to UNSURE or SPAM,
>however, if I get 10 e-mails, with 1 as HAM, and 9 as SPAM, 3 SPAM
>end up in SPAM, 3 SPAM end up in UNSURE, and 3 SPAM end up in HAM.
First, spambayes tends to work better when trained with similar
amounts of spam and ham; you've currently got about a 4:1 ratio.
I'd suggest retraining with closer to a 1:1 ratio, and turning off
training while filtering (leaving it on will tend to drive you
towards severely unbalanced training).
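
As a rough illustration of rebalancing before a retrain, here is a
minimal sketch in plain Python. It is not spambayes code: the
function name, the message identifiers, and the 100/400 counts are
all made up for the example, and it simply assumes you have two
lists of messages and are willing to drop the surplus from the
larger one.

```python
import random

def balance_training_sets(ham, spam, seed=0):
    """Return equally sized random samples of ham and spam (a 1:1 ratio).

    Hypothetical helper, not part of spambayes.
    """
    n = min(len(ham), len(spam))   # size of the smaller class
    rng = random.Random(seed)      # seeded so the selection is repeatable
    return rng.sample(ham, n), rng.sample(spam, n)

# Illustrative 4:1 imbalance; the counts are invented for this sketch.
ham_ids = [f"ham-{i}" for i in range(100)]
spam_ids = [f"spam-{i}" for i in range(400)]

balanced_ham, balanced_spam = balance_training_sets(ham_ids, spam_ids)
print(len(balanced_ham), len(balanced_spam))
```

Whether to sample randomly or keep only the most recent messages is
a judgment call; the sketch samples randomly only because that is
the simplest thing to show.
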
Second, you may want to lower both your ham and spam thresholds;
if all your ham is being solidly classified as such, you may be
able to get by with a ham threshold of .1, or even .05. Similarly,
you may be able to drop the spam threshold as low as .51; going
lower than that runs into the problem that a mail with only novel
tokens (scoring at .5, since spambayes doesn't know anything about
it) will end up in the spam bucket.
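
To make the threshold arithmetic concrete, here is a small sketch in
plain Python. It is not the spambayes implementation: the function
and parameter names are hypothetical, and it assumes a score below
the ham threshold is ham, a score at or above the spam threshold is
spam, and anything in between is unsure.

```python
def bucket(score, ham_threshold, spam_threshold):
    """Sort a score in [0, 1] into 'ham', 'unsure', or 'spam'.

    Hypothetical helper; the comparison rules are an assumption.
    """
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "unsure"

novel = 0.5  # a mail made up only of tokens the classifier has never seen

print(bucket(novel, ham_threshold=0.1, spam_threshold=0.51))
print(bucket(novel, ham_threshold=0.1, spam_threshold=0.50))
```

Under those assumptions the first call leaves the all-novel mail in
unsure, while the second, with the spam threshold at .5, puts it in
the spam bucket.
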
>What's going on, here? Do I need to adjust my settings more, or do
>I need to train more?
Oddly enough, you may need to train _less_, and preserve a better
training balance between spam and ham.
- Alex