[Spambayes] Thinking about changing SPAM cutoff; would like input

Kenny Pitt kennypitt at hotmail.com
Thu Jan 15 17:35:55 EST 2004


Jacob Farmer wrote:
> I'm running a database with about 1,200 entries, and roughly a 1:1
> spam/ham ratio.  Recently, I've been seeing a ton of unsure
> classifications.  Of about 35 junk mail messages today, 12 of them
> were classified as unsure.
> 
> [snip]
> 
> Is there a good strategy for trying to eliminate the unsures?

Spam is constantly mutating, so even a well-trained SpamBayes will get
some unsures for spam campaigns that it hasn't seen before.  The same is
also true for a ham message that is a little unusual compared to the
rest of your good messages.
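To make "unsure" concrete: the filter gives each message a spam score between 0 and 1, and two cutoffs split that range into ham, unsure and spam. A new spam campaign or an odd ham message tends to land in the middle band. The sketch below is only an illustration. The function name and the cutoff values are hypothetical placeholders, not SpamBayes's own code or your configured settings.

```python
# Hypothetical placeholder cutoffs for illustration only; a real SpamBayes
# installation uses its own configured values.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def classify(score: float, ham_cutoff: float = HAM_CUTOFF,
             spam_cutoff: float = SPAM_CUTOFF) -> str:
    """Map a spam score in [0, 1] to 'ham', 'unsure' or 'spam'.

    Assumed boundary rule: below ham_cutoff is ham, at or above
    spam_cutoff is spam, everything in between is unsure.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


for s in (0.05, 0.55, 0.97):
    print(s, classify(s))
```

Moving either cutoff only shifts which scores fall into the middle band. It does not change the scores themselves, which is why training tends to reduce unsures more reliably than adjusting the cutoffs.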

The best strategy is probably to go ahead and train on the unsures.
Your balance doesn't have to remain at a perfect 1:1, just somewhere
close, and if it strays too far then you can pick a few properly
classified messages to add to the training just to balance it out again.
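One way to decide when the balance has "strayed too far" is to pick a maximum ratio between the larger and smaller class and count how many correctly classified messages of the smaller class would bring it back under that limit. This is a minimal sketch under that assumption. The function name and the 1.5 limit are hypothetical choices, not a SpamBayes rule.

```python
import math


def messages_needed_to_rebalance(n_ham: int, n_spam: int,
                                 max_ratio: float = 1.5) -> tuple[str, int]:
    """Return (class_to_top_up, count) so that the larger/smaller ratio of
    trained messages is at most max_ratio. max_ratio is a hypothetical limit."""
    if n_ham <= 0 or n_spam <= 0:
        raise ValueError("both counts must be positive")
    if max_ratio < 1.0:
        raise ValueError("max_ratio must be at least 1.0")
    if n_ham < n_spam:
        small_label, small, large = "ham", n_ham, n_spam
    else:
        small_label, small, large = "spam", n_spam, n_ham
    # Smallest k such that large / (small + k) <= max_ratio.
    needed = math.ceil(large / max_ratio) - small
    return small_label, max(0, needed)


print(messages_needed_to_rebalance(612, 600))  # close to 1:1, nothing to add
print(messages_needed_to_rebalance(500, 800))  # ham needs topping up
```

The exact limit matters less than noticing drift early. A few extra messages added now are easier than a large correction later.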

Since spam often comes in bursts of similar messages, you may find that
training on only 1 or 2 of the unsure spams will cause the rest to be
classified correctly as well.  You can check this by training on 1
message at a time, and then looking at the clues for the other messages
to see the updated score.
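The "train on one, then re-check the rest" approach can be written as a small loop: train on one unsure spam, re-score the remaining ones, and drop any that now score as spam. The sketch below assumes a generic scorer passed in as two callables. `ToyFilter` is a hypothetical stand-in that just counts previously seen spam words. It is not SpamBayes's classifier or its combining method, and the cutoff is a placeholder.

```python
from typing import Callable, Sequence


def train_incrementally(
    unsure_spam: Sequence[str],
    score: Callable[[str], float],
    train_spam: Callable[[str], None],
    spam_cutoff: float = 0.90,  # hypothetical placeholder cutoff
) -> list[str]:
    """Train on one unsure spam at a time, re-scoring the rest after each
    step. Returns only the messages that actually had to be trained."""
    trained: list[str] = []
    remaining = list(unsure_spam)
    while remaining:
        msg = remaining.pop(0)
        train_spam(msg)
        trained.append(msg)
        # Keep only messages that still score below the spam cutoff.
        remaining = [m for m in remaining if score(m) < spam_cutoff]
    return trained


class ToyFilter:
    """Hypothetical stand-in scorer: fraction of a message's words that
    have appeared in previously trained spam."""

    def __init__(self) -> None:
        self.spam_words: set[str] = set()

    def train_spam(self, msg: str) -> None:
        self.spam_words.update(msg.lower().split())

    def score(self, msg: str) -> float:
        words = msg.lower().split()
        if not words:
            return 0.0
        return sum(w in self.spam_words for w in words) / len(words)


f = ToyFilter()
batch = ["cheap pills online now", "cheap pills online today",
         "win a free cruise"]
print(train_incrementally(batch, f.score, f.train_spam, spam_cutoff=0.7))
```

In this toy run the second "cheap pills" message is pushed over the cutoff by training on the first, so only two of the three messages get trained.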

There aren't any training strategies that are known to be best for all
users and e-mail mixes.  There's a good bit of information about various
strategies here on our Wiki:

    <http://www.entrian.com/sbwiki/TrainingIdeas>

-- 
Kenny Pitt



