[Spambayes] Spambayes is great.

Skip Montanaro skip at pobox.com
Fri Jul 25 14:48:44 EDT 2003


    Lee> However, I do receive a number of false-positives ("unsure"),

Spambayes scores email messages on a continuum from 0.0 to 1.0.  The
distinction between ham, unsure and spam is simply where the ham_cutoff and
spam_cutoff values are placed.  A message being marked "unsure" only means
that it scored between the ham and spam cutoffs.  That doesn't mean it's a
false positive.  There are going to be messages which the system can't
easily categorize because they either contain too few clues or contain
roughly the same number of hammy and spammy clues.
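
To make the cutoff idea concrete, here is a minimal sketch of the three-way
split. It is not the Spambayes source: the function name classify is
hypothetical, the cutoff values 0.20 and 0.90 are illustrative assumptions,
and the handling of a score that lands exactly on a cutoff is my own choice.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a score in [0.0, 1.0] to 'ham', 'unsure' or 'spam'.

    The cutoff defaults are assumptions chosen for illustration only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0.0 and 1.0")
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    # Everything between the two cutoffs (inclusive) is "unsure".
    return "unsure"


for s in (0.05, 0.50, 0.95):
    print(s, classify(s))
```

Moving the two cutoffs closer together shrinks the "unsure" band, at the
cost of more outright misclassifications.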

You should definitely train on any messages which are scored "unsure" as
well as any messages which are incorrectly classified (spam which is scored
as ham or vice versa).  I also try to train on mail which is correctly
classified but close to the cutoff ("low spam" or "high ham").
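
That training policy can be sketched roughly as follows. This is only an
illustration of the rule of thumb above, not anything Spambayes does
automatically: should_train, the cutoff defaults and the margin of 0.05
that defines "close to the cutoff" are all hypothetical.

```python
def should_train(score, true_label, ham_cutoff=0.20, spam_cutoff=0.90,
                 margin=0.05):
    """Return True if a message is worth training on.

    score      -- classifier score in [0.0, 1.0]
    true_label -- what the message really is: 'ham' or 'spam'
    The cutoffs and margin are illustrative assumptions.
    """
    if true_label not in ("ham", "spam"):
        raise ValueError("true_label must be 'ham' or 'spam'")
    # Case 1: the message scored "unsure".
    if ham_cutoff <= score <= spam_cutoff:
        return True
    predicted = "ham" if score < ham_cutoff else "spam"
    # Case 2: the message was misclassified.
    if predicted != true_label:
        return True
    # Case 3: correct, but close to the cutoff ("high ham" or "low spam").
    if predicted == "ham":
        return score >= ham_cutoff - margin
    return score <= spam_cutoff + margin
```

With these assumed values, a ham message scoring 0.18 or a spam message
scoring 0.92 would still be picked for training, while ham at 0.01 or spam
at 0.99 would be left alone.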

    Lee> IMHO, whitelist functionality for sender addresses would make this
    Lee> solution complete.

Spambayes is building a whitelist for you; you just aren't aware of it. ;-)
As it sees more and more mail from your regular correspondents that's been
trained as ham, it will get better and better at classifying such mail.

In general, you need to give it a little time and feed it a fair amount of
mail.  I wouldn't worry about misclassifications until you've trained on at
least 100 messages.

Skip


