[Spambayes] SpamBayes Outlook Plug-In Not Performing Well

Sat Nov 22 17:57:44 EST 2003

[spambayes-bounces at python.org]
> ...
> I have been running the SpamBayes Outlook (2002 SP-2) plug-in for many
> weeks.  I have a training base of nearly 9000 spams and 800 or so
> hams.

Balance the number of hams and spams you train on and it should work much
better.

> Despite this degree of training, the filter is only achieving
> about 60% success.  Many obvious spams receive a score of 0-2%.  The
> performance of the filter doesn't seem to be improving as the
> database grows, either.  I have retrained it from scratch several
> times, each time to no avail.
>
> I recognize that my ratio of spams to hams is very high;

So *try* balancing it.  Tell us what happens when you do.
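
One way to try it, sketched in Python: retrain from scratch on all of the
ham plus an equal-sized random sample of the spam.  The function name and
the message-ID lists below are hypothetical stand-ins, not part of the
plug-in; the point is only the selection step.

```python
import random

def balanced_training_sets(ham, spam, seed=0):
    """Return (ham_sample, spam_sample) of equal size.

    ham and spam are lists of message identifiers (hypothetical stand-ins).
    Both are sampled down to the size of the smaller list, so the larger
    one is randomly subsampled and the smaller one is kept whole.
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    n = min(len(ham), len(spam))
    return rng.sample(ham, n), rng.sample(spam, n)

# Figures from the report: roughly 800 hams and 9000 spams.
ham = [f"ham-{i}" for i in range(800)]
spam = [f"spam-{i}" for i in range(9000)]
ham_sample, spam_sample = balanced_training_sets(ham, spam)

# Sizes of the two balanced sets, then the original spam:ham ratio.
print(len(ham_sample), len(spam_sample), len(spam) / len(ham))
```

Sampling at random, rather than taking the first 800 spams, keeps the
retained spam spread across the whole collection instead of only its
oldest messages.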

> however, a different Bayesian filter I use at work has a similar
> ratio in its training base, and still achieves 98%+ effectiveness.

SpamBayes almost certainly uses very different algorithms than that one,
whatever it is.  Different strengths, different weaknesses, different
behaviors.

> What am I doing wrong?  What can I do to improve the percentage of
> spams that SpamBayes catches?

Train on less spam or more ham.  Also make sure you haven't misclassified
any messages you've trained on.  If the problem persists after that, we'll
need to look at the info generated by "Show spam clues for current message".
But with a training ratio unbalanced by more than 11 to 1 (about 9000 spams
to 800 hams), there's no point in that exercise yet -- unbalanced training
data is a well-known cause of flaky results in this program.
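
A toy calculation shows how an imbalance like this can skew scores.  It
assumes a simplified per-token estimate in which a token's spam and ham
counts are each divided by the number of spams and hams trained on.  That
is an assumption made for illustration, not code taken from SpamBayes, and
the token counts are made up.

```python
def token_spam_prob(spam_count, ham_count, nspam, nham):
    """Simplified per-token spam estimate (an assumed formula).

    Each count is normalized by the size of its own training set, so a
    single sighting weighs more in whichever set is smaller.
    """
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    return spam_ratio / (spam_ratio + ham_ratio)

# A made-up token seen in 5 trained spams and 1 trained ham.
unbalanced = token_spam_prob(5, 1, nspam=9000, nham=800)
balanced = token_spam_prob(5, 1, nspam=800, nham=800)

print(round(unbalanced, 3), round(balanced, 3))  # prints: 0.308 0.833
```

Under this estimate, with 9000 spams and 800 hams, one ham sighting
outweighs five spam sightings, so a token that looks spammy in raw counts
comes out leaning toward ham.  With balanced totals the same counts give
a clearly spammy value.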