[Spambayes] Obvious Spam Missed...

Skip Montanaro skip at pobox.com
Tue Sep 16 16:31:10 EDT 2003

    >> or would it have to be something on the order of 10:1?

    Tim> The larger the worserer <wink>.  We've seen people with imbalance
    Tim> worse than 300:1 or 1:300 (different people get extreme in
    Tim> different ways).

In particular, it's almost certainly a mistake to initially train SpamBayes
on all the mailboxes on your computer which contain nothing but ham.  My
guess is that some people are doing this ("yup, that's ham <check>. yup,
that's ham <check>...."), declaring all the mailboxes which are full of ham
to the Outlook plugin.  Most people don't save the spam they receive, so
doing that would cause them to immediately start off with a horrible
imbalance.  To make matters worse, if much of the ham saved is very old, it
may not accurately reflect what current ham looks like.

Since it's so easy to suck existing mailboxes into the Outlook plugin,
perhaps it should warn the user if the ham:spam ratio gets too far out of
whack in either direction.
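
A rough sketch of what such a check might look like, in Python.  The
function name, the message text and the 10:1 default threshold are all made
up for illustration; none of this is existing SpamBayes or Outlook plugin
code, and the right threshold is an open question.

```python
def imbalance_warning(nham, nspam, max_ratio=10.0):
    """Return a warning string if training counts are lopsided, else None.

    nham and nspam are the numbers of ham and spam messages trained on.
    max_ratio is a hypothetical cutoff, not a value taken from SpamBayes.
    """
    if nham == 0 and nspam == 0:
        return None  # nothing trained yet, so nothing to warn about

    smaller, larger = sorted((nham, nspam))
    # A zero count on one side is the most extreme imbalance possible;
    # otherwise compare the larger count to the smaller one, so that
    # 300:1 and 1:300 are treated the same way.
    if smaller == 0 or larger / smaller > max_ratio:
        return ("Training is imbalanced: %d ham vs. %d spam. "
                "Consider training on a more even mix." % (nham, nspam))
    return None
```

The plugin could run something like this after each bulk training pass and
show the returned string, if any, to the user.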


